Showing 1–50 of 1,787 results for author: Sun, M

  1. arXiv:2507.07456  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    General purpose models for the chemical sciences

    Authors: Nawaf Alampara, Anagha Aneesh, Martiño Ríos-García, Adrian Mirza, Mara Schilling-Wilhelmi, Ali Asghar Aghajani, Meiling Sun, Gordan Prastalo, Kevin Maik Jablonka

    Abstract: Data-driven techniques have a large potential to transform and accelerate the chemical sciences. However, chemical sciences also pose the unique challenge of very diverse, small, fuzzy datasets that are difficult to leverage in conventional machine learning approaches completely. A new class of models, general-purpose models (GPMs) such as large language models, have shown the ability to solve tas… ▽ More

    Submitted 10 July, 2025; originally announced July 2025.

  2. arXiv:2507.07016  [pdf, ps, other]

    cs.LG eess.SP

    On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

    Authors: Jian Huang, Yongli Zhu, Linna Xu, Zhe Zheng, Wenpeng Cui, Mingyang Sun

    Abstract: In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are invest… ▽ More

    Submitted 9 July, 2025; originally announced July 2025.

    Comments: This paper is currently under review by an IEEE publication; it may be subject to minor changes later in response to review comments

  3. arXiv:2507.05687  [pdf, ps, other]

    cs.LG cs.CL

    AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

    Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and mem… ▽ More

    Submitted 8 July, 2025; originally announced July 2025.

  4. arXiv:2507.05685  [pdf, ps, other]

    cs.LG cs.AI

    Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach

    Authors: Xiaobing Chen, Boyang Zhang, Xiangwei Zhou, Mingxuan Sun, Shuai Zhang, Songyang Zhang, Geoffrey Ye Li

    Abstract: The integration of Federated Learning (FL) and Mixture-of-Experts (MoE) presents a compelling pathway for training more powerful, large-scale artificial intelligence models (LAMs) on decentralized data while preserving privacy. However, efficient federated training of these complex MoE-structured LAMs is hindered by significant system-level challenges, particularly in managing the interplay betwee… ▽ More

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 7 pages

  5. arXiv:2507.05609  [pdf, ps, other]

    eess.AS

    MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses

    Authors: Yang Liu, Li Wan, Yiteng Huang, Yong Xu, Yangyang Shi, Saurabh Adya, Ming Sun, Florian Metze

    Abstract: Smart glasses are increasingly positioned as the next-generation interface for ubiquitous access to large language models (LLMs). Nevertheless, achieving reliable interaction in real-world noisy environments remains a major challenge, particularly due to interference from side speech. In this work, we introduce a novel side-talk rejection multi-microphone Whisper (MMW) framework for smart glasses,… ▽ More

    Submitted 7 July, 2025; originally announced July 2025.

  6. arXiv:2507.04359  [pdf, ps, other]

    astro-ph.GA

    Interband Lag Variability in Active Galactic Nuclei across ZTF Data from Multiple Years

    Authors: Zhen-Bo Su, Zhen-Yi Cai, Hengxiao Guo, Mouyuan Sun, Jun-Xian Wang

    Abstract: Interband lags in the optical continua of active galactic nuclei (AGN) have been observed over years of monitoring, yet their physical origins remain unclear. While potentially variable interband lags have been found in a few individual AGN, the temporal behavior of interband lags of an AGN sample has not been explored systematically. Here, we analyze the interband lags of 94 bright AGN at… ▽ More

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Accepted by ApJ, comments are welcome!

  7. arXiv:2507.03280  [pdf, ps, other]

    cs.IR

    Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation

    Authors: Dong Zhang, Lin Li, Ming Li, Xiaohui Tao, Meng Sun, Jimmy Xiangji Huang

    Abstract: Existing solutions for bundle recommendation (BR) have achieved remarkable effectiveness for predicting the user's preference for prebuilt bundles. However, bundle-item (B-I) affiliation will vary dynamically in real scenarios. For example, a bundle themed as 'casual outfit' may add 'hat' or remove 'watch' due to factors such as seasonal variations, changes in user preferences or inventory adjustments. Our… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

  8. arXiv:2507.02527  [pdf, ps, other]

    astro-ph.GA

    A Virgo Environmental Survey Tracing Ionised Gas Emission (VESTIGE). XIX. The discovery of a spectacular 230 kpc Halpha tail following NGC 4569 in the Virgo cluster

    Authors: M. Sun, H. Le, B. Epinat, A. Boselli, R. Luo, K. Hosogi, N. Pichette, W. Forman, C. Sarazin, M. Fossati, H. Chen, E. Sarpa, J. Braine, J. C. Cuillandre, S. Gwyn, G. Hensler, S. Martocchia, B. Vollmer

    Abstract: Context. Galaxies fly inside galaxy clusters and ram pressure by the ICM can remove a large amount of the ISM from the galaxy, and deposit the gas in the ICM. The ISM decoupled from the host galaxy leaves a long trail following the moving galaxy. Such long trails track the galaxy motion and can be detected with sensitive data in Halpha. Aims. We study the Halpha tail trailing NGC 4569 in the Vir… ▽ More

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: 6 pages, 3 figures, 1 table, submitted to A&A

  9. arXiv:2507.01564  [pdf, ps, other]

    eess.IV cs.CV

    Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling

    Authors: Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

    Abstract: We present our solution for the Multi-Source COVID-19 Detection Challenge, which classifies chest CT scans from four distinct medical centers. To address multi-source variability, we employ the Spatial-Slice Feature Learning (SSFL) framework with Kernel-Density-based Slice Sampling (KDS). Our preprocessing pipeline combines lung region extraction, quality control, and adaptive slice sampling to se… ▽ More

    Submitted 2 July, 2025; originally announced July 2025.

  10. arXiv:2507.01485  [pdf, ps, other]

    cs.RO cs.AI cs.MA q-bio.QM

    BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

    Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), a… ▽ More

    Submitted 2 July, 2025; originally announced July 2025.

  11. arXiv:2506.23138  [pdf, ps, other]

    cs.CV

    VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis

    Authors: Shiyu Wu, Mingzhen Sun, Weining Wang, Yequan Wang, Jing Liu

    Abstract: Since there exists a notable gap between user-provided and model-preferred prompts, generating high-quality and satisfactory images using diffusion models often requires prompt engineering to optimize user inputs. Current studies on text-to-image prompt engineering can effectively enhance the style and aesthetics of generated images. However, they often neglect the semantic alignment between gener… ▽ More

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 12 pages, 5 figures

  12. arXiv:2506.21873  [pdf, ps, other]

    cs.CV cs.AI

    Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning

    Authors: Tzu-Chun Chien, Chieh-Kai Lin, Shiang-Feng Tsai, Ruei-Chi Lai, Hung-Jen Chen, Min Sun

    Abstract: Recent Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual grounding, establishing themselves as a general interface for various vision-language applications. This progress has driven the development of token pruning methods to mitigate the high computational costs associated with processing numerous visual tokens. However, we observe that pruning significantly… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

  13. arXiv:2506.21011  [pdf, ps, other]

    cs.CV

    Bridging Video Quality Scoring and Justification via Large Multimodal Models

    Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Jiachao Gong, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

    Abstract: Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the linguistic output, adapting video large multimodal models (LMMs) to VQA via instruction tuning has the potential to address this issue. The core… ▽ More

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 15 pages, 4 figures, 8 tables

  14. arXiv:2506.19140  [pdf, ps, other]

    cs.LG

    Command-V: Pasting LLM Behaviors via Activation Profiles

    Authors: Barry Wang, Avi Schwarzschild, Alexander Robey, Ali Payani, Charles Fleming, Mingjie Sun, Daphne Ippolito

    Abstract: Retrofitting large language models (LLMs) with new behaviors typically requires full finetuning or distillation, costly steps that must be repeated for every architecture. In this work, we introduce Command-V, a backpropagation-free behavior transfer method that copies an existing residual activation adapter from a donor model and pastes its effect into a recipient model. Command-V profiles layer a… ▽ More

    Submitted 23 June, 2025; originally announced June 2025.

  15. arXiv:2506.18254  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    RLPR: Extrapolating RLVR to General Domains without Verifiers

    Authors: Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address the challenge, our key o… ▽ More

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Website: https://github.com/openbmb/RLPR

  16. arXiv:2506.18237  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

    Authors: Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

    Abstract: Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mech… ▽ More

    Submitted 22 June, 2025; originally announced June 2025.

  17. arXiv:2506.17973  [pdf, ps, other]

    nucl-th nucl-ex

    Investigation of the neutron-proton effective mass splitting via heavy ion collisions: Constraints and Implications

    Authors: Junping Yang, Meiqi Sun, Ying Cui, Yangyang Liu, Zhuxia Li, Kai Zhao, Yingxun Zhang

    Abstract: The neutron-proton effective mass splitting ($\Delta m^*_{np}$) is investigated through analyses of heavy-ion collisions using the improved quantum molecular dynamics (ImQMD) model with both standard and extended Skyrme interactions. By uncovering the strong correlation between the slope of the neutron-to-proton yield ratio with respect to the kinetic energy (i.e., $S_{n/p}$) and $\Delta m^*_{np}$, we reveal… ▽ More

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 7 pages, 5 figures

  18. arXiv:2506.17728  [pdf, ps, other]

    cs.CL cs.AI

    KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation

    Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo, Haofen Wang, Huajun Chen

    Abstract: In this paper, we introduce KAG-Thinker, which upgrades KAG to a multi-turn interactive thinking and deep reasoning framework powered by a dedicated parameter-light large language model (LLM). Our approach constructs a structured thinking process for solving complex problems, enhancing the logical coherence and contextual consistency of the reasoning process in question-answering (Q&A) tasks on… ▽ More

    Submitted 30 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

  19. arXiv:2506.17081  [pdf, ps, other]

    cond-mat.quant-gas

    Quantum droplets in rapidly rotating two-dimensional Bose-Einstein condensates

    Authors: Zhen Cao, Siying Li, Zhendong Li, Xinyi Liu, Zhigang Wu, Mingyuan Sun

    Abstract: Recent experiments demonstrate that rapidly rotating Bose-Einstein condensates (BECs) near the lowest Landau level can self-organize into interaction-driven persistent droplet arrays. Inspired by this discovery, we investigate the formation and dynamics of single droplet and droplet arrays in rapidly rotating BECs. Guided by a rigorous theorem on localized many-body states for 2D interacting syste… ▽ More

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures

  20. arXiv:2506.16807  [pdf]

    physics.chem-ph

    Electrochemistry-Enhanced Dynamic Paths Sampling Unveiling Nuclear Quantum Effects in Electrocatalysis

    Authors: Li Fu, Yifan Li, Menglin Sun, Xiaolong Yang, Bin Jin, Shenzhen Xu

    Abstract: Proton-coupled electron transfers (PCET) are elementary steps in electrocatalysis. However, accurate calculations of PCET rates remain challenging, especially considering nuclear quantum effects (NQEs) under a constant potential condition. Statistical sampling of reaction paths is an ideal approach for rate calculations but is always limited by the rare-event issue. Here we develop an electr… ▽ More

    Submitted 20 June, 2025; originally announced June 2025.

  21. arXiv:2506.14973  [pdf, ps, other]

    eess.AS cs.AI

    Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

    Authors: Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze

    Abstract: Recent studies have demonstrated that prompting large language models (LLM) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone a… ▽ More

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  22. arXiv:2506.13907  [pdf, ps, other]

    astro-ph.GA astro-ph.CO astro-ph.HE

    Extreme AGN feedback in the fossil galaxy group SDSSTG 4436

    Authors: D. Eckert, F. Gastaldello, L. Lovisari, S. McGee, T. Pasini, M. Brienza, K. Kolokythas, E. O'Sullivan, A. Simionescu, M. Sun, M. Ayromlou, M. A. Bourne, Y. Chen, W. Cui, S. Ettori, A. Finoguenov, G. Gozaliasl, R. Kale, F. Mernier, B. D. Oppenheimer, G. Schellenberger, R. Seppi, E. Tempel

    Abstract: Supermassive black hole feedback is the currently favoured mechanism to regulate the star formation rate of galaxies and prevent the formation of ultra-massive galaxies ($M_\star>10^{12}M_\odot$). However, the mechanism through which the outflowing energy is transferred to the surrounding medium strongly varies from one galaxy evolution model to another, such that a unified model for AGN feedback… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 15 pages, 10 figures, re-submitted to A&A after minor revision

  23. arXiv:2506.13841  [pdf, ps, other]

    cs.AI

    LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

    Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi

    Abstract: Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-training, have demonstrated impressive reasoning capabilities, as exemplified by models such as OpenAI o1 and DeepSeek-R1. However, these capabilities are predominantly benchmarked on domains like mathematical problem solving and code generation -- leaving open the question of whether such reasonin… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

  24. arXiv:2506.12411  [pdf, ps, other]

    cs.CR cs.CV

    InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning

    Authors: Mengyuan Sun, Yu Li, Yuchen Liu, Bo Du, Yunjie Ge

    Abstract: Multimodal contrastive learning models like CLIP have demonstrated remarkable vision-language alignment capabilities, yet their vulnerability to backdoor attacks poses critical security risks. Attackers can implant latent triggers that persist through downstream tasks, enabling malicious control of model behavior upon trigger presentation. Despite great success in recent defense mechanisms, they r… ▽ More

    Submitted 14 June, 2025; originally announced June 2025.

  25. arXiv:2506.09542  [pdf, ps, other]

    cs.CL

    KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs

    Authors: Dingjun Wu, Yukun Yan, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing methods typically rely on a single source, either unstructured text or structured knowledge. Moreover, they lack cognitively inspired mechanisms for activating relevant knowledge. To address these issues, we propose KG-Infused RAG, a framework that integrates KGs into RAG… ▽ More

    Submitted 11 June, 2025; originally announced June 2025.

  26. arXiv:2506.07996  [pdf, ps, other]

    cs.CV cs.RO

    UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References

    Authors: Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo

    Abstract: 6D object pose estimation has shown strong generalizability to novel objects. However, existing methods often require either a complete, well-reconstructed 3D model or numerous reference images that fully cover the object. Estimating 6D poses from partial references, which capture only fragments of an object's appearance and geometry, remains challenging. To address this, we propose UA-Pose, an un… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: CVPR 2025

  27. arXiv:2506.07955  [pdf, ps, other]

    cs.HC

    Implementation Considerations for Automated AI Grading of Student Work

    Authors: Zewei Tian, Alex Liu, Lief Esbenshade, Shawon Sarkar, Zachary Zhang, Kevin He, Min Sun

    Abstract: This study explores the classroom implementation of an AI-powered grading platform in K-12 settings through a co-design pilot with 19 teachers. We combine platform usage logs, surveys, and qualitative interviews to examine how teachers use AI-generated rubrics and grading feedback. Findings reveal that while teachers valued the AI's rapid narrative feedback for formative purposes, they distrusted… ▽ More

    Submitted 17 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  28. arXiv:2506.07900  [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  29. arXiv:2506.07657  [pdf, ps, other]

    cs.GR cs.CV

    PIG: Physically-based Multi-Material Interaction with 3D Gaussians

    Authors: Zeyu Xiao, Zhenyi Wu, Mingyang Sun, Qipeng Yan, Yufan Guo, Zhuoer Liang, Lihua Zhang

    Abstract: 3D Gaussian Splatting has achieved remarkable success in reconstructing both static and dynamic 3D scenes. However, in a scene represented by 3D Gaussian primitives, interactions between objects suffer from inaccurate 3D segmentation, imprecise deformation among different materials, and severe rendering artifacts. To address these challenges, we introduce PIG: Physically-Based Multi-Material Inter… ▽ More

    Submitted 9 June, 2025; originally announced June 2025.

  30. ALMA-JELLY I: High Resolution CO(2-1) Observations of Ongoing Ram Pressure Stripping in NGC 4858 Reveal Asymmetrical Gas Tail Formation and Fallback

    Authors: Harrison J. Souchereau, Jeffrey D. P. Kenney, Pavel Jachym, Ming Sun, William J. Cramer, Masafumi Yagi, Alessandro Boselli, Elias Brinks, Francoise Combes, Luca Cortese, Boris Deshev, Matteo Fossati, Romana Grossova, Rongxin Luo, Jan Palous, Tom C. Scott

    Abstract: We present new CO(2-1) observations (resolution $\sim 1'' = 460$ pc) of the Coma cluster jellyfish galaxy NGC 4858 obtained from the ALMA-JELLY large program. Analyzing this data alongside complementary Subaru H$\alpha$ and HST (F600LP / F350LP) observations, we find numerous structural and kinematic features indicative of the effects from strong, inclined ram pressure, including an asymmetric inner gas t… ▽ More

    Submitted 8 June, 2025; originally announced June 2025.

  31. arXiv:2506.04909  [pdf, ps, other]

    cs.AI cs.CL cs.CR cs.LG

    When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

    Authors: Kai Wang, Yihao Zhang, Meng Sun

    Abstract: The honesty of large language models (LLMs) is a critical alignment challenge, especially as advanced systems with chain-of-thought (CoT) reasoning may strategically deceive humans. Unlike traditional honesty issues in LLMs, which could possibly be explained as some kind of hallucination, those models' explicit thought paths enable us to study strategic deception--goal-driven, intentional misinfor… ▽ More

    Submitted 5 June, 2025; originally announced June 2025.

  32. arXiv:2506.04757  [pdf, ps, other]

    astro-ph.CO astro-ph.GA astro-ph.HE

    Modelling the selection of galaxy groups with end to end simulations

    Authors: R. Seppi, D. Eckert, A. Finoguenov, S. Shreeram, E. Tempel, G. Gozaliasl, M. Lorenz, J. Wilms, G. A. Mamon, F. Gastaldello, L. Lovisari, E. O'Sullivan, K. Kolokythas, M. A. Bourne, M. Sun, A. Pillepich

    Abstract: Feedback from supernovae and AGN shapes galaxy formation and evolution, yet its impact remains unclear. Galaxy groups offer a crucial probe, as their binding energy is comparable to that available from their central AGN. The XMM-Newton Group AGN Project (X-GAP) is a sample of 49 groups selected in X-ray (ROSAT) and optical (SDSS) bands, providing a benchmark for hydrodynamical simulations. In sigh… ▽ More

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in A&A

    Journal ref: A&A 699, A206 (2025)

  33. arXiv:2506.04329  [pdf, ps, other]

    astro-ph.CO astro-ph.GA astro-ph.HE

    Estimating Bolometric Luminosities of Type 1 Quasars with Self-Organizing Maps

    Authors: Jie Chen, Linhua Jiang, Shengxiu Sun, Zijian Zhang, Mouyuan Sun

    Abstract: We present a new method to calculate bolometric luminosities for unobscured, type 1 quasars with multi-band photometric data. Bolometric luminosity is a fundamental property to understand quasars and it is commonly estimated from monochromatic luminosities using bolometric corrections that often neglect quasar SED diversity. We take advantage of the fact that most quasars now have multi-band obser… ▽ More

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 18 pages, 13 figures. Resubmitted to ApJ based on reviewer report. Code QSOLbol is available at https://github.com/ChenJiemi/QSOLbol

  34. arXiv:2506.03770  [pdf, ps, other]

    eess.SP

    Multiuser Beamforming for Pinching-Antenna Systems: An Element-wise Optimization Framework

    Authors: Mingjun Sun, Chongjun Ouyang, Shaochuan Wu, Yuanwei Liu

    Abstract: The pinching-antenna system (PASS) reconstructs wireless channels through pinching beamforming, i.e., optimizing the activated locations of pinching antennas (PAs) along the waveguide. The aim of this article is to investigate the joint design of baseband beamforming and pinching beamforming. A low-complexity element-wise sequential optimization framework is proposed to address the sum-rate maximi… ▽ More

    Submitted 7 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  35. arXiv:2506.02522  [pdf, ps, other]

    cs.AI

    Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

    Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

    Abstract: Recent advancements in Large Language Models (LLMs) and Reinforcement Learning (RL) have shown significant promise in decision-making tasks. Nevertheless, for large-scale industrial decision problems, both approaches face distinct challenges: LLMs lack real-time long-sequence decision-making capabilities, while RL struggles with sample efficiency in vast action spaces. To bridge this gap, we propo… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  36. arXiv:2506.02503  [pdf, ps, other]

    cs.CL

    KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG

    Authors: Yongjian Li, HaoCheng Chu, Yukun Yan, Zhenghao Liu, Shi Yu, Zheni Zeng, Ruobing Wang, Sen Song, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  37. arXiv:2506.01947  [pdf, ps, other]

    eess.IV cs.CV

    RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Radu Berdan, Beril Besbinar, Daisuke Iso, Pengzhou Ji, Xiong Dun, Zeying Fan, Chen Wu, Zhansheng Wang, Pengbo Zhang, Jiazi Huang, Qinglin Liu, Wei Yu, Shengping Zhang, Xiangyang Ji, Kyungsik Kim, Minkyung Kim, Hwalmin Lee, Hekun Ma, Huan Zheng, Yanyan Wei, Zhao Zhang, Jing Fang, Meilin Gao, et al. (8 additional authors not shown)

    Abstract: Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  38. arXiv:2506.01770  [pdf, ps, other]

    cs.CR cs.AI cs.LG cs.SE

    ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs

    Authors: Zeming Wei, Chengcan Wu, Meng Sun

    Abstract: Large Language Models (LLMs) have achieved significant success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks in generating harmful content and vulnerability to jailbreaking attacks. To analyze and monitor machine learning models, model-based analysis has demonstrated notable potential in stateful deep neural networks, yet suffers from s… ▽ More

    Submitted 2 June, 2025; originally announced June 2025.

  39. arXiv:2506.01391  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

    Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun

    Abstract: The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, whi… ▽ More

    Submitted 16 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Updated results in Table 2 and Table 3; The project is available at https://github.com/OpenBMB/AgentCPM-GUI

    ACM Class: I.2.8; I.2.7; I.2.10; H.5.2

  40. arXiv:2505.24550  [pdf, ps, other]

    cs.CL

    A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

    Authors: Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He

    Abstract: Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory leads to reduced efficiency. Most of the existing methods are stuck in the assumption of overthinking and attempt to reason efficiently by compressing the Chain-of-Thought, but this often leads to performance degradation. To address this problem, we introduce A*-Though… ▽ More

    Submitted 30 May, 2025; originally announced May 2025.

  41. arXiv:2505.24388  [pdf, other]

    cs.CL

    ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

    Authors: Hao Chen, Yukun Yan, Sen Mei, Wanxiang Che, Zhenghao Liu, Qi Shi, Xinze Li, Yuchun Fan, Pengcheng Huang, Qiushi Xiong, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved documents, failing to extract and integrate the key clues needed to support faithful and interpretable reasoning, especially in cases where relevant evidence is implicit, scattered, or obscured by noise. To add…

    Submitted 30 May, 2025; originally announced May 2025.

  42. arXiv:2505.23187  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration

    Authors: Yilong Li, Chen Qian, Yu Xia, Ruijie Shi, Yufan Dang, Zihao Xie, Ziming You, Weize Chen, Cheng Yang, Weichuan Liu, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Model-based multi-agent systems (MAS) have shown remarkable progress in solving complex tasks through collaborative reasoning and inter-agent critique. However, existing approaches typically treat each task in isolation, resulting in redundant computations and limited generalization across structurally similar tasks. To address this, we introduce multi-agent cross-task experiential…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  43. arXiv:2505.23151  [pdf, ps, other]

    astro-ph.SR astro-ph.HE

    A Be star-black hole binary with a wide orbit from LAMOST time-domain survey

    Authors: Qian-Yu An, Yang Huang, Wei-Min Gu, Yong Shao, Zhi-Xiang Zhang, Tuan Yi, B. D. Lailey, T. A. A. Sigut, Kyle Akira Rocha, Meng Sun, Seth Gossage, Shi-Jie Gao, Shan-Shan Weng, Song Wang, Bowen Zhang, Xinlin Zhao, Senyu Qi, Shilong Liao, Jianghui Ji, Junfeng Wang, Jianfeng Wu, Mouyuan Sun, Xiang-Dong Li, Jifeng Liu

    Abstract: Binary systems consisting of an early type star and a black hole (BH) are crucial for understanding various astrophysical phenomena, particularly the origins of detected gravitational wave sources. Be binary systems are expected to represent a key evolutionary stage in hosting BHs. However, while hundreds of Be X-ray binaries are known, the only confirmed BH candidate in a Be binary remains highly…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 76 pages, 29 figures, to be submitted

  44. arXiv:2505.23079  [pdf, ps, other]

    cs.HC

    iTrace: Interactive Tracing of Cross-View Data Relationships

    Authors: Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, Jian Zhao, David Koop

    Abstract: Exploring data relations across multiple views has been a common task in many domains such as bioinformatics, cybersecurity, and healthcare. To support this, various techniques (e.g., visual links and brushing and linking) are used to show related visual elements across views via lines and highlights. However, understanding the relations using these techniques, when many related elements are scatt…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 13 pages, 14 figures, accepted to Graphics Interface 2025

    MSC Class: 68U05 ACM Class: H.5.2; I.3.6; I.3.8

  45. arXiv:2505.22949  [pdf, ps, other]

    cs.LG

    Directed Graph Grammars for Sequence-based Learning

    Authors: Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen

    Abstract: Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approac…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  46. arXiv:2505.22948  [pdf, ps, other]

    cs.AI

    Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

    Authors: Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen

    Abstract: Recent data-efficient molecular generation approaches exploit graph grammars to introduce interpretability into the generative models. However, grammar learning therein relies on expert annotation or unreliable heuristics for algorithmic inference. We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. By…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  47. arXiv:2505.22787  [pdf, ps, other]

    cs.CL

    Can Large Language Models Match the Conclusions of Systematic Reviews?

    Authors: Christopher Polzak, Alejandro Lozano, Min Woo Sun, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy

    Abstract: Systematic reviews (SR), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone for evidence-based clinical decision-making, research, and policy. Given the exponential growth of scientific articles, there is growing interest in using large language models (LLMs) to automate SR generation. However, the ability of LLMs…

    Submitted 28 May, 2025; originally announced May 2025.

  48. arXiv:2505.22445  [pdf, other]

    cs.CV cs.AI

    NFR: Neural Feature-Guided Non-Rigid Shape Registration

    Authors: Puhua Jiang, Zhangquan Chen, Mingze Sun, Ruqi Huang

    Abstract: In this paper, we propose a novel learning-based framework for 3D shape registration, which overcomes the challenges of significant non-rigid deformation and partiality undergoing among input shapes, and, remarkably, requires no correspondence annotation during training. Our key insight is to incorporate neural features learned by deep learning-based shape matching networks into an iterative, geom…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 20 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2311.04494

    ACM Class: I.4.m; I.2.6

  49. arXiv:2505.22131  [pdf, other]

    cs.CL

    EULER: Enhancing the Reasoning Ability of Large Language Models through Error-Induced Learning

    Authors: Zhuoyang Wu, Xinze Li, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Minghe Yu, Cheng Yang, Yu Gu, Ge Yu, Maosong Sun

    Abstract: Large Language Models (LLMs) have demonstrated strong reasoning capabilities and achieved promising results in mathematical problem-solving tasks. Learning from errors offers the potential to further enhance the performance of LLMs during Supervised Fine-Tuning (SFT). However, the errors in synthesized solutions are typically gathered from sampling trials, making it challenging to generate solutio…

    Submitted 28 May, 2025; originally announced May 2025.

  50. arXiv:2505.22095  [pdf, ps, other]

    cs.CL

    Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

    Authors: Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu, Minghe Yu, Ge Yu, Maosong Sun

    Abstract: Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook th…

    Submitted 28 May, 2025; originally announced May 2025.