Search | arXiv e-print repository

General purpose models for the chemical sciences

Authors: Nawaf Alampara, Anagha Aneesh, Martiño Ríos-García, Adrian Mirza, Mara Schilling-Wilhelmi, Ali Asghar Aghajani, Meiling Sun, Gordan Prastalo, Kevin Maik Jablonka

Abstract: Data-driven techniques have great potential to transform and accelerate the chemical sciences. However, the chemical sciences also pose the unique challenge of very diverse, small, fuzzy datasets that are difficult to leverage fully in conventional machine learning approaches. A new class of models, general-purpose models (GPMs) such as large language models, has shown the ability to solve tasks they have not been directly trained on, and to flexibly operate with small amounts of data in different formats. In this review, we discuss fundamental building principles of GPMs and review recent applications of those models in the chemical sciences across the entire scientific process. While many of these applications are still in the prototype phase, we expect that the increasing interest in GPMs will make many of them mature in the coming years.

Submitted 10 July, 2025; originally announced July 2025.

arXiv:2507.07016

On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence

Authors: Jian Huang, Yongli Zhu, Linna Xu, Zhe Zheng, Wenpeng Cui, Mingyang Sun

Abstract: In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation of grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on the task of photovoltaic power forecasting is presented, where two representative machine learning models are investigated: a gradient boosting tree model and a recurrent neural network model. To adapt to the resource-limited situation in the smart meter, "mixed"- and "reduced"-precision training schemes are also devised. Experimental results demonstrate the feasibility of economically achieving grid-edge intelligence via the existing advanced metering infrastructures.

Submitted 9 July, 2025; originally announced July 2025.

Comments: This paper is currently under review by an IEEE publication; it may be subject to minor changes in response to review comments

arXiv:2507.05687

AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

Abstract: Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, creating substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton first performs supervised fine-tuning (SFT) to acquire essential Triton programming expertise using a high-quality data gathering pipeline, and then conducts RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a rule-based reward and an execution-based reward to further improve Triton programming ability. Experiments across five evaluation channels of TritonBench and KernelBench illustrate that our 8B model AutoTriton achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton, including the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels, and since high-performance kernels are core components of AI systems, this breakthrough establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.

Submitted 8 July, 2025; originally announced July 2025.

arXiv:2507.05685

Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach

Authors: Xiaobing Chen, Boyang Zhang, Xiangwei Zhou, Mingxuan Sun, Shuai Zhang, Songyang Zhang, Geoffrey Ye Li

Abstract: The integration of Federated Learning (FL) and Mixture-of-Experts (MoE) presents a compelling pathway for training more powerful, large-scale artificial intelligence models (LAMs) on decentralized data while preserving privacy. However, efficient federated training of these complex MoE-structured LAMs is hindered by significant system-level challenges, particularly in managing the interplay between heterogeneous client resources and the sophisticated coordination required for numerous specialized experts. This article highlights a critical yet underexplored gap: the absence of robust quantitative strategies for dynamic client-expert alignment that holistically consider varying client capacities and the imperative for system-wide load balancing. Specifically, we propose a conceptual system design for intelligent client-expert alignment that incorporates dynamic fitness scoring, global expert load monitoring, and client capacity profiling. By tackling these systemic issues, we can unlock more scalable, efficient, and robust training mechanisms with fewer communication rounds for convergence, paving the way for the widespread deployment of large-scale federated MoE-structured LAMs in edge computing with ultra-high communication efficiency.

Submitted 8 July, 2025; originally announced July 2025.

Comments: 7 pages

arXiv:2507.05609

MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses

Authors: Yang Liu, Li Wan, Yiteng Huang, Yong Xu, yangyang shi, Saurabh Adya, ming sun, Florian Metze

Abstract: Smart glasses are increasingly positioned as the next-generation interface for ubiquitous access to large language models (LLMs). Nevertheless, achieving reliable interaction in real-world noisy environments remains a major challenge, particularly due to interference from side speech. In this work, we introduce a novel side-talk rejection multi-microphone Whisper (MMW) framework for smart glasses, incorporating three key innovations. First, we propose a Mix Block based on a Tri-Mamba architecture to effectively fuse multi-channel audio at the raw waveform level, while maintaining compatibility with streaming processing. Second, we design a Frame Diarization Mamba Layer to enhance frame-level side-talk suppression, facilitating more efficient fine-tuning of Whisper models. Third, we employ a Multi-Scale Group Relative Policy Optimization (GRPO) strategy to jointly optimize frame-level and utterance-level side speech suppression. Experimental evaluations demonstrate that the proposed MMW system can reduce the word error rate (WER) by 4.95% in noisy conditions.

Submitted 7 July, 2025; originally announced July 2025.

arXiv:2507.04359

Interband Lag Variability in Active Galactic Nuclei across ZTF Data from Multiple Years

Authors: Zhen-Bo Su, Zhen-Yi Cai, Hengxiao Guo, Mouyuan Sun, Jun-Xian Wang

Abstract: Interband lags in the optical continua of active galactic nuclei (AGN) have been observed over years of monitoring, yet their physical origins remain unclear. While variable interband lags have potentially been found in a few individual AGN, the temporal behavior of interband lags of an AGN sample has not been explored systematically. Here, we analyze the interband lags of 94 bright AGN at $z<0.8$, using both seasonal one-year and full six-year $gri$-band light curves from Zwicky Transient Facility Data Release 22. We find that more than half of the 94 AGN show significant seasonal variations in the interband lags. In addition, the short-term lags, derived by averaging lags inferred from multiple seasonal light curves, are consistently smaller than the long-term lags, which are inferred from the full six-year light curves. This supports recent theoretical simulations where the lag measurement is sensitive to the baseline of the light curve and the lag variation could be simply attributed to the inherent randomness of AGN variability. Our findings suggest that the interband lags of AGN are more complex and stochastic than commonly thought, and highlight the importance of high-precision time-domain surveys in uncovering the properties of AGN variability as well as the associated accretion physics.

Submitted 6 July, 2025; originally announced July 2025.

Comments: Accepted by ApJ, comments are welcome!

arXiv:2507.03280

Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation

Authors: Dong Zhang, Lin Li, Ming Li, Xiaohui Tao, Meng Sun, Jimmy Xiangji Huang

Abstract: Existing solutions for bundle recommendation (BR) have achieved remarkable effectiveness in predicting the user's preference for prebuilt bundles. However, bundle-item (B-I) affiliation will vary dynamically in real scenarios. For example, a bundle themed as 'casual outfit' may add 'hat' or remove 'watch' due to factors such as seasonal variations, changes in user preferences, or inventory adjustments. Our empirical study demonstrates that the performance of mainstream BR models will fluctuate or even decline under item-level variability. This paper makes the first attempt to address the above problem and proposes a novel Residual Diffusion for Bundle Recommendation (RDiffBR) as a model-agnostic generative framework which can assist a BR model in adapting to this scenario. During the initial training of the BR model, RDiffBR employs a residual diffusion model to process the item-level bundle embeddings, which are generated by the BR model to represent the bundle theme, via a forward-reverse process. In the inference stage, RDiffBR reverses item-level bundle embeddings obtained by the well-trained bundle model under B-I variability scenarios to generate effective item-level bundle embeddings. In particular, the residual connection in our residual approximator significantly enhances the item-level bundle embedding generation ability of BR models. Experiments on six BR models and four public datasets from different domains show that RDiffBR improves the Recall and NDCG of backbone BR models by up to 23%, while increasing training time by only about 4%. Code and datasets are available at https://anonymous.4open.science/r/RDiffBR.

Submitted 3 July, 2025; originally announced July 2025.

arXiv:2507.02527

A Virgo Environmental Survey Tracing Ionised Gas Emission (VESTIGE). XIX. The discovery of a spectacular 230 kpc Halpha tail following NGC 4569 in the Virgo cluster

Authors: M. Sun, H. Le, B. Epinat, A. Boselli, R. Luo, K. Hosogi, N. Pichette, W. Forman, C. Sarazin, M. Fossati, H. Chen, E. Sarpa, J. Braine, J. C. Cuillandre, S. Gwyn, G. Hensler, S. Martocchia, B. Vollmer

Abstract: Context. Galaxies fly inside galaxy clusters and ram pressure by the ICM can remove a large amount of the ISM from the galaxy, and deposit the gas in the ICM. The ISM decoupled from the host galaxy leaves a long trail following the moving galaxy. Such long trails track the galaxy motion and can be detected with sensitive data in Halpha. Aims. We study the Halpha tail trailing NGC 4569 in the Virgo cluster. Methods. The initial discovery was made with the deep Halpha imaging data with CFHT, from the VESTIGE project. The follow-up spectroscopic observations were made with APO/DIS, MMT/Binospec and CFHT/SITELLE. Results. Besides the known 80 kpc Halpha tail downstream of NGC 4569, the deep Halpha imaging data allow the Halpha tail to be detected out to at least 230 kpc from the galaxy. More importantly, the Halpha clumps implied from the imaging data are confirmed with the spectroscopic data. The Halpha clumps show a smooth radial velocity gradient across about 1300 km/s, eventually reaching the velocity of the cluster. We build a simple model to explain the deceleration of stripped clumps and constrain the age to about 0.9 Gyr. Conclusions. This discovery, for the first time, demonstrates the full deceleration process of the stripped ISM. This discovery also showcases the potential of wide-field Halpha surveys of galaxy clusters to discover intracluster optical emission-line clouds originating from cluster galaxies. These clouds provide kinematic tracers of the infall history of cluster galaxies and the turbulence in the ICM. They are also excellent multi-phase objects to study the relevant important physical processes.

Submitted 3 July, 2025; originally announced July 2025.

Comments: 6 pages, 3 figures, 1 table, submitted to A&A

arXiv:2507.01564

Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling

Authors: Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

Abstract: We present our solution for the Multi-Source COVID-19 Detection Challenge, which classifies chest CT scans from four distinct medical centers. To address multi-source variability, we employ the Spatial-Slice Feature Learning (SSFL) framework with Kernel-Density-based Slice Sampling (KDS). Our preprocessing pipeline combines lung region extraction, quality control, and adaptive slice sampling to select eight representative slices per scan. We compare EfficientNet and Swin Transformer architectures on the validation set. The EfficientNet model achieves an F1-score of 94.68%, compared to the Swin Transformer's 93.34%. The results demonstrate the effectiveness of our KDS-based pipeline on multi-source data and highlight the importance of dataset balance in multi-institutional medical imaging evaluation.

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2507.01485

BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun

Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports context-aware optimization, outperforming conventional strategies in differentiating retinal pigment epithelial cells. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware. These results highlight the feasibility of generalizable, AI-driven laboratory automation and the transformative role of language-based reasoning in biological research.

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2506.23138

VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis

Authors: Shiyu Wu, Mingzhen Sun, Weining Wang, Yequan Wang, Jing Liu

Abstract: Since there exists a notable gap between user-provided and model-preferred prompts, generating high-quality and satisfactory images using diffusion models often requires prompt engineering to optimize user inputs. Current studies on text-to-image prompt engineering can effectively enhance the style and aesthetics of generated images. However, they often neglect the semantic alignment between generated images and user descriptions, resulting in visually appealing but content-wise unsatisfying outputs. In this work, we propose VisualPrompter, a novel training-free prompt engineering framework that refines user inputs to model-preferred sentences. In particular, VisualPrompter utilizes an automatic self-reflection module to identify the missing concepts in generated images and a target-specific prompt optimization mechanism to revise the prompts in a fine-grained manner. Extensive experiments demonstrate the effectiveness of our VisualPrompter, which achieves new state-of-the-art performance on multiple benchmarks for text-image alignment evaluation. Additionally, our framework features a plug-and-play design, making it highly adaptable to various generative models.

Submitted 29 June, 2025; originally announced June 2025.

Comments: 12 pages, 5 figures

arXiv:2506.21873

Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning

Authors: Tzu-Chun Chien, Chieh-Kai Lin, Shiang-Feng Tsai, Ruei-Chi Lai, Hung-Jen Chen, Min Sun

Abstract: Recent Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual grounding, establishing themselves as a general interface for various vision-language applications. This progress has driven the development of token pruning methods to mitigate the high computational costs associated with processing numerous visual tokens. However, we observe that pruning significantly weakens the model's grounding ability, leading to incorrect predictions and drastic performance degradation. In Referring Expression Comprehension (REC), for instance, pruning causes the accuracy of LLaVA on the RefCOCO validation set to drop from 56.14% to 15.34%. Our analysis identifies misaligned position IDs after pruning as the primary cause of this degradation, as both the order and value of these IDs are crucial for maintaining performance in grounding tasks. To address this issue, we propose Grounding-Aware Token Pruning (GAP), a simple yet effective adjustment to position IDs that recovers REC accuracy to 51.42%, which is 90% of the original performance without pruning, all while requiring no additional training, memory, or computational overhead. Applied to models such as Shikra, MiniGPTv2, and the LLaVA series, our method consistently improves performance across various token pruning strategies.

Submitted 26 June, 2025; originally announced June 2025.

arXiv:2506.21011

Bridging Video Quality Scoring and Justification via Large Multimodal Models

Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Jiachao Gong, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

Abstract: Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the linguistic output, adapting video large multimodal models (LMMs) to VQA via instruction tuning has the potential to address this issue. The core of the approach lies in the video quality-centric instruction data. Previous explorations mainly focus on the image domain, and their data generation processes heavily rely on human quality annotations and proprietary systems, limiting data scalability and effectiveness. To address these challenges, we propose the Score-based Instruction Generation (SIG) pipeline. Specifically, SIG first scores multiple quality dimensions of an unlabeled video and maps scores to text-defined levels. It then explicitly incorporates a hierarchical Chain-of-Thought (CoT) to model the correlation between specific dimensions and overall quality, mimicking the human visual system's reasoning process. The automated pipeline eliminates the reliance on expert-written quality descriptions and proprietary systems, ensuring data scalability and generation efficiency. To this end, the resulting Score2Instruct (S2I) dataset contains over 320K diverse instruction-response pairs, laying the basis for instruction tuning. Moreover, to advance video LMMs' quality scoring and justification abilities simultaneously, we devise a progressive tuning strategy to fully unleash the power of S2I. Built upon SIG, we further curate a benchmark termed S2I-Bench with 400 open-ended questions to better evaluate the quality justification capacity of video LMMs. Experimental results on the S2I-Bench and existing benchmarks indicate that our method consistently improves quality scoring and justification capabilities across multiple video LMMs.

Submitted 26 June, 2025; originally announced June 2025.

Comments: 15 pages, 4 figures, 8 tables

arXiv:2506.19140

Command-V: Pasting LLM Behaviors via Activation Profiles

Authors: Barry Wang, Avi Schwarzschild, Alexander Robey, Ali Payani, Charles Fleming, Mingjie Sun, Daphne Ippolito

Abstract: Retrofitting large language models (LLMs) with new behaviors typically requires full finetuning or distillation, costly steps that must be repeated for every architecture. In this work, we introduce Command-V, a backpropagation-free behavior transfer method that copies an existing residual activation adapter from a donor model and pastes its effect into a recipient model. Command-V profiles layer activations on a small prompt set, derives linear converters between corresponding layers, and applies the donor intervention in the recipient's activation space. This process does not require access to the original training data and needs minimal compute. In three case studies (safety-refusal enhancement, jailbreak facilitation, and automatic chain-of-thought reasoning), Command-V matches or exceeds the performance of direct finetuning while using orders of magnitude less compute. Our code and data are accessible at https://github.com/GithuBarry/Command-V/.

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.18254

RLPR: Extrapolating RLVR to General Domains without Verifiers

Authors: Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua

Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address the challenge, our key observation is that an LLM's intrinsic probability of generating a correct free-form answer directly indicates its own evaluation of the reasoning reward (i.e., how well the reasoning process leads to the correct answer). Building on this insight, we propose RLPR, a simple verifier-free framework that extrapolates RLVR to broader general domains. RLPR uses the LLM's own token probability scores for reference answers as the reward signal and maximizes the expected reward during training. We find that addressing the high variance of this noisy probability reward is crucial to make it work, and propose prob-to-reward and stabilizing methods to ensure a precise and stable reward from LLM intrinsic probabilities. Comprehensive experiments on four general-domain benchmarks and three mathematical benchmarks show that RLPR consistently improves reasoning capabilities in both areas for Gemma, Llama, and Qwen based models. Notably, RLPR outperforms the concurrent VeriFree by 7.6 points on TheoremQA and 7.5 points on Minerva, and even surpasses the strong verifier-model-dependent approach General-Reasoner by 1.6 average points across seven benchmarks.

Submitted 22 June, 2025; originally announced June 2025.

Comments: Project Website: https://github.com/openbmb/RLPR

arXiv:2506.18237

AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

Authors: Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

Abstract: Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mechanisms typically rely on static length budgets or predefined rules, lacking the adaptability for varying question complexities and models' evolving capabilities. To this end, we propose AdapThink, an adaptive post-training framework designed to induce more efficient thinking while maintaining the performance of reasoning language models. Specifically, AdapThink incorporates two key mechanisms: 1) a group-relative reward function that leverages model confidence and response characteristics to dynamically adjust the preference for reflection-related transition words without resorting to a fixed length preference; 2) a diversity-aware sampling mechanism that balances the training group's solution accuracy with reasoning diversity via an entropy-guided score. Experiments on several mathematical reasoning datasets with DeepSeek-distilled models demonstrate AdapThink's advantages in enabling adaptive reasoning patterns and mitigating the inefficiencies.

Submitted 22 June, 2025; originally announced June 2025.

arXiv:2506.17973 [pdf, ps, other]

Investigation of the neutron-proton effective mass splitting via heavy ion collisions: Constraints and Implications

Authors: Junping Yang, Meiqi Sun, Ying Cui, Yangyang Liu, Zhuxia Li, Kai Zhao, Yingxun Zhang

Abstract: The neutron-proton effective mass splitting ($Δm^*_{np}$) is investigated through analyses of heavy-ion collisions using the improved quantum molecular dynamics (ImQMD) model with both standard and extended Skyrme interactions. By uncovering the strong correlation between the slope of the neutron-to-proton yield ratio with respect to the kinetic energy (i.e., $S_{n/p} $) and $Δm^*_{np}$, we reveal that the constraints of the neutron-proton effective mass splitting via heavy ion collisions depend on the kinetic energy region of the emitted nucleons. At low kinetic energies, the data favor $m_n^*>m_p^*$ which is consistent with the nucleon-nucleus scattering analysis, while at high kinetic energies, they favor $m_n^*<m_p^*$. Our findings partly resolve the longstanding discrepancy in the constraints of neutron-proton effective mass splitting with heavy ion collisions and nucleon-nucleus scattering, and significantly advance the understanding of nucleon effective mass splitting through heavy ion collisions.

Submitted 22 June, 2025; originally announced June 2025.

Comments: 7 pages, 5 figures

arXiv:2506.17728 [pdf, ps, other]

KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation

Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo, Haofen Wang, Huajun Chen

Abstract: In this paper, we introduce KAG-Thinker, which upgrades KAG to a multi-turn interactive thinking and deep reasoning framework powered by a dedicated parameter-light large language model (LLM). Our approach constructs a structured thinking process for solving complex problems, enhancing the logical coherence and contextual consistency of the reasoning process in question-answering (Q&A) tasks on domain-specific knowledge bases (KBs) within LLMs. Following the Logical Form guided retrieval and reasoning technology route of KAG, this framework first decomposes complex questions into independently solvable sub-problems (which are also referred to as logical forms) through breadth decomposition. Each such logical form is represented in two equivalent forms, natural language and logical function, and subsequently classified as either a Knowledge Retrieval or Reasoning Analysis task. Dependencies and parameter passing between these tasks are explicitly modeled via logical function interfaces. In the solving process, the Retrieval function performs retrieval tasks, retrieving one-hop structured and unstructured information about the specified knowledge unit, while the Math and Deduce functions perform reasoning analysis tasks. Secondly, it is worth noting that, in the Knowledge Retrieval sub-problem tasks, LLMs and external knowledge sources are regarded as equivalent KBs. We use the knowledge boundary module to determine the optimal source using self-regulatory mechanisms such as confidence calibration and reflective reasoning, and use the depth solving module to enhance the comprehensiveness of knowledge acquisition...

Submitted 30 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

arXiv:2506.17081 [pdf, ps, other]

Quantum droplets in rapidly rotating two-dimensional Bose-Einstein condensates

Authors: Zhen Cao, Siying Li, Zhendong Li, Xinyi Liu, Zhigang Wu, Mingyuan Sun

Abstract: Recent experiments demonstrate that rapidly rotating Bose-Einstein condensates (BECs) near the lowest Landau level can self-organize into interaction-driven persistent droplet arrays. Inspired by this discovery, we investigate the formation and dynamics of single droplet and droplet arrays in rapidly rotating BECs. Guided by a rigorous theorem on localized many-body states for 2D interacting systems in a magnetic field, we construct single droplet and droplet arrays states which are shown to be stationary solutions to the Gross-Pitaevskii equation in the rotating frame. The single droplet is shown to be dynamically stable, which underpins its role as the basic unit in a droplet array. The stability of the droplet arrays is demonstrated by their dynamic formation from a phase engineered initial condensate. Our study sheds light onto the nature of the droplet state in a rapidly rotating BEC and offers a new approach for generating and manipulating quantum droplet arrays through designing the initial condensate phase.

Submitted 20 June, 2025; originally announced June 2025.

Comments: 6 pages, 6 figures

arXiv:2506.16807 [pdf]

Electrochemistry-Enhanced Dynamic Paths Sampling Unveiling Nuclear Quantum Effects in Electrocatalysis

Authors: Li Fu, Yifan Li, Menglin Sun, Xiaolong Yang, Bin Jin, Shenzhen Xu

Abstract: Proton-coupled electron transfers (PCET) are elementary steps in electrocatalysis. However, accurate calculations of PCET rates remain challenging, especially considering nuclear quantum effects (NQEs) under a constant potential condition. Statistical sampling of reaction paths is an ideal approach for rate calculations but is always limited by the rare-event issue. Here we develop an electrochemistry-driven quantum dynamics approach enabling realistic enhanced paths sampling under constant potentials without a priori defined reaction coordinates. We apply the method in modeling the Volmer step of the hydrogen evolution reaction, and demonstrate that the NQEs exhibit more than one order of magnitude impact on the computed rate constant, indicating an essential role of NQEs in electrochemistry.

Submitted 20 June, 2025; originally announced June 2025.

arXiv:2506.14973 [pdf, ps, other]

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

Authors: Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze

Abstract: Recent studies have demonstrated that prompting large language models (LLMs) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone array of smart glasses to achieve directional speech recognition, source localization, and bystander cross-talk suppression. To enhance the model's ability to understand directivity, we propose two key techniques: serialized directional output training (S-DOT) and contrastive direction data augmentation (CDDA). Experimental results show that our proposed directional-SpeechLlama effectively captures the relationship between textual cues and spatial audio, yielding strong performance in both speech recognition and source localization tasks.

Submitted 17 June, 2025; originally announced June 2025.

Comments: Accepted to Interspeech 2025

arXiv:2506.13907 [pdf, ps, other]

Extreme AGN feedback in the fossil galaxy group SDSSTG 4436

Authors: D. Eckert, F. Gastaldello, L. Lovisari, S. McGee, T. Pasini, M. Brienza, K. Kolokythas, E. O'Sullivan, A. Simionescu, M. Sun, M. Ayromlou, M. A. Bourne, Y. Chen, W. Cui, S. Ettori, A. Finoguenov, G. Gozaliasl, R. Kale, F. Mernier, B. D. Oppenheimer, G. Schellenberger, R. Seppi, E. Tempel

Abstract: Supermassive black hole feedback is the currently favoured mechanism to regulate the star formation rate of galaxies and prevent the formation of ultra-massive galaxies ($M_\star>10^{12}M_\odot$). However, the mechanism through which the outflowing energy is transferred to the surrounding medium strongly varies from one galaxy evolution model to another, such that a unified model for AGN feedback does not currently exist. The hot atmospheres of galaxy groups are highly sensitive laboratories of the feedback process, as the injected black hole energy is comparable to the binding energy of halo gas particles. Here we report multi-wavelength observations of the fossil galaxy group SDSSTG 4436. The hot atmosphere of this system exhibits a highly relaxed morphology centred on the giant elliptical galaxy NGC 3298. The X-ray emission from the system features a compact core ($<10$ kpc) and a steep increase in the entropy and cooling time of the gas, with the cooling time reaching the age of the Universe $\sim15$ kpc from the centre of the galaxy. The observed entropy profile implies a total injected energy of $\sim1.5\times10^{61}$ ergs, which given the high level of relaxation could not have been injected by a recent merging event. Star formation in the central galaxy NGC 3298 is strongly quenched and its stellar population is very old ($\sim10.6$ Gyr). The currently detected radio jets have low power and are confined within the central compact core. All the available evidence implies that this system was affected by giant AGN outbursts which excessively heated the neighbouring gas and prevented the formation of a self-regulated feedback cycle. Our findings imply that AGN outbursts can be energetic enough to unbind gas particles and lead to the disruption of cool cores.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 15 pages, 10 figures, re-submitted to A&A after minor revision

arXiv:2506.13841 [pdf, ps, other]

LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi

Abstract: Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-training, have demonstrated impressive reasoning capabilities, as exemplified by models such as OpenAI o1 and DeepSeek-R1. However, these capabilities are predominantly benchmarked on domains like mathematical problem solving and code generation, leaving open the question of whether such reasoning skills generalize to complex, real-world scenarios. In this paper, we introduce LocationReasoner, a benchmark designed to evaluate LLMs' reasoning abilities in the context of real-world site selection, where models must identify feasible locations by reasoning over diverse and complicated spatial, environmental, and logistical constraints. The benchmark comprises over 300 carefully crafted queries of varying difficulty levels, supported by a sandbox environment with in-house tools for constraint-based location search. Extensive evaluations reveal that state-of-the-art reasoning models offer limited improvement over their non-reasoning predecessors in real-world contexts, with even the latest OpenAI o4 model failing on 30% of site selection tasks. Moreover, agentic strategies such as ReAct and Reflexion often suffer from over-reasoning, leading to worse outcomes than direct code-generation prompting. With key limitations of LLMs in holistic and non-linear reasoning highlighted, we release LocationReasoner to foster the development of LLMs and agents capable of robust, grounded reasoning in real-world decision-making tasks. Code and data for our benchmark are available at https://github.com/miho-koda/LocationReasoner.

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.12411 [pdf, ps, other]

InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning

Authors: Mengyuan Sun, Yu Li, Yuchen Liu, Bo Du, Yunjie Ge

Abstract: Multimodal contrastive learning models like CLIP have demonstrated remarkable vision-language alignment capabilities, yet their vulnerability to backdoor attacks poses critical security risks. Attackers can implant latent triggers that persist through downstream tasks, enabling malicious control of model behavior upon trigger presentation. Despite great success in recent defense mechanisms, they remain impractical due to strong assumptions about attacker knowledge or excessive clean data requirements. In this paper, we introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions, requiring neither prior knowledge of attack targets nor access to the poisoned dataset. Unlike existing defense methods that rely on the same dataset used in the poisoning stage, InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks. Specifically, InverTune first exposes attack signatures through adversarial simulation, probabilistically identifying the target label by analyzing model response patterns. Building on this, we develop a gradient inversion technique to reconstruct latent triggers through activation pattern analysis. Finally, a clustering-guided fine-tuning strategy is employed to erase the backdoor function with only a small amount of arbitrary clean data, while preserving the original model capabilities. Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against the state-of-the-art (SOTA) attacks while limiting clean accuracy (CA) degradation to just 3.07%. This work establishes a new paradigm for securing multimodal systems, advancing security in foundation model deployment without compromising performance.

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.09542 [pdf, ps, other]

KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs

Authors: Dingjun Wu, Yukun Yan, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

Abstract: Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing methods typically rely on a single source, either unstructured text or structured knowledge. Moreover, they lack cognitively inspired mechanisms for activating relevant knowledge. To address these issues, we propose KG-Infused RAG, a framework that integrates KGs into RAG systems to implement spreading activation, a cognitive process that enables concept association and inference. KG-Infused RAG retrieves KG facts, expands the query accordingly, and enhances generation by combining corpus passages with structured facts, enabling interpretable, multi-source retrieval grounded in semantic structure. We further improve KG-Infused RAG via preference learning on sampled key stages in the pipeline. Experiments on five QA benchmarks show that KG-Infused RAG consistently outperforms vanilla RAG (by 3.8% to 13.8%). Additionally, when integrated into Self-RAG, KG-Infused RAG brings further performance gains, demonstrating its effectiveness and versatility as a plug-and-play enhancement module for corpus-based RAG methods.

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2506.07996 [pdf, ps, other]

UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References

Authors: Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo

Abstract: 6D object pose estimation has shown strong generalizability to novel objects. However, existing methods often require either a complete, well-reconstructed 3D model or numerous reference images that fully cover the object. Estimating 6D poses from partial references, which capture only fragments of an object's appearance and geometry, remains challenging. To address this, we propose UA-Pose, an uncertainty-aware approach for 6D object pose estimation and online object completion specifically designed for partial references. We assume access to either (1) a limited set of RGBD images with known poses or (2) a single 2D image. For the first case, we initialize a partial object 3D model based on the provided images and poses, while for the second, we use image-to-3D techniques to generate an initial object 3D model. Our method integrates uncertainty into the incomplete 3D model, distinguishing between seen and unseen regions. This uncertainty enables confidence assessment in pose estimation and guides an uncertainty-aware sampling strategy for online object completion, enhancing robustness in pose estimation accuracy and improving object completeness. We evaluate our method on the YCB-Video, YCBInEOAT, and HO3D datasets, including RGBD sequences of YCB objects manipulated by robots and human hands. Experimental results demonstrate significant performance improvements over existing methods, particularly when object observations are incomplete or partially captured. Project page: https://minfenli.github.io/UA-Pose/

Submitted 9 June, 2025; originally announced June 2025.

Comments: CVPR 2025

arXiv:2506.07955 [pdf, ps, other]

Implementation Considerations for Automated AI Grading of Student Work

Authors: Zewei Tian, Alex Liu, Lief Esbenshade, Shawon Sarkar, Zachary Zhang, Kevin He, Min Sun

Abstract: This study explores the classroom implementation of an AI-powered grading platform in K-12 settings through a co-design pilot with 19 teachers. We combine platform usage logs, surveys, and qualitative interviews to examine how teachers use AI-generated rubrics and grading feedback. Findings reveal that while teachers valued the AI's rapid narrative feedback for formative purposes, they distrusted automated scoring and emphasized the need for human oversight. Students welcomed fast, revision-oriented feedback but remained skeptical of AI-only grading. We discuss implications for the design of trustworthy, teacher-centered AI assessment tools that enhance feedback while preserving pedagogical agency.

Submitted 17 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.07900 [pdf, ps, other]

MiniCPM4: Ultra-Efficient LLMs on End Devices

Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and a data-efficient ternary LLM, BitCPM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Sufficient evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with model context protocol, clearly showcasing its broad usability.

Submitted 9 June, 2025; originally announced June 2025.

Comments: MiniCPM4 Technical Report

arXiv:2506.07657 [pdf, ps, other]

PIG: Physically-based Multi-Material Interaction with 3D Gaussians

Authors: Zeyu Xiao, Zhenyi Wu, Mingyang Sun, Qipeng Yan, Yufan Guo, Zhuoer Liang, Lihua Zhang

Abstract: 3D Gaussian Splatting has achieved remarkable success in reconstructing both static and dynamic 3D scenes. However, in a scene represented by 3D Gaussian primitives, interactions between objects suffer from inaccurate 3D segmentation, imprecise deformation among different materials, and severe rendering artifacts. To address these challenges, we introduce PIG: Physically-Based Multi-Material Interaction with 3D Gaussians, a novel approach that combines 3D object segmentation with the simulation of interacting objects in high precision. Firstly, our method facilitates fast and accurate mapping from 2D pixels to 3D Gaussians, enabling precise 3D object-level segmentation. Secondly, we assign unique physical properties to correspondingly segmented objects within the scene for multi-material coupled interactions. Finally, we have successfully embedded constraint scales into deformation gradients, specifically clamping the scaling and rotation properties of the Gaussian primitives to eliminate artifacts and achieve geometric fidelity and visual consistency. Experimental results demonstrate that our method not only outperforms the state-of-the-art (SOTA) in terms of visual quality, but also opens up new directions and pipelines for the field of physically realistic scene generation.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.07262 [pdf, ps, other]

doi 10.3847/1538-4357/adde47

ALMA-JELLY I: High Resolution CO(2-1) Observations of Ongoing Ram Pressure Stripping in NGC 4858 Reveal Asymmetrical Gas Tail Formation and Fallback

Authors: Harrison J. Souchereau, Jeffrey D. P. Kenney, Pavel Jachym, Ming Sun, William J. Cramer, Masafumi Yagi, Alessandro Boselli, Elias Brinks, Francoise Combes, Luca Cortese, Boris Deshev, Matteo Fossati, Romana Grossova, Rongxin Luo, Jan Palous, Tom C. Scott

Abstract: We present new CO(2-1) observations (resolution $\sim1" = 460$pc) of the Coma cluster jellyfish galaxy NGC 4858 obtained from the ALMA-JELLY large program. Analyzing this data alongside complementary Subaru H$α$ and HST (F600LP / F350LP) observations, we find numerous structural and kinematic features indicative of the effects from strong, inclined ram pressure, including an asymmetric inner gas tail. We estimate a highly-inclined disk-wind angle of $φ_{DW} = 75^{+10}_{-27}$. By subtracting a simple circular velocity model, we find (1) gas clumps that are being accelerated by ram pressure, and (2) signatures of gas clumps that had been previously pushed out of the disk but are now falling inwards. We also discuss head-tail morphologies in star complexes within the stellar disk that appear to be RPS-influenced. Lastly, we compare this galaxy to state-of-the-art galaxy ``wind tunnel'' simulations. We find that this galaxy is one of the best nearby examples of strong and inclined ram pressure gas stripping, and of gas that is perturbed by ram pressure but not fully stripped and falls back. We emphasize the importance of torques due to ram pressure in highly-inclined interactions, which help drive gas inwards on the side rotating against the wind, contributing to the formation of asymmetric inner RPS tails.

Submitted 8 June, 2025; originally announced June 2025.

arXiv:2506.04909 [pdf, ps, other]

When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

Authors: Kai Wang, Yihao Zhang, Meng Sun

Abstract: The honesty of large language models (LLMs) is a critical alignment challenge, especially as advanced systems with chain-of-thought (CoT) reasoning may strategically deceive humans. Unlike traditional honesty issues on LLMs, which could be possibly explained as some kind of hallucination, those models' explicit thought paths enable us to study strategic deception--goal-driven, intentional misinformation where reasoning contradicts outputs. Using representation engineering, we systematically induce, detect, and control such deception in CoT-enabled LLMs, extracting "deception vectors" via Linear Artificial Tomography (LAT) for 89% detection accuracy. Through activation steering, we achieve a 40% success rate in eliciting context-appropriate deception without explicit prompts, unveiling the specific honesty-related issue of reasoning models and providing tools for trustworthy AI alignment.

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2506.04757 [pdf, ps, other]

doi 10.1051/0004-6361/202553977

Modelling the selection of galaxy groups with end to end simulations

Authors: R. Seppi, D. Eckert, A. Finoguenov, S. Shreeram, E. Tempel, G. Gozaliasl, M. Lorenz, J. Wilms, G. A. Mamon, F. Gastaldello, L. Lovisari, E. O'Sullivan, K. Kolokythas, M. A. Bourne, M. Sun, A. Pillepich

Abstract: Feedback from supernovae and AGN shapes galaxy formation and evolution, yet its impact remains unclear. Galaxy groups offer a crucial probe, as their binding energy is comparable to that available from their central AGN. The XMM-Newton Group AGN Project (X-GAP) is a sample of 49 groups selected in X-ray (ROSAT) and optical (SDSS) bands, providing a benchmark for hydrodynamical simulations. In view of such a comparison, understanding selection effects is essential. We aim to model the selection function of X-GAP by forward modelling the detection process in the X-ray and optical bands. Using the Uchuu simulation, we build a halo light cone, predict X-ray group properties with a neural network trained on hydro simulations, and assign galaxies matching observed properties. We compare the selected sample to the parent population. Our method provides a sample that matches the observed distribution of X-ray luminosity and velocity dispersion. The 50% completeness is reached at a velocity dispersion of 450 km/s in the X-GAP redshift range. The selection is driven by X-ray flux, with secondary dependence on velocity dispersion and redshift. We estimate a 93% purity level in the X-GAP parent sample. We calibrate the velocity dispersion-halo mass relation. We find a normalisation and slope in agreement with the literature, and an intrinsic scatter of about 0.06 dex. The measured velocity dispersion is accurate within 10% only for rich systems with more than about 20 members, while the velocity dispersion for groups with fewer than 10 members is biased at more than 20%. The X-ray follow-up refines the optical selection, enhancing purity but reducing completeness. In an SDSS-like setup, velocity dispersion measurement errors dominate over intrinsic scatter. Our selection model will enable the comparisons of thermodynamic properties and gas fractions between X-GAP groups and hydro simulations.

Submitted 5 June, 2025; originally announced June 2025.

Comments: Accepted for publication in A&A

Journal ref: A&A 699, A206 (2025)

arXiv:2506.04329 [pdf, ps, other]

Estimating Bolometric Luminosities of Type 1 Quasars with Self-Organizing Maps

Authors: Jie Chen, Linhua Jiang, Shengxiu Sun, Zijian Zhang, Mouyuan Sun

Abstract: We present a new method to calculate bolometric luminosities for unobscured, type 1 quasars with multi-band photometric data. Bolometric luminosity is a fundamental property for understanding quasars, and it is commonly estimated from monochromatic luminosities using bolometric corrections that often neglect quasar SED diversity. We take advantage of the fact that most quasars now have multi-band observations from UV to mid-IR, and construct SEDs for a well-defined sample of SDSS quasars at $0.5\leq z\leq 2$. Based on this fiducial sample, we explore quasar SEDs, their diversity, and their relations with bolometric luminosities. We then use unsupervised neural network self-organizing maps (SOM) to describe the SED diversity and compute the bolometric luminosities with a fully-trained SOM model. This method reduces systematic uncertainties compared to the traditional method. In addition, we update the multi-linear regression relations between bolometric luminosity and monochromatic luminosities at rest-frame 1450Å, 3000Å, and 5100Å. Our method is applicable to large quasar samples with a wide range of luminosity and redshift. We have applied it to the SDSS DR16 quasars. We have also made our code publicly available.

Submitted 4 June, 2025; originally announced June 2025.

Comments: 18 pages, 13 figures. Resubmitted to ApJ based on reviewer report. Code QSOLbol is available at https://github.com/ChenJiemi/QSOLbol

arXiv:2506.03770 [pdf, ps, other]

Multiuser Beamforming for Pinching-Antenna Systems: An Element-wise Optimization Framework

Authors: Mingjun Sun, Chongjun Ouyang, Shaochuan Wu, Yuanwei Liu

Abstract: The pinching-antenna system (PASS) reconstructs wireless channels through pinching beamforming, i.e., optimizing the activated locations of pinching antennas (PAs) along the waveguide. The aim of this article is to investigate the joint design of baseband beamforming and pinching beamforming. A low-complexity element-wise sequential optimization framework is proposed to address the sum-rate maximization problem in PASS-enabled downlink and uplink channels. i) For the downlink scenario, maximum ratio transmission (MRT), zero-forcing (ZF), and minimum mean square error (MMSE) beamforming schemes are employed as baseband beamformers. For each beamformer, a closed-form expression for the downlink sum-rate is derived as a single-variable function with respect to the pinching beamformer. Based on this, a sequential optimization method is proposed, where the positions of the PAs are updated element-wise using a low-complexity one-dimensional search. ii) For the uplink scenario, signal detection is performed using maximum ratio combining (MRC), ZF, and MMSE combiners. A closed-form sum-rate expression is derived for each linear combiner, and a similar element-wise design is applied to optimize the pinching beamforming.
Numerical results are provided to validate the effectiveness of the proposed method and demonstrate that: (i) For all considered linear beamformers, the proposed PASS architecture outperforms conventional fixed-antenna systems in terms of sum-rate performance; (ii) in both downlink and uplink channels, ZF achieves performance close to that of MMSE and significantly outperforms MRT or MRC; and (iii) the proposed element-wise design eliminates the need for alternating updates between the baseband and pinching beamformers, thereby ensuring low computational complexity.

Submitted 7 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.02522 [pdf, ps, other]

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

Abstract: Recent advancements in Large Language Models (LLMs) and Reinforcement Learning (RL) have shown significant promise in decision-making tasks. Nevertheless, for large-scale industrial decision problems, both approaches face distinct challenges: LLMs lack real-time long-sequence decision-making capabilities, while RL struggles with sample efficiency in vast action spaces. To bridge this gap, we propose Agents Co-Evolution (ACE), a synergistic framework between LLMs and RL agents for large-scale decision-making scenarios. ACE introduces a dual-role trajectory refinement mechanism where LLMs act as both Policy Actor and Value Critic during RL training: the Actor refines suboptimal actions via multi-step reasoning and environment validation, while the Critic performs temporal credit assignment through trajectory-level reward shaping. Concurrently, the RL agent enhances the LLMs' task-specific decision-making with high-quality fine-tuning datasets generated via prioritized experience replay. Through extensive experiments across multiple power grid operation challenges with action spaces exceeding 60K discrete actions, ACE demonstrates superior performance over existing RL methods and LLM-based methods.

Submitted 3 June, 2025; originally announced June 2025.

arXiv:2506.02503 [pdf, ps, other]

KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG

Authors: Yongjian Li, HaoCheng Chu, Yukun Yan, Zhenghao Liu, Shi Yu, Zheni Zeng, Ruobing Wang, Sen Song, Zhiyuan Liu, Maosong Sun

Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware Refinement and Enhancement for RAG), which improves knowledge utilization through three key innovations: (1) structured knowledge representations that facilitate error detection during training, (2) Dense Direct Preference Optimization (DDPO), a refined training objective that prioritizes correction of critical errors, and (3) a contrastive data generation pipeline that maintains semantic consistency while rectifying factual inaccuracies. Experiments show our method significantly enhances standard RAG pipelines across model scales, improving both in-domain and out-of-domain task performance without compromising general capabilities. Notably, these gains are achieved with modest training data, suggesting data-efficient optimization is possible through targeted learning strategies. Our findings establish a new direction for RAG improvement: by improving how models learn to process retrieved content, we can enhance performance across diverse inference paradigms. All data and code will be publicly available on GitHub.

Submitted 3 June, 2025; originally announced June 2025.

arXiv:2506.01947 [pdf, ps, other]

RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report

Authors: Marcos V. Conde, Radu Timofte, Radu Berdan, Beril Besbinar, Daisuke Iso, Pengzhou Ji, Xiong Dun, Zeying Fan, Chen Wu, Zhansheng Wang, Pengbo Zhang, Jiazi Huang, Qinglin Liu, Wei Yu, Shengping Zhang, Xiangyang Ji, Kyungsik Kim, Minkyung Kim, Hwalmin Lee, Hekun Ma, Huan Zheng, Yanyan Wei, Zhao Zhang, Jing Fang, Meilin Gao, et al. (8 additional authors not shown)

Abstract: Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW Reconstruction from sRGB (Reverse ISP). We aim to recover RAW sensor images from smartphones given the corresponding sRGB images without metadata and, by doing this, "reverse" the ISP transformation. Over 150 participants joined this NTIRE 2025 challenge and submitted efficient models. The proposed methods and benchmark establish the state of the art for generating realistic RAW data.

Submitted 2 June, 2025; originally announced June 2025.

Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

arXiv:2506.01770 [pdf, ps, other]

ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs

Authors: Zeming Wei, Chengcan Wu, Meng Sun

Abstract: Large Language Models (LLMs) have achieved significant success in various tasks, yet concerns about their safety and security have emerged. In particular, they pose risks in generating harmful content and vulnerability to jailbreaking attacks. To analyze and monitor machine learning models, model-based analysis has demonstrated notable potential in stateful deep neural networks, yet suffers from scalability issues when extending to LLMs due to their vast feature spaces. In this paper, we propose ReGA, a model-based analysis framework with representation-guided abstraction, to safeguard LLMs against harmful prompts and generations. By leveraging safety-critical representations, which are low-dimensional directions emerging in hidden states that indicate safety-related concepts, ReGA effectively addresses the scalability issue when constructing the abstract model for safety modeling. Our comprehensive evaluation shows that ReGA performs sufficiently well in distinguishing between safe and harmful inputs, achieving an AUROC of 0.975 at the prompt level and 0.985 at the conversation level. Additionally, ReGA exhibits robustness to real-world attacks and generalization across different safety perspectives, outperforming existing safeguard paradigms in terms of interpretability and scalability. Overall, ReGA serves as an efficient and scalable solution to enhance LLM safety by integrating representation engineering with model-based abstraction, paving the way for new paradigms to utilize software insights for AI safety.
Our code is available at https://github.com/weizeming/ReGA.

Submitted 2 June, 2025; originally announced June 2025.

arXiv:2506.01391 [pdf, ps, other]

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun

Abstract: The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, which hinders the learning of precise grounding and planning. Models trained purely by imitation tend to overfit to seen interface patterns and fail to generalize in unfamiliar scenarios. Moreover, most prior work focuses on English interfaces while overlooking the growing diversity of non-English applications such as those in the Chinese mobile ecosystem. In this work, we present AgentCPM-GUI, an 8B-parameter GUI agent built for robust and efficient on-device GUI interaction. Our training pipeline includes grounding-aware pre-training to enhance perception, supervised fine-tuning on high-quality Chinese and English trajectories to imitate human-like actions, and reinforcement fine-tuning with GRPO to improve reasoning capability. We also introduce a compact action space that reduces output length and supports low-latency execution on mobile devices. AgentCPM-GUI achieves state-of-the-art performance on five public benchmarks and a new Chinese GUI benchmark called CAGUI, reaching $96.9\%$ Type-Match and $91.3\%$ Exact-Match. To facilitate reproducibility and further research, we publicly release all code, model checkpoint, and evaluation data.

Submitted 16 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

Comments: Updated results in Table 2 and Table 3; The project is available at https://github.com/OpenBMB/AgentCPM-GUI

ACM Class: I.2.8; I.2.7; I.2.10; H.5.2

arXiv:2505.24550 [pdf, ps, other]

A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

Authors: Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He

Abstract: Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory leads to reduced efficiency. Most of the existing methods are stuck in the assumption of overthinking and attempt to reason efficiently by compressing the Chain-of-Thought, but this often leads to performance degradation. To address this problem, we introduce A*-Thought, an efficient tree search-based unified framework designed to identify and isolate the most essential thoughts from the extensive reasoning chains produced by these models. It formulates the reasoning process of LRMs as a search tree, where each node represents a reasoning span in the giant reasoning space. By combining the A* search algorithm with a cost function specific to the reasoning path, it can efficiently compress the chain of thought and determine a reasoning path with high information density and low cost. In addition, we propose a bidirectional importance estimation mechanism, which further refines this search process and enhances its efficiency beyond uniform sampling. Extensive experiments on several advanced math tasks show that A*-Thought effectively balances performance and efficiency over a huge search space. Specifically, A*-Thought can improve the performance of QwQ-32B by 2.39$\times$ under a low budget and reduce the length of the output tokens by nearly 50% under a high budget. The proposed method is also compatible with several other LRMs, demonstrating its generalization capability.
The code can be accessed at: https://github.com/AI9Stars/AStar-Thought.

Submitted 30 May, 2025; originally announced May 2025.

arXiv:2505.24388 [pdf, other]

ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

Authors: Hao Chen, Yukun Yan, Sen Mei, Wanxiang Che, Zhenghao Liu, Qi Shi, Xinze Li, Yuchun Fan, Pengcheng Huang, Qiushi Xiong, Zhiyuan Liu, Maosong Sun

Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved documents, failing to extract and integrate the key clues needed to support faithful and interpretable reasoning, especially in cases where relevant evidence is implicit, scattered, or obscured by noise. To address this issue, we propose ClueAnchor, a novel framework for enhancing RAG via clue-anchored reasoning exploration and optimization. ClueAnchor extracts key clues from retrieved content and generates multiple reasoning paths based on different knowledge configurations, optimizing the model by selecting the most effective one through reward-based preference optimization. Experiments show that ClueAnchor significantly outperforms prior RAG baselines in reasoning completeness and robustness. Further analysis confirms its strong resilience to noisy or partially relevant retrieved content, as well as its capability to identify supporting evidence even in the absence of explicit clue supervision during inference.

Submitted 30 May, 2025; originally announced May 2025.

arXiv:2505.23187 [pdf, ps, other]

Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration

Authors: Yilong Li, Chen Qian, Yu Xia, Ruijie Shi, Yufan Dang, Zihao Xie, Ziming You, Weize Chen, Cheng Yang, Weichuan Liu, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun

Abstract: Large Language Model-based multi-agent systems (MAS) have shown remarkable progress in solving complex tasks through collaborative reasoning and inter-agent critique. However, existing approaches typically treat each task in isolation, resulting in redundant computations and limited generalization across structurally similar tasks. To address this, we introduce multi-agent cross-task experiential learning (MAEL), a novel framework that endows LLM-driven agents with explicit cross-task learning and experience accumulation. We model the task-solving workflow on a graph-structured multi-agent collaboration network, where agents propagate information and coordinate via explicit connectivity. During the experiential learning phase, we quantify the quality of each step in the task-solving workflow and store the resulting rewards along with the corresponding inputs and outputs into each agent's individual experience pool. During inference, agents retrieve high-reward, task-relevant experiences as few-shot examples to enhance the effectiveness of each reasoning step, thereby enabling more accurate and efficient multi-agent collaboration. Experimental results on diverse datasets demonstrate that MAEL empowers agents to learn from prior task experiences effectively, achieving faster convergence and producing higher-quality solutions on current tasks.

Submitted 29 May, 2025; originally announced May 2025.

Comments: Work in Progress

arXiv:2505.23151 [pdf, ps, other]

A Be star-black hole binary with a wide orbit from LAMOST time-domain survey

Authors: Qian-Yu An, Yang Huang, Wei-Min Gu, Yong Shao, Zhi-Xiang Zhang, Tuan Yi, B. D. Lailey, T. A. A. Sigut, Kyle Akira Rocha, Meng Sun, Seth Gossage, Shi-Jie Gao, Shan-Shan Weng, Song Wang, Bowen Zhang, Xinlin Zhao, Senyu Qi, Shilong Liao, Jianghui Ji, Junfeng Wang, Jianfeng Wu, Mouyuan Sun, Xiang-Dong Li, Jifeng Liu

Abstract: Binary systems consisting of an early type star and a black hole (BH) are crucial for understanding various astrophysical phenomena, particularly the origins of detected gravitational wave sources. Be binary systems are expected to represent a key evolutionary stage in hosting BHs. However, while hundreds of Be X-ray binaries are known, the only confirmed BH candidate in a Be binary remains highly controversial. We report the discovery of ALS 8814, a Be star-BH binary with a moderately eccentric ($e = 0.23$) and wide orbit ($P = 176.6$ days), revealed by the radial velocity (RV) measurement of the visible Be star. Our analysis, combining flux-calibrated spectra in the Balmer discontinuity region and spectral template matching, yields a mass of $11.2^{+1.4}_{-1.2}$ $M_\odot$ for the Be star. The minimum mass of the unseen companion, assuming an edge-on inclination ($i = 90^{\circ}$), is $9.8\pm 0.7\,M_\odot$. We rule out the presence of non-degenerate companions in ALS 8814, indicating that it can only be a BH. This discovery represents a robust case of a Be-BH binary, identified purely through precise RV measurements from a single set of lines. The extremely low peculiar velocity of ALS 8814 suggests that the BH is formed via a direct core-collapse with a negligible natal kick, implying an almost perfect alignment between the Be star's spin and the orbital plane. In this context, the binary's inclination angle is estimated to be 22$^{\circ}$-49$^{\circ}$ by analyzing the shallow double-peaked profile of the H$\alpha$ emission line.
This inclination range corresponds to a BH mass estimate between $15\,M_\odot$ and $58\,M_\odot$. As the only unambiguous Be-BH binary system known to date, ALS 8814 provides valuable constraints on the BH formation in a binary system with a high-mass companion.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 76 pages, 29 figures, to be submitted

arXiv:2505.23079 [pdf, ps, other]

iTrace: Interactive Tracing of Cross-View Data Relationships

Authors: Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, Jian Zhao, David Koop

Abstract: Exploring data relations across multiple views has been a common task in many domains such as bioinformatics, cybersecurity, and healthcare. To support this, various techniques (e.g., visual links and brushing and linking) are used to show related visual elements across views via lines and highlights. However, understanding the relations using these techniques, when many related elements are scattered, can be difficult due to spatial distance and complexity. To address this, we present iTrace, an interactive visualization technique to effectively trace cross-view data relationships. iTrace leverages the concept of interactive focus transitions, which allows users to see and directly manipulate their focus as they navigate between views. By directing the user's attention through smooth transitions between related elements, iTrace makes it easier to follow data relationships. We demonstrate the effectiveness of iTrace with a user study, and we conclude with a discussion of how iTrace can be broadly used to enhance data exploration in various types of visualizations.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 13 pages, 14 figures, accepted to Graphics Interface 2025

MSC Class: 68U05

ACM Class: H.5.2; I.3.6; I.3.8

arXiv:2505.22949 [pdf, ps, other]

Directed Graph Grammars for Sequence-based Learning

Authors: Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen

Abstract: Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approach to constructing a principled, compact and equivalent sequential representation of a DAG. Specifically, we view a graph as derivations over an unambiguous grammar, where the DAG corresponds to a unique sequence of production rules. Equivalently, the procedure to construct such a description can be viewed as a lossless compression of the data. Such a representation has many uses, including building a generative model for graph generation, learning a latent space for property prediction, and leveraging the sequence representational continuity for Bayesian Optimization over structured data. Code is available at https://github.com/shiningsunnyday/induction.

Submitted 28 May, 2025; originally announced May 2025.

Comments: ICML 2025

arXiv:2505.22948 [pdf, ps, other]

Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

Authors: Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen

Abstract: Recent data-efficient molecular generation approaches exploit graph grammars to introduce interpretability into the generative models. However, grammar learning therein relies on expert annotation or unreliable heuristics for algorithmic inference. We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. By exploiting the chemical knowledge of an MMFM, FMG renders molecules as images, describes them as text, and aligns information across modalities using prompt learning. FMG can be used as a drop-in replacement for the prior grammar learning approaches in molecular generation and property prediction. We show that FMG not only excels in synthesizability, diversity, and data efficiency but also offers built-in chemical interpretability for automated molecular discovery workflows. Code is available at https://github.com/shiningsunnyday/induction.

Submitted 28 May, 2025; originally announced May 2025.

Comments: ICML 2025

arXiv:2505.22787 [pdf, ps, other]

Can Large Language Models Match the Conclusions of Systematic Reviews?

Authors: Christopher Polzak, Alejandro Lozano, Min Woo Sun, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy

Abstract: Systematic reviews (SR), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone for evidence-based clinical decision-making, research, and policy. Given the exponential growth of scientific articles, there is growing interest in using large language models (LLMs) to automate SR generation. However, the ability of LLMs to critically assess evidence and reason across multiple documents to provide recommendations at the same proficiency as domain experts remains poorly characterized. We therefore ask: Can LLMs match the conclusions of systematic reviews written by clinical experts when given access to the same studies? To explore this question, we present MedEvidence, a benchmark pairing findings from 100 SRs with the studies they are based on. We benchmark 24 LLMs on MedEvidence, including reasoning, non-reasoning, and medical specialist models across varying sizes (from 7B to 700B). Through our systematic evaluation, we find that reasoning does not necessarily improve performance, larger models do not consistently yield greater gains, and knowledge-based fine-tuning degrades accuracy on MedEvidence. Instead, most models exhibit similar behavior: performance tends to degrade as token length increases, their responses show overconfidence, and, contrary to human experts, all models show a lack of scientific skepticism toward low-quality findings.
These results suggest that more work is still required before LLMs can reliably match the observations from expert-conducted SRs, even though these systems are already deployed and being used by clinicians. We release our codebase and benchmark to the broader research community to further investigate LLM-based SR systems.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.22445

NFR: Neural Feature-Guided Non-Rigid Shape Registration

Authors: Puhua Jiang, Zhangquan Chen, Mingze Sun, Ruqi Huang

Abstract: In this paper, we propose a novel learning-based framework for 3D shape registration, which overcomes the challenges of significant non-rigid deformation and partiality among input shapes, and, remarkably, requires no correspondence annotation during training. Our key insight is to incorporate neural features learned by deep learning-based shape matching networks into an iterative, geometric shape registration pipeline. The advantage of our approach is two-fold: on one hand, neural features provide more accurate and semantically meaningful correspondence estimation than spatial features (e.g., coordinates), which is critical in the presence of large non-rigid deformations; on the other hand, the correspondences are dynamically updated according to the intermediate registrations and filtered by a consistency prior, which substantially robustifies the overall pipeline. Empirical results show that, with as few as dozens of training shapes of limited variability, our pipeline not only achieves state-of-the-art results on several benchmarks of non-rigid point cloud matching and partial shape matching across varying settings, but also delivers high-quality correspondences between unseen challenging shape pairs that undergo both significant extrinsic and intrinsic deformations, in which case neither traditional registration methods nor intrinsic methods work.

Submitted 28 May, 2025; originally announced May 2025.

Comments: 20 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2311.04494

ACM Class: I.4.m; I.2.6

arXiv:2505.22131

EULER: Enhancing the Reasoning Ability of Large Language Models through Error-Induced Learning

Authors: Zhuoyang Wu, Xinze Li, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Minghe Yu, Cheng Yang, Yu Gu, Ge Yu, Maosong Sun

Abstract: Large Language Models (LLMs) have demonstrated strong reasoning capabilities and achieved promising results in mathematical problem-solving tasks. Learning from errors offers the potential to further enhance the performance of LLMs during Supervised Fine-Tuning (SFT). However, the errors in synthesized solutions are typically gathered from sampling trials, making it challenging to generate solution errors for each mathematical problem. This paper introduces the Error-IndUced LEaRning (EULER) model, which aims to develop an error exposure model that generates high-quality solution errors to enhance the mathematical reasoning capabilities of LLMs. Specifically, EULER optimizes the error exposure model to increase the generation probability of self-made solution errors while utilizing solutions produced by a superior LLM to regularize the generation quality. Our experiments across various mathematical problem datasets demonstrate the effectiveness of the EULER model, achieving an improvement of over 4% compared to all baseline models. Further analysis reveals that EULER is capable of synthesizing more challenging and educational solution errors, which facilitate both the training and inference processes of LLMs. All code is available at https://github.com/NEUIR/EULER.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.22095

Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

Authors: Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu, Minghe Yu, Ge Yu, Maosong Sun

Abstract: Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook the reasoning and planning capabilities of MLLMs to dynamically determine how to interact with different KBs during the reasoning process. To address this limitation, we propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. Specifically, R1-Router can generate follow-up queries according to the current reasoning step, routing these intermediate queries to the most suitable KB, and integrating external knowledge into a coherent reasoning trajectory to answer the original query. Furthermore, we introduce Step-wise Group Relative Policy Optimization (Step-GRPO), a tailored reinforcement learning algorithm that assigns step-specific rewards to optimize the reasoning behavior of MLLMs. Experimental results on various open-domain QA benchmarks across multiple modalities demonstrate that R1-Router outperforms baseline models by over 7%. Further analysis shows that R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.

Submitted 28 May, 2025; originally announced May 2025.
