Showing 1–50 of 1,027 results for author: Sun, M

Searching in archive cs.
  1. arXiv:2507.05687 [pdf, ps, other]

    cs.LG cs.CL

    AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

    Authors: Shangzhan Li, Zefan Wang, Ye He, Yuxuan Li, Qi Shi, Jianling Li, Yonggang Hu, Wanxiang Che, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and mem…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2507.05685 [pdf, ps, other]

    cs.LG cs.AI

    Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach

    Authors: Xiaobing Chen, Boyang Zhang, Xiangwei Zhou, Mingxuan Sun, Shuai Zhang, Songyang Zhang, Geoffrey Ye Li

    Abstract: The integration of Federated Learning (FL) and Mixture-of-Experts (MoE) presents a compelling pathway for training more powerful, large-scale artificial intelligence models (LAMs) on decentralized data while preserving privacy. However, efficient federated training of these complex MoE-structured LAMs is hindered by significant system-level challenges, particularly in managing the interplay betwee…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 7 pages

  3. arXiv:2507.03280 [pdf, ps, other]

    cs.IR

    Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation

    Authors: Dong Zhang, Lin Li, Ming Li, Xiaohui Tao, Meng Sun, Jimmy Xiangji Huang

    Abstract: Existing solutions for bundle recommendation (BR) have achieved remarkable effectiveness for predicting the user's preference for prebuilt bundles. However, bundle-item (B-I) affiliation varies dynamically in real scenarios. For example, a bundle themed as 'casual outfit' may add 'hat' or remove 'watch' due to factors such as seasonal variations, changes in user pes or inventory adjustments. Our…

    Submitted 3 July, 2025; originally announced July 2025.

  4. arXiv:2507.01564 [pdf, ps, other]

    eess.IV cs.CV

    Multi Source COVID-19 Detection via Kernel-Density-based Slice Sampling

    Authors: Chia-Ming Lee, Bo-Cheng Qiu, Ting-Yao Chen, Ming-Han Sun, Fang-Ying Lin, Jung-Tse Tsai, I-An Tsai, Yu-Fan Lin, Chih-Chung Hsu

    Abstract: We present our solution for the Multi-Source COVID-19 Detection Challenge, which classifies chest CT scans from four distinct medical centers. To address multi-source variability, we employ the Spatial-Slice Feature Learning (SSFL) framework with Kernel-Density-based Slice Sampling (KDS). Our preprocessing pipeline combines lung region extraction, quality control, and adaptive slice sampling to se…

    Submitted 2 July, 2025; originally announced July 2025.

  5. arXiv:2507.01485 [pdf, ps, other]

    cs.RO cs.AI cs.MA q-bio.QM

    BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

    Authors: Yibo Qiu, Zan Huang, Zhiyu Wang, Handi Liu, Yiling Qiao, Yifeng Hu, Shu'ang Sun, Hangke Peng, Ronald X Xu, Mingzhai Sun

    Abstract: Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), a…

    Submitted 2 July, 2025; originally announced July 2025.

  6. arXiv:2506.23138 [pdf, ps, other]

    cs.CV

    VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis

    Authors: Shiyu Wu, Mingzhen Sun, Weining Wang, Yequan Wang, Jing Liu

    Abstract: Since there exists a notable gap between user-provided and model-preferred prompts, generating high-quality and satisfactory images using diffusion models often requires prompt engineering to optimize user inputs. Current studies on text-to-image prompt engineering can effectively enhance the style and aesthetics of generated images. However, they often neglect the semantic alignment between gener…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 12 pages, 5 figures

  7. arXiv:2506.21873 [pdf, ps, other]

    cs.CV cs.AI

    Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning

    Authors: Tzu-Chun Chien, Chieh-Kai Lin, Shiang-Feng Tsai, Ruei-Chi Lai, Hung-Jen Chen, Min Sun

    Abstract: Recent Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual grounding, establishing themselves as a general interface for various vision-language applications. This progress has driven the development of token pruning methods to mitigate the high computational costs associated with processing numerous visual tokens. However, we observe that pruning significantly…

    Submitted 26 June, 2025; originally announced June 2025.

  8. arXiv:2506.21011 [pdf, ps, other]

    cs.CV

    Bridging Video Quality Scoring and Justification via Large Multimodal Models

    Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Jiachao Gong, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

    Abstract: Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet, a score fails to describe the video's complex quality dimensions, restricting its applicability. Benefiting from the linguistic output, adapting video large multimodal models (LMMs) to VQA via instruction tuning has the potential to address this issue. The core…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 15 pages, 4 figures, 8 tables

  9. arXiv:2506.19140 [pdf, ps, other]

    cs.LG

    Command-V: Pasting LLM Behaviors via Activation Profiles

    Authors: Barry Wang, Avi Schwarzschild, Alexander Robey, Ali Payani, Charles Fleming, Mingjie Sun, Daphne Ippolito

    Abstract: Retrofitting large language models (LLMs) with new behaviors typically requires full finetuning or distillation, costly steps that must be repeated for every architecture. In this work, we introduce Command-V, a backpropagation-free behavior transfer method that copies an existing residual activation adapter from a donor model and pastes its effect into a recipient model. Command-V profiles layer a…

    Submitted 23 June, 2025; originally announced June 2025.

  10. arXiv:2506.18254 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    RLPR: Extrapolating RLVR to General Domains without Verifiers

    Authors: Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address the challenge, our key o…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Website: https://github.com/openbmb/RLPR

  11. arXiv:2506.18237 [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AdapThink: Adaptive Thinking Preferences for Reasoning Language Model

    Authors: Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun

    Abstract: Reinforcement Learning (RL)-based post-training has significantly advanced the complex reasoning capabilities of language models, fostering sophisticated self-reflection processes. However, this "slow thinking" paradigm presents a critical challenge to reasoning efficiency: models may expend excessive computation on simple questions and shift reasoning prematurely for complex ones. Previous mech…

    Submitted 22 June, 2025; originally announced June 2025.

  12. arXiv:2506.17728 [pdf, ps, other]

    cs.CL cs.AI

    KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation

    Authors: Dalong Zhang, Jun Xu, Jun Zhou, Lei Liang, Lin Yuan, Ling Zhong, Mengshu Sun, Peilong Zhao, QiWei Wang, Xiaorui Wang, Xinkai Du, YangYang Hou, Yu Ao, ZhaoYang Wang, Zhengke Gui, ZhiYing Yi, Zhongpu Bo, Haofen Wang, Huajun Chen

    Abstract: In this paper, we introduce KAG-Thinker, which upgrades KAG to a multi-turn interactive thinking and deep reasoning framework powered by a dedicated parameter-light large language model (LLM). Our approach constructs a structured thinking process for solving complex problems, enhancing the logical coherence and contextual consistency of the reasoning process in question-answering (Q&A) tasks on…

    Submitted 30 June, 2025; v1 submitted 21 June, 2025; originally announced June 2025.

  13. arXiv:2506.14973 [pdf, ps, other]

    eess.AS cs.AI

    Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition

    Authors: Jiamin Xie, Ju Lin, Yiteng Huang, Tyler Vuong, Zhaojiang Lin, Zhaojun Yang, Peng Su, Prashant Rawat, Sangeeta Srivastava, Ming Sun, Florian Metze

    Abstract: Recent studies have demonstrated that prompting large language models (LLMs) with audio encodings enables effective speech recognition capabilities. However, the ability of Speech LLMs to comprehend and process multi-channel audio with spatial cues remains a relatively uninvestigated area of research. In this work, we present directional-SpeechLlama, a novel approach that leverages the microphone a…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  14. arXiv:2506.13841 [pdf, ps, other]

    cs.AI

    LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning

    Authors: Miho Koda, Yu Zheng, Ruixian Ma, Mingyang Sun, Devesh Pansare, Fabio Duarte, Paolo Santi

    Abstract: Recent advances in large language models (LLMs), particularly those enhanced through reinforced post-training, have demonstrated impressive reasoning capabilities, as exemplified by models such as OpenAI o1 and DeepSeek-R1. However, these capabilities are predominantly benchmarked on domains like mathematical problem solving and code generation, leaving open the question of whether such reasonin…

    Submitted 16 June, 2025; originally announced June 2025.

  15. arXiv:2506.12411 [pdf, ps, other]

    cs.CR cs.CV

    InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning

    Authors: Mengyuan Sun, Yu Li, Yuchen Liu, Bo Du, Yunjie Ge

    Abstract: Multimodal contrastive learning models like CLIP have demonstrated remarkable vision-language alignment capabilities, yet their vulnerability to backdoor attacks poses critical security risks. Attackers can implant latent triggers that persist through downstream tasks, enabling malicious control of model behavior upon trigger presentation. Despite great success in recent defense mechanisms, they r…

    Submitted 14 June, 2025; originally announced June 2025.

  16. arXiv:2506.09542 [pdf, ps, other]

    cs.CL

    KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs

    Authors: Dingjun Wu, Yukun Yan, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) improves factual accuracy by grounding responses in external knowledge. However, existing methods typically rely on a single source, either unstructured text or structured knowledge. Moreover, they lack cognitively inspired mechanisms for activating relevant knowledge. To address these issues, we propose KG-Infused RAG, a framework that integrates KGs into RAG…

    Submitted 11 June, 2025; originally announced June 2025.

  17. arXiv:2506.07996 [pdf, ps, other]

    cs.CV cs.RO

    UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References

    Authors: Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo

    Abstract: 6D object pose estimation has shown strong generalizability to novel objects. However, existing methods often require either a complete, well-reconstructed 3D model or numerous reference images that fully cover the object. Estimating 6D poses from partial references, which capture only fragments of an object's appearance and geometry, remains challenging. To address this, we propose UA-Pose, an un…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: CVPR 2025

  18. arXiv:2506.07955 [pdf, ps, other]

    cs.HC

    Implementation Considerations for Automated AI Grading of Student Work

    Authors: Zewei Tian, Alex Liu, Lief Esbenshade, Shawon Sarkar, Zachary Zhang, Kevin He, Min Sun

    Abstract: This study explores the classroom implementation of an AI-powered grading platform in K-12 settings through a co-design pilot with 19 teachers. We combine platform usage logs, surveys, and qualitative interviews to examine how teachers use AI-generated rubrics and grading feedback. Findings reveal that while teachers valued the AI's rapid narrative feedback for formative purposes, they distrusted…

    Submitted 17 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  19. arXiv:2506.07900 [pdf, ps, other]

    cs.CL cs.AI

    MiniCPM4: Ultra-Efficient LLMs on End Devices

    Authors: MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, Jie Cai, Haotian Chen, Wentong Chen, Xin Cong, Ganqu Cui, Ning Ding, Shengdan Fan, Yewei Fang, Zixuan Fu, Wenyu Guan, Yitong Guan, Junshao Guo, Yufeng Han, Bingxiang He, Yuxiang Huang, Cunliang Kong, Qiuzuo Li, Siyuan Li, Wenhao Li, Yanghao Li, et al. (50 additional authors not shown)

    Abstract: This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelera…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: MiniCPM4 Technical Report

  20. arXiv:2506.07657 [pdf, ps, other]

    cs.GR cs.CV

    PIG: Physically-based Multi-Material Interaction with 3D Gaussians

    Authors: Zeyu Xiao, Zhenyi Wu, Mingyang Sun, Qipeng Yan, Yufan Guo, Zhuoer Liang, Lihua Zhang

    Abstract: 3D Gaussian Splatting has achieved remarkable success in reconstructing both static and dynamic 3D scenes. However, in a scene represented by 3D Gaussian primitives, interactions between objects suffer from inaccurate 3D segmentation, imprecise deformation among different materials, and severe rendering artifacts. To address these challenges, we introduce PIG: Physically-Based Multi-Material Inter…

    Submitted 9 June, 2025; originally announced June 2025.

  21. arXiv:2506.04909 [pdf, ps, other]

    cs.AI cs.CL cs.CR cs.LG

    When Thinking LLMs Lie: Unveiling the Strategic Deception in Representations of Reasoning Models

    Authors: Kai Wang, Yihao Zhang, Meng Sun

    Abstract: The honesty of large language models (LLMs) is a critical alignment challenge, especially as advanced systems with chain-of-thought (CoT) reasoning may strategically deceive humans. Unlike traditional honesty issues in LLMs, which could possibly be explained as some kind of hallucination, those models' explicit thought paths enable us to study strategic deception: goal-driven, intentional misinfor…

    Submitted 5 June, 2025; originally announced June 2025.

  22. arXiv:2506.02522 [pdf, ps, other]

    cs.AI

    Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making

    Authors: Xu Wan, Wenyue Xu, Chao Yang, Mingyang Sun

    Abstract: Recent advancements in Large Language Models (LLMs) and Reinforcement Learning (RL) have shown significant promise in decision-making tasks. Nevertheless, for large-scale industrial decision problems, both approaches face distinct challenges: LLMs lack real-time long-sequence decision-making capabilities, while RL struggles with sample efficiency in vast action spaces. To bridge this gap, we propo…

    Submitted 3 June, 2025; originally announced June 2025.

  23. arXiv:2506.02503 [pdf, ps, other]

    cs.CL

    KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG

    Authors: Yongjian Li, HaoCheng Chu, Yukun Yan, Zhenghao Liu, Shi Yu, Zheni Zeng, Ruobing Wang, Sen Song, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to access broader knowledge sources, yet factual inconsistencies persist due to noise in retrieved documents, even with advanced retrieval methods. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. In this paper, we present KARE-RAG (Knowledge-Aware…

    Submitted 3 June, 2025; originally announced June 2025.

  24. arXiv:2506.01947 [pdf, ps, other]

    eess.IV cs.CV

    RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report

    Authors: Marcos V. Conde, Radu Timofte, Radu Berdan, Beril Besbinar, Daisuke Iso, Pengzhou Ji, Xiong Dun, Zeying Fan, Chen Wu, Zhansheng Wang, Pengbo Zhang, Jiazi Huang, Qinglin Liu, Wei Yu, Shengping Zhang, Xiangyang Ji, Kyungsik Kim, Minkyung Kim, Hwalmin Lee, Hekun Ma, Huan Zheng, Yanyan Wei, Zhao Zhang, Jing Fang, Meilin Gao, et al. (8 additional authors not shown)

    Abstract: Numerous low-level vision tasks operate in the RAW domain due to its linear properties, bit depth, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public sRGB datasets. For this reason, many approaches try to generate realistic RAW images using sensor information and sRGB images. This paper covers the second challenge on RAW…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  25. arXiv:2506.01770 [pdf, ps, other]

    cs.CR cs.AI cs.LG cs.SE

    ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs

    Authors: Zeming Wei, Chengcan Wu, Meng Sun

    Abstract: Large Language Models (LLMs) have achieved significant success in various tasks, yet concerns about their safety and security have emerged. In particular, they risk generating harmful content and are vulnerable to jailbreaking attacks. To analyze and monitor machine learning models, model-based analysis has demonstrated notable potential in stateful deep neural networks, yet suffers from s…

    Submitted 2 June, 2025; originally announced June 2025.

  26. arXiv:2506.01391 [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

    Authors: Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, Jianming Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan Yao, Zhiyuan Liu, Maosong Sun

    Abstract: The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability. However, practical deployment of such agents remains constrained by several key challenges. Existing training data is often noisy and lacks semantic diversity, whi…

    Submitted 16 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Updated results in Table 2 and Table 3; The project is available at https://github.com/OpenBMB/AgentCPM-GUI

    ACM Class: I.2.8; I.2.7; I.2.10; H.5.2

  27. arXiv:2505.24550 [pdf, ps, other]

    cs.CL

    A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings

    Authors: Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He

    Abstract: Large Reasoning Models (LRMs) achieve superior performance by extending the thought length. However, a lengthy thinking trajectory leads to reduced efficiency. Most of the existing methods are stuck in the assumption of overthinking and attempt to reason efficiently by compressing the Chain-of-Thought, but this often leads to performance degradation. To address this problem, we introduce A*-Though…

    Submitted 30 May, 2025; originally announced May 2025.

  28. arXiv:2505.24388 [pdf, other]

    cs.CL

    ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

    Authors: Hao Chen, Yukun Yan, Sen Mei, Wanxiang Che, Zhenghao Liu, Qi Shi, Xinze Li, Yuchun Fan, Pengcheng Huang, Qiushi Xiong, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved documents, failing to extract and integrate the key clues needed to support faithful and interpretable reasoning, especially in cases where relevant evidence is implicit, scattered, or obscured by noise. To add…

    Submitted 30 May, 2025; originally announced May 2025.

  29. arXiv:2505.23187 [pdf, ps, other]

    cs.CL cs.AI cs.MA

    Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration

    Authors: Yilong Li, Chen Qian, Yu Xia, Ruijie Shi, Yufan Dang, Zihao Xie, Ziming You, Weize Chen, Cheng Yang, Weichuan Liu, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Model-based multi-agent systems (MAS) have shown remarkable progress in solving complex tasks through collaborative reasoning and inter-agent critique. However, existing approaches typically treat each task in isolation, resulting in redundant computations and limited generalization across structurally similar tasks. To address this, we introduce multi-agent cross-task experiential…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  30. arXiv:2505.23079 [pdf, ps, other]

    cs.HC

    iTrace: Interactive Tracing of Cross-View Data Relationships

    Authors: Abdul Rahman Shaikh, Maoyuan Sun, Xingchen Liu, Hamed Alhoori, Jian Zhao, David Koop

    Abstract: Exploring data relations across multiple views has been a common task in many domains such as bioinformatics, cybersecurity, and healthcare. To support this, various techniques (e.g., visual links and brushing and linking) are used to show related visual elements across views via lines and highlights. However, understanding the relations using these techniques, when many related elements are scatt…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 13 pages, 14 figures, accepted to Graphics Interface 2025

    MSC Class: 68U05 ACM Class: H.5.2; I.3.6; I.3.8

  31. arXiv:2505.22949 [pdf, ps, other]

    cs.LG

    Directed Graph Grammars for Sequence-based Learning

    Authors: Michael Sun, Orion Foo, Gang Liu, Wojciech Matusik, Jie Chen

    Abstract: Directed acyclic graphs (DAGs) are a class of graphs commonly used in practice, with examples that include electronic circuits, Bayesian networks, and neural architectures. While many effective encoders exist for DAGs, it remains challenging to decode them in a principled manner, because the nodes of a DAG can have many different topological orders. In this work, we propose a grammar-based approac…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  32. arXiv:2505.22948 [pdf, ps, other]

    cs.AI

    Foundation Molecular Grammar: Multi-Modal Foundation Models Induce Interpretable Molecular Graph Languages

    Authors: Michael Sun, Weize Yuan, Gang Liu, Wojciech Matusik, Jie Chen

    Abstract: Recent data-efficient molecular generation approaches exploit graph grammars to introduce interpretability into the generative models. However, grammar learning therein relies on expert annotation or unreliable heuristics for algorithmic inference. We propose Foundation Molecular Grammar (FMG), which leverages multi-modal foundation models (MMFMs) to induce an interpretable molecular language. By…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  33. arXiv:2505.22787 [pdf, ps, other]

    cs.CL

    Can Large Language Models Match the Conclusions of Systematic Reviews?

    Authors: Christopher Polzak, Alejandro Lozano, Min Woo Sun, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy

    Abstract: Systematic reviews (SR), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone for evidence-based clinical decision-making, research, and policy. Given the exponential growth of scientific articles, there is growing interest in using large language models (LLMs) to automate SR generation. However, the ability of LLMs…

    Submitted 28 May, 2025; originally announced May 2025.

  34. arXiv:2505.22445 [pdf, other]

    cs.CV cs.AI

    NFR: Neural Feature-Guided Non-Rigid Shape Registration

    Authors: Puhua Jiang, Zhangquan Chen, Mingze Sun, Ruqi Huang

    Abstract: In this paper, we propose a novel learning-based framework for 3D shape registration, which overcomes the challenges of significant non-rigid deformation and partiality among input shapes, and, remarkably, requires no correspondence annotation during training. Our key insight is to incorporate neural features learned by deep learning-based shape matching networks into an iterative, geom…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 20 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:2311.04494

    ACM Class: I.4.m; I.2.6

  35. arXiv:2505.22131 [pdf, other]

    cs.CL

    EULER: Enhancing the Reasoning Ability of Large Language Models through Error-Induced Learning

    Authors: Zhuoyang Wu, Xinze Li, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Minghe Yu, Cheng Yang, Yu Gu, Ge Yu, Maosong Sun

    Abstract: Large Language Models (LLMs) have demonstrated strong reasoning capabilities and achieved promising results in mathematical problem-solving tasks. Learning from errors offers the potential to further enhance the performance of LLMs during Supervised Fine-Tuning (SFT). However, the errors in synthesized solutions are typically gathered from sampling trials, making it challenging to generate solutio…

    Submitted 28 May, 2025; originally announced May 2025.

  36. arXiv:2505.22095 [pdf, ps, other]

    cs.CL

    Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

    Authors: Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu, Minghe Yu, Ge Yu, Maosong Sun

    Abstract: Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook th…

    Submitted 28 May, 2025; originally announced May 2025.

  37. arXiv:2505.21898 [pdf, ps, other]

    cs.CL cs.AI cs.MA cs.SE

    Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development

    Authors: Rennai Qiu, Chen Qian, Ran Li, Yufan Dang, Weize Chen, Cheng Yang, Yingli Zhang, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun

    Abstract: Recent advancements in Large Language Models (LLMs) and autonomous agents have demonstrated remarkable capabilities across various domains. However, standalone agents frequently encounter limitations when handling complex tasks that demand extensive interactions and substantial computational resources. Although Multi-Agent Systems (MAS) alleviate some of these limitations through collaborative mec…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  38. arXiv:2505.20687 [pdf, other]

    cs.CV

    VisAlgae 2023: A Dataset and Challenge for Algae Detection in Microscopy Images

    Authors: Mingxuan Sun, Juntao Jiang, Zhiqiang Yang, Shenao Kong, Jiamin Qi, Jianru Shang, Shuangling Luo, Wanfa Sun, Tianyi Wang, Yanqi Wang, Qixuan Wang, Tingjian Dai, Tianxiang Chen, Jinming Zhang, Xuerui Zhang, Yuepeng He, Pengcheng Fu, Qiu Guan, Shizheng Zhou, Yanbo Yu, Qigui Jiang, Teng Zhou, Liuyong Shi, Hong Yan

    Abstract: Microalgae, vital for ecological balance and economic sectors, present challenges in detection due to their diverse sizes and conditions. This paper summarizes the second "Vision Meets Algae" (VisAlgae 2023) Challenge, aiming to enhance high-throughput microalgae cell detection. The challenge, which attracted 369 participating teams, includes a dataset of 1000 images across six classes, featuring…

    Submitted 26 May, 2025; originally announced May 2025.

  39. arXiv:2505.20662 [pdf, ps, other]

    cs.AI

    AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

    Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Weilun Zhao, Shuo Wang, Duzhen Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: Efficient experiment reproduction is critical to accelerating progress in artificial intelligence. However, the inherent complexity of method design and training procedures presents substantial challenges for automation. Notably, reproducing experiments often requires implicit domain-specific knowledge not explicitly documented in the original papers. To address this, we introduce the paper lineag…

    Submitted 29 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 20 pages, preprint version

  40. arXiv:2505.20613 [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.LO

    REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

    Authors: Ziju Shen, Naohao Huang, Fanyi Yang, Yutong Wang, Guoxiong Gao, Tianyi Xu, Jiedong Jiang, Wanyi He, Pu Yang, Mengzhou Sun, Haocheng Ju, Peihao Wu, Bryan Dai, Bin Dong

    Abstract: Nowadays, formal theorem provers have made monumental progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (…

    Submitted 16 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  41. arXiv:2505.20195 [pdf, other]

    cs.CL

    Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning

    Authors: Xiaorong Wang, Ting Yang, Zhu Zhang, Shuo Wang, Zihan Zhou, Liner Yang, Zhiyuan Liu, Maosong Sun

    Abstract: Assessing the quality of long-form, model-generated text is challenging, even with advanced LLM-as-a-Judge methods, due to performance degradation as input length increases. To address this issue, we propose a divide-and-conquer approach, which breaks down the comprehensive evaluation task into a series of localized scoring tasks, followed by a final global assessment. This strategy allows for mor…

    Submitted 26 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  42. arXiv:2505.19591  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    Multi-Agent Collaboration via Evolving Orchestration

    Authors: Yufan Dang, Chen Qian, Xueheng Luo, Jingru Fan, Zihao Xie, Ruijie Shi, Weize Chen, Cheng Yang, Xiaoyin Che, Ye Tian, Xuantang Xiong, Lei Han, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordin…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  43. arXiv:2505.19217  [pdf, other]

    cs.CL

    The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training

    Authors: Weize Chen, Jiarui Yuan, Tailin Jin, Ning Ding, Huimin Chen, Zhiyuan Liu, Maosong Sun

    Abstract: Recent large language models (LLMs) exhibit impressive reasoning but often over-think, generating excessively long responses that hinder efficiency. We introduce DIET (DIfficulty-AwarE Training), a framework that systematically cuts these "token calories" by integrating on-the-fly problem difficulty into the reinforcement learning (RL) process. DIET dynamically adapts token compression strategies…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: under review

  44. arXiv:2505.18078  [pdf, ps, other]

    cs.CV

    DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation

    Authors: Junhao Chen, Mingjin Chen, Jianjin Xu, Xiang Li, Junting Dong, Mingze Sun, Puhua Jiang, Hongxiang Li, Yuhang Yang, Hao Zhao, Xiaoxiao Long, Ruqi Huang

    Abstract: Controllable video generation (CVG) has advanced rapidly, yet current systems falter when more than one actor must move, interact, and exchange positions under noisy control signals. We address this gap with DanceTogether, the first end-to-end diffusion framework that turns a single reference image plus independent pose-mask streams into long, photorealistic videos while strictly preserving every…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Our video demos and code are available at https://DanceTog.github.io/

  45. arXiv:2505.17389  [pdf, other]

    cs.RO cs.AI

    Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space

    Authors: Jinrong Yang, Kexun Chen, Zhuoling Li, Shengkai Wu, Yong Zhao, Liangliang Ren, Wenqiu Luo, Chaohui Shang, Meiyu Zhi, Linfeng Gao, Mingshan Sun, Hui Cheng

    Abstract: Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. While minimal demonstrations enable robotic action execution, achieving high success rates and generalization requires high cost, e.g., continuously adding data or incrementally conducting human-in-the-loop processes with complex hardware/software systems. In this paper, we rethink the state/action…

    Submitted 22 May, 2025; originally announced May 2025.

  46. arXiv:2505.17329  [pdf, ps, other]

    q-bio.NC cs.LG

    Transformer brain encoders explain human high-level visual responses

    Authors: Hossein Adeli, Minni Sun, Nikolaus Kriegeskorte

    Abstract: A major goal of neuroscience is to understand brain computations during visual processing in naturalistic settings. A dominant approach is to use image-computable deep neural networks trained with different task objectives as a basis for linear encoding models. However, in addition to requiring tuning a large number of parameters, the linear encoding approach ignores the structure of the feature m…

    Submitted 22 May, 2025; originally announced May 2025.

  47. arXiv:2505.16737  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR math.OC

    Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization

    Authors: Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun

    Abstract: The significant progress of large language models (LLMs) has led to remarkable achievements across numerous applications. However, their ability to generate harmful content has sparked substantial safety concerns. Despite the implementation of safety alignment techniques during the pre-training phase, recent research indicates that fine-tuning LLMs on adversarial or even benign data can inadverten…

    Submitted 22 May, 2025; originally announced May 2025.

  48. arXiv:2505.16483  [pdf, other]

    cs.CL cs.AI

    Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

    Authors: Shuzheng Si, Haozhe Zhao, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Bofei Gao, Kangyang Luo, Wenhao Li, Yufei Huang, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun

    Abstract: Teaching large language models (LLMs) to be faithful in the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to improve the faithfulness of LLMs in both short-form and long-form generation tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data with four diverse tas…

    Submitted 22 May, 2025; originally announced May 2025.

  49. arXiv:2505.16459  [pdf, ps, other]

    cs.AI

    MMLU-Reason: Benchmarking Multi-Task Multi-modal Language Understanding and Reasoning

    Authors: Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun

    Abstract: Recent advances in Multi-Modal Large Language Models (MLLMs) have enabled unified processing of language, vision, and structured inputs, opening the door to complex tasks such as logical deduction, spatial reasoning, and scientific analysis. Despite their promise, the reasoning capabilities of MLLMs, particularly those augmented with intermediate thinking traces (MLLMs-T), remain poorly understood…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 39 pages, 28 figures, 4 tables

  50. arXiv:2505.15094  [pdf, ps, other]

    cs.CL

    SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models

    Authors: Jing Yu, Yuqi Tang, Kehua Feng, Mingyang Rao, Lei Liang, Zhiqiang Zhang, Mengshu Sun, Wen Zhang, Qiang Zhang, Keyan Ding, Huajun Chen

    Abstract: Large Language Models (LLMs) have shown impressive capabilities in contextual understanding and reasoning. However, evaluating their performance across diverse scientific domains remains underexplored, as existing benchmarks primarily focus on general domains and fail to capture the intricate complexity of scientific data. To bridge this gap, we construct SciCUEval, a comprehensive benchmark datas…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 25 pages, 4 figures