
Showing 1–50 of 5,445 results for author: liu, Z

Searching in archive cs.
  1. arXiv:2506.18322  [pdf, ps, other]

    cs.CV cs.LG

    Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?

    Authors: Yiwei Yang, Chung Peng Lee, Shangbin Feng, Dora Zhao, Bingbing Wen, Anthony Z. Liu, Yulia Tsvetkov, Bill Howe

    Abstract: Finetuning can cause spurious correlations to arise between non-essential features and the target labels, but benchmarks to study these effects involve contrived settings and narrow tasks. In contrast, we consider spurious correlations in multi-modal Large Vision Language Models (LVLMs) pretrained on extensive and diverse datasets without explicit task supervision. We develop a benchmark by sourci…

    Submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2506.18254  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    RLPR: Extrapolating RLVR to General Domains without Verifiers

    Authors: Tianyu Yu, Bo Ji, Shouli Wang, Shu Yao, Zefan Wang, Ganqu Cui, Lifan Yuan, Ning Ding, Yuan Yao, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) demonstrates promising potential in advancing the reasoning capabilities of LLMs. However, its success remains largely confined to mathematical and code domains. This primary limitation stems from the heavy reliance on domain-specific verifiers, which results in prohibitive complexity and limited scalability. To address the challenge, our key o…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Website: https://github.com/openbmb/RLPR

  3. arXiv:2506.17787  [pdf, ps, other]

    cs.CV

    Incorporating Rather Than Eliminating: Achieving Fairness for Skin Disease Diagnosis Through Group-Specific Expert

    Authors: Gelei Xu, Yuying Duan, Zheyuan Liu, Xueyang Li, Meng Jiang, Michael Lemmon, Wei Jin, Yiyu Shi

    Abstract: AI-based systems have achieved high accuracy in skin disease diagnostics but often exhibit biases across demographic groups, leading to inequitable healthcare outcomes and diminished patient trust. Most existing bias mitigation methods attempt to eliminate the correlation between sensitive attributes and diagnostic prediction, but those methods often degrade performance due to the loss of clinical…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 11 pages, 2 figures

  4. arXiv:2506.17740  [pdf, ps, other]

    eess.SP cs.LG

    Rethinking the Role of Operating Conditions for Learning-based Multi-condition Fault Diagnosis

    Authors: Pengyu Han, Zeyi Liu, Shijin Chen, Dongliang Zou, Xiao He

    Abstract: Multi-condition fault diagnosis is prevalent in industrial systems and presents substantial challenges for conventional diagnostic approaches. The discrepancy in data distributions across different operating conditions degrades model performance when a model trained under one condition is applied to others. With the recent advancements in deep learning, transfer learning has been introduced to the…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 6 pages, 6 figures, conference

  5. arXiv:2506.17640  [pdf, ps, other]

    cs.SI

    Empowering Iterative Graph Alignment Using Heat Diffusion

    Authors: Boyan Wang, Weijie Feng, Jinyang Huang, Dan Guo, Zhi Liu

    Abstract: Unsupervised plain graph alignment (UPGA) aims to align corresponding nodes across two graphs without any auxiliary information. Existing UPGA methods rely on structural consistency while neglecting the inherent structural differences in real-world graphs, leading to biased node representations. Moreover, their one-shot alignment strategies lack mechanisms to correct erroneous matches arising from…

    Submitted 21 June, 2025; originally announced June 2025.

  6. arXiv:2506.17561  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

    Authors: Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao

    Abstract: Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various complex, long-horizon manipulation tasks. However, existing approaches vary significantly in terms of network architectures, planning paradigms, representations, and t…

    Submitted 20 June, 2025; originally announced June 2025.

  7. arXiv:2506.17559  [pdf, ps, other]

    cs.IT

    Joint Transmission for Cellular Networks with Pinching Antennas: System Design and Analysis

    Authors: Enzhi Zhou, Jingjing Cui, Ziyue Liu, Zhiguo Ding, Pingzhi Fan

    Abstract: As an emerging flexible antenna technology for wireless communications, pinching-antenna systems offer distinct advantages in terms of cost efficiency and deployment flexibility. This paper investigates joint transmission strategies of the base station (BS) and pinching antennas (PAS), focusing specifically on how to cooperate efficiently between the BS and waveguide-mounted pinching antennas for…

    Submitted 20 June, 2025; originally announced June 2025.

  8. Pyramid Mixer: Multi-dimensional Multi-period Interest Modeling for Sequential Recommendation

    Authors: Zhen Gong, Zhifang Fan, Hui Lu, Qiwei Chen, Chenbin Zhang, Lin Guan, Yuchao Zheng, Feng Zhang, Xiao Yang, Zuotao Liu

    Abstract: Sequential recommendation, a critical task in recommendation systems, predicts the next user action based on the understanding of the user's historical behaviors. Conventional studies mainly focus on cross-behavior modeling with self-attention based methods while neglecting comprehensive user interest modeling for more dimensions. In this study, we propose a novel sequential recommendation model,…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGIR'25

  9. arXiv:2506.16685  [pdf, ps, other]

    cs.RO cs.LG

    Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

    Authors: Xiaomeng Xu, Yifan Hou, Zeyi Liu, Shuran Song

    Abstract: We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gen…

    Submitted 19 June, 2025; originally announced June 2025.

  10. arXiv:2506.16500  [pdf, ps, other]

    cs.LG

    SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

    Authors: Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu

    Abstract: Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: ICML 2025. The first three authors contributed equally to this work. Project page: https://z-lab.ai/projects/sparselora

  11. arXiv:2506.16499  [pdf, ps, other]

    cs.AI cs.LG

    ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning

    Authors: Zexi Liu, Yuzhu Cai, Xinyu Zhu, Yujie Zheng, Runkun Chen, Ying Wen, Yanfeng Wang, Weinan E, Siheng Chen

    Abstract: As AI capabilities advance toward and potentially beyond human-level performance, a natural transition emerges where AI-driven development becomes more efficient than human-centric approaches. A promising pathway toward this transition lies in AI-for-AI (AI4AI), which leverages AI techniques to automate and optimize the design, training, and deployment of AI systems themselves. While LLM-based age…

    Submitted 19 June, 2025; originally announced June 2025.

  12. arXiv:2506.16495  [pdf, ps, other]

    cs.MM cs.CV

    DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation

    Authors: Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin

    Abstract: Like image coding in visual data transmission, feature coding is essential for the distributed deployment of large models by significantly reducing transmission and storage overhead. However, prior studies have mostly targeted task- or model-specific scenarios, leaving the challenge of universal feature coding across diverse large models largely unaddressed. In this paper, we present the first sys…

    Submitted 19 June, 2025; originally announced June 2025.

  13. arXiv:2506.16447  [pdf, ps, other]

    cs.CR cs.CL

    Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models

    Authors: Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li

    Abstract: Backdoor unalignment attacks against Large Language Models (LLMs) enable the stealthy compromise of safety alignment using a hidden trigger while evading normal safety auditing. These attacks pose significant threats to the applications of LLMs in the real-world Large Language Model as a Service (LLMaaS) setting, where the deployed model is a fully black-box system that can only interact through t…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted at ICLR 2025

    Journal ref: Proceedings of The Thirteenth International Conference on Learning Representations (ICLR 2025)

  14. arXiv:2506.16379  [pdf, ps, other]

    cs.DB

    PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking

    Authors: Yan Zhou, Chunwei Liu, Bhuvan Urgaonkar, Zhengle Wang, Magnus Mueller, Chao Zhang, Songyue Zhang, Pascal Pfeil, Dominik Horn, Zhengchun Liu, Davide Pagano, Tim Kraska, Samuel Madden, Ju Fan

    Abstract: Cloud service providers commonly use standard benchmarks like TPC-H and TPC-DS to evaluate and optimize cloud data analytics systems. However, these benchmarks rely on fixed query patterns and fail to capture the real execution statistics of production cloud workloads. Although some cloud database vendors have recently released real workload traces, these traces alone do not qualify as benchmarks,…

    Submitted 19 June, 2025; originally announced June 2025.

  15. arXiv:2506.15684  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

    Authors: Qingming Liu, Zhen Liu, Dinghuai Zhang, Kui Jia

    Abstract: Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, ge…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Technical Report (21 pages, 21 figures)

  16. arXiv:2506.15267  [pdf, ps, other]

    cs.IR

    Next-User Retrieval: Enhancing Cold-Start Recommendations via Generative Next-User Modeling

    Authors: Yu-Ting Lan, Yang Huo, Yi Shen, Xiao Yang, Zuotao Liu

    Abstract: The item cold-start problem is critical for online recommendation systems, as the success of this phase determines whether high-quality new items can transition to popular ones, receive essential feedback to inspire creators, and thus lead to the long-term retention of creators. However, modern recommendation systems still struggle to address item cold-start challenges due to the heavy reliance on…

    Submitted 18 June, 2025; originally announced June 2025.

  17. arXiv:2506.15155  [pdf, ps, other]

    cs.DC

    eLLM: Elastic Memory Management Framework for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and key-value caches. While static weights are constant and predictable, dynamic components such as activations and KV caches change frequently during runtime, presenting significant challenges for efficie…

    Submitted 18 June, 2025; originally announced June 2025.

  18. arXiv:2506.14968  [pdf, ps, other]

    cs.RO cs.AI

    FEAST: A Flexible Mealtime-Assistance System Towards In-the-Wild Personalization

    Authors: Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, Tapomayukh Bhattacharjee

    Abstract: Physical caregiving robots hold promise for improving the quality of life of millions worldwide who require assistance with feeding. However, in-home meal assistance remains challenging due to the diversity of activities (e.g., eating, drinking, mouth wiping), contexts (e.g., socializing, watching TV), food items, and user preferences that arise during deployment. In this work, we propose FEAST, a…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: RSS 2025 - Outstanding Paper Award & Outstanding Systems Paper Award Finalist

  19. arXiv:2506.14965  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Authors: Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 38 pages, 9 figures. Under review

  20. arXiv:2506.14865  [pdf, ps, other]

    cs.RO

    Efficient and Real-Time Motion Planning for Robotics Using Projection-Based Optimization

    Authors: Xuemin Chi, Hakan Girgin, Tobias Löw, Yangyang Xie, Teng Xue, Jihao Huang, Cheng Hu, Zhitao Liu, Sylvain Calinon

    Abstract: Generating motions for robots interacting with objects of various shapes is a complex challenge, further complicated by the robot geometry and multiple desired behaviors. While current robot programming tools (such as inverse kinematics, collision avoidance, and manipulation planning) often treat these problems as constrained optimization, many existing solvers focus on specific problem domains or…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: submitted to IROS 2025

  21. arXiv:2506.14754  [pdf, ps, other]

    cs.RO

    Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

    Authors: Carolina Higuera, Akash Sharma, Taosha Fan, Chaithanya Krishna Bodduluri, Byron Boots, Michael Kaess, Mike Lambeta, Tingfan Wu, Zixi Liu, Francois Robert Hogan, Mustafa Mukadam

    Abstract: We present Sparsh-X, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation…

    Submitted 17 June, 2025; originally announced June 2025.

  22. arXiv:2506.14429  [pdf, ps, other]

    cs.CL

    LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

    Authors: Xiaoran Liu, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

    Abstract: Large Language Diffusion Models, or diffusion LLMs, have emerged as a significant focus in NLP research, with substantial effort directed toward understanding their scalability and downstream task performance. However, their long-context capabilities remain unexplored, lacking systematic analysis or methods for context extension. In this work, we present the first systematic investigation comparin…

    Submitted 22 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: 16 pages, 12 figures, work in progress

  23. arXiv:2506.14245  [pdf, ps, other]

    cs.AI cs.CL

    Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

    Authors: Xumeng Wen, Zihan Liu, Shun Zheng, Zhijian Xu, Shengyu Ye, Zhirong Wu, Xiao Liang, Yang Wang, Junjie Li, Ziming Miao, Jiang Bian, Mao Yang

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for advancing the reasoning capabilities of Large Language Models (LLMs). However, a critical paradox clouds its efficacy: RLVR-tuned models often underperform their base models on the $Pass@K$ metric for solution-finding, leading to the hypothesis that RLVR merely re-weights existing reasoning paths at the c…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Preprint

  24. arXiv:2506.14181  [pdf, ps, other]

    cs.CV

    Meta-SurDiff: Classification Diffusion Model Optimized by Meta Learning is Reliable for Online Surgical Phase Recognition

    Authors: Yufei Li, Jirui Wu, Long Tian, Liming Wang, Xiaonan Liu, Zijun Liu, Xiyang Liu

    Abstract: Online surgical phase recognition has drawn great attention recently due to its potential downstream applications closely related to human life and health. Although deep models have made significant advances in capturing the discriminative long-term dependency of surgical videos to achieve improved recognition, they rarely account for exploring and modeling the uncertainty in surgical videos,…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 15 pages, 5 figures

  25. arXiv:2506.14028  [pdf, ps, other]

    cs.CL

    MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

    Authors: Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu, Peng Lu, Jerry Huang, Suyuchen Wang, et al. (19 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have accelerated progress in financial NLP and applications, yet existing benchmarks remain limited to monolingual and unimodal settings, often over-relying on simple tasks and failing to reflect the complexity of real-world financial communication. We introduce MultiFinBen, the first multilingual and multimodal benchmark tailored to the global finan…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  26. arXiv:2506.13992  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ME

    AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science

    Authors: An Luo, Xun Xian, Jin Du, Fangqiao Tian, Ganghua Wang, Ming Zhong, Shengchun Zhao, Xuan Bi, Zirui Liu, Jiawei Zhou, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding

    Abstract: Large language models (LLMs) have advanced the automation of data science workflows. Yet it remains unclear whether they can critically leverage external domain knowledge as human data scientists do in practice. To answer this question, we introduce AssistedDS (Assisted Data Science), a benchmark designed to systematically evaluate how LLMs handle domain knowledge in tabular prediction tasks. Assi…

    Submitted 25 May, 2025; originally announced June 2025.

    MSC Class: 62-07; 62-08; 68T05; 68T07; 68T01; 68T50

    ACM Class: I.2.0; I.2.6; I.2.7; I.5.1; I.5.4; H.2.8; G.3

  27. arXiv:2506.13709  [pdf, ps, other]

    eess.AS cs.SD

    SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms

    Authors: Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li

    Abstract: Speech pre-processing techniques such as denoising, de-reverberation, and separation, are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, resulting in residual noise or the introduction of new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To ad…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  28. arXiv:2506.13695  [pdf, ps, other]

    cs.IR

    OneRec Technical Report

    Authors: Guorui Zhou, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Qiang Luo, Qianqian Wang, Qigen Hu, Rui Huang, Shiyao Wang, Weifeng Ding, Wuchao Li, Xinchen Luo, Xingmei Wang, Zexuan Cheng, Zixing Zhang, Bin Zhang, Boxuan Wang, Chaoyi Ma, Chengru Song, Chenhui Wang, Di Wang, Dongxue Meng, Fan Yang, Fangyu Zhang, et al. (40 additional authors not shown)

    Abstract: Recommender systems have been widely used in various large-scale user-oriented platforms for many years. However, compared to the rapid developments in the AI community, recommendation systems have not achieved a breakthrough in recent years. For instance, they still rely on a multi-stage cascaded architecture rather than an end-to-end approach, leading to computational fragmentation and optimizat…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Authors are listed alphabetically by their first name

  29. arXiv:2506.13654  [pdf, ps, other]

    cs.CV cs.AI

    Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

    Authors: Shulin Tian, Ruiqi Wang, Hongming Guo, Penghao Wu, Yuhao Dong, Xiuying Wang, Jingkang Yang, Hao Zhang, Hongyuan Zhu, Ziwei Liu

    Abstract: We introduce Ego-R1, a novel framework for reasoning over ultra-long (i.e., in days and weeks) egocentric videos, which leverages a structured Chain-of-Tool-Thought (CoTT) process, orchestrated by an Ego-R1 Agent trained via reinforcement learning (RL). Inspired by human problem-solving strategies, CoTT decomposes complex reasoning into modular steps, with the RL agent invoking specific tools, one…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Project page: https://egolife-ai.github.io/Ego-R1/

  30. arXiv:2506.13651  [pdf, ps, other]

    cs.LG

    xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

    Authors: Kaiyuan Chen, Yixin Ren, Yang Liu, Xiaobo Hu, Haotong Tian, Tianbao Xie, Fangfu Liu, Haoye Zhang, Hongzhang Liu, Yuan Gong, Chen Sun, Han Hou, Hui Yang, James Pan, Jianan Lou, Jiayi Mao, Jizheng Liu, Jinpeng Li, Kangyi Liu, Kenkun Liu, Rui Wang, Run Li, Tong Niu, Wenlong Zhang, Wenqi Yan, et al. (8 additional authors not shown)

    Abstract: We introduce xbench, a dynamic, profession-aligned evaluation suite designed to bridge the gap between AI agent capabilities and real-world productivity. While existing benchmarks often focus on isolated technical skills, they may not accurately reflect the economic value agents deliver in professional settings. To address this, xbench targets commercially significant domains with evaluation tasks…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Project page: https://xbench.org

  31. arXiv:2506.13363  [pdf, ps, other]

    cs.CL

    Efficient Medical VIE via Reinforcement Learning

    Authors: Lijun Liu, Ruiyang Li, Zhaocheng Liu, Chenglin Zhu, Chong Li, Jiehan Cheng, Qiang Ju, Jian Xie

    Abstract: Visual Information Extraction (VIE) converts unstructured document images into structured formats like JSON, critical for medical applications such as report analysis and online consultations. Traditional methods rely on OCR and language models, while end-to-end multimodal models offer direct JSON generation. However, domain-specific schemas and high annotation costs limit their effectiveness in m…

    Submitted 16 June, 2025; originally announced June 2025.

  32. arXiv:2506.13284  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

    Authors: Zihan Liu, Zhuolin Yang, Yang Chen, Chankyu Lee, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

    Abstract: In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. We begin by curating the SFT training data through two scaling strategies: increasing the number of collected prompts and the number of generated responses per prompt. Both approaches yield notable improvements in reasoning performance, with scaling t…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: The AceReason-Nemotron collection: https://huggingface.co/collections/nvidia/acereason-682f4e1261dc22f697fd1485

  33. arXiv:2506.13205  [pdf, ps, other]

    cs.CR cs.AI

    Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

    Authors: Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang

    Abstract: With the growing integration of vision-language models (VLMs), mobile agents are now widely used for tasks like UI automation and camera-based user assistance. These agents are often fine-tuned on limited user-generated datasets, leaving them vulnerable to covert threats during the training process. In this work we present GHOST, the first clean-label backdoor attack specifically designed for mobi…

    Submitted 19 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

    Comments: 12 pages

  34. arXiv:2506.12849  [pdf, ps, other]

    cs.CV

    CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making

    Authors: Songtao Jiang, Yuan Wang, Ruizhe Chen, Yan Zhang, Ruilin Luo, Bohan Lei, Sibo Song, Yang Feng, Jimeng Sun, Jian Wu, Zuozhu Liu

    Abstract: In medical visual question answering (Med-VQA), achieving accurate responses relies on three critical steps: precise perception of medical imaging data, logical reasoning grounded in visual input and textual questions, and coherent answer derivation from the reasoning process. Recent advances in general vision-language models (VLMs) show that large-scale reinforcement learning (RL) could significa…

    Submitted 15 June, 2025; originally announced June 2025.

  35. arXiv:2506.12577  [pdf, ps, other]

    cs.CL

    OneEval: Benchmarking LLM Knowledge-intensive Reasoning over Diverse Knowledge Bases

    Authors: Yongrui Chen, Zhiqiang Liu, Jing Yu, Lin Ren, Nan Hu, Xinbang Dai, Jiajun Liu, Jiazhen Kang, Shenyu Zhang, Xinda Wang, Keyan Ding, Pengfei Shen, Haolei Zhu, Hongjie Deng, Yisong Wang, Tongtong Wu, Sheng Bi, Wen Zhang, Tianxing Wu, Qiu Ji, Haofen Wang, Wenliang Chen, Huajun Chen, Guilin Qi

    Abstract: Large Language Models (LLMs) have demonstrated substantial progress on reasoning tasks involving unstructured text, yet their capabilities significantly deteriorate when reasoning requires integrating structured external knowledge such as knowledge graphs, code snippets, or formal logic. This limitation is partly due to the absence of benchmarks capable of systematically evaluating LLM performance…

    Submitted 14 June, 2025; originally announced June 2025.

  36. arXiv:2506.12409  [pdf, ps, other]

    cs.CV

    Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

    Authors: Ziwei Liu, Borui Kang, Wei Li, Hangjie Yuan, Yanbing Yang, Wenbin Li, Jun Luo, Yifan Zhu, Tao Feng

    Abstract: Continual learning in vision-language models (VLMs) faces critical challenges in balancing parameter efficiency, memory consumption, and optimization stability. While First-Order (FO) optimization methods (e.g., SGD) dominate current approaches, their deterministic gradients often trap models in suboptimal local minima and incur substantial memory overhead. This paper pioneers a systematic exploration of…

    Submitted 14 June, 2025; originally announced June 2025.

  37. arXiv:2506.12307  [pdf, ps, other]

    cs.CL cs.AI

    Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning

    Authors: Xiaotian Zhang, Yuan Wang, Zhaopeng Feng, Ruizhe Chen, Zhijie Zhou, Yan Zhang, Hongxia Xu, Jian Wu, Zuozhu Liu

    Abstract: Medical Question-Answering (QA) encompasses a broad spectrum of tasks, including multiple choice questions (MCQ), open-ended text generation, and complex computational reasoning. Despite this variety, a unified framework for delivering high-quality medical QA has yet to emerge. Although recent progress in reasoning-augmented large language models (LLMs) has shown promise, their ability to achieve…

    Submitted 19 June, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

  38. arXiv:2506.12078  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.CY cs.SI

    Modeling Earth-Scale Human-Like Societies with One Billion Agents

    Authors: Haoxiang Guan, Jiyan He, Liyang Fan, Zhenzhen Ren, Shaobin He, Xin Yu, Yuan Chen, Shuxin Zheng, Tie-Yan Liu, Zhen Liu

    Abstract: Understanding how complex societal behaviors emerge from individual cognition and interactions requires both high-fidelity modeling of human behavior and large-scale simulations. Traditional agent-based models (ABMs) have been employed to study these dynamics for decades, but are constrained by simplified agent behaviors that fail to capture human complexity. Recent advances in large language mode…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Work in progress

  39. arXiv:2506.12037  [pdf, ps, other]

    cs.LG cs.AI

    How to Train a Model on a Cheap Cluster with Low Cost using Block Coordinate Descent

    Authors: Zeyu Liu, Yunquan Zhang, Boyang Zhang, Guoyong Jiang, Daning Cheng

    Abstract: Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- to medium-sized teams. In this paper, we present a full-parameter pre-training framework based on block coordinate descent (BCD), augmented with engineering optimizations, to efficiently train large models on affordable RTX 4090 GPU clusters. BCD ensures…

    Submitted 22 May, 2025; originally announced June 2025.

    Comments: under review

  40. arXiv:2506.11886  [pdf, ps, other]

    cs.CL

    Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache

    Authors: Xiaoran Liu, Siyang He, Qiqi Wang, Ruixiao Li, Yuerong Song, Zhigeng Liu, Linlin Li, Qun Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu

    Abstract: Large Language Models struggle with memory demands from the growing Key-Value (KV) cache as context lengths increase. Existing compression methods homogenize head dimensions or rely on attention-guided token pruning, often sacrificing accuracy or introducing computational overhead. We propose FourierAttention, a training-free framework that exploits the heterogeneous roles of transformer head dime…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 10 pages, 7 figures, work in progress

  41. arXiv:2506.11612  [pdf, ps, other]

    cs.CR cs.SE

    KEENHash: Hashing Programs into Function-Aware Embeddings for Large-Scale Binary Code Similarity Analysis

    Authors: Zhijie Liu, Qiyi Tang, Sen Nie, Shi Wu, Liang Feng Zhang, Yutian Tang

    Abstract: Binary code similarity analysis (BCSA) is a crucial research area in many fields such as cybersecurity. Specifically, function-level diffing tools are the most widely used in BCSA: they perform function matching one by one for evaluating the similarity between binary programs. However, such methods need a high time complexity, making them unscalable in large-scale scenarios (e.g., 1/n-to-n search)…

    Submitted 13 June, 2025; originally announced June 2025.

  42. arXiv:2506.11469  [pdf, ps, other]

    cs.AI

    Structure-Aware Automatic Channel Pruning by Searching with Graph Embedding

    Authors: Zifan Liu, Yuan Cao, Yanwei Yu, Heng Qi, Jie Gui

    Abstract: Channel pruning is a powerful technique to reduce the computational overhead of deep neural networks, enabling efficient deployment on resource-constrained devices. However, existing pruning methods often rely on local heuristics or weight-based criteria that fail to capture global structural dependencies within the network, leading to suboptimal pruning decisions and degraded model performance. T…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: 12 pages, 2 figures

  43. arXiv:2506.11209  [pdf, ps, other]

    cs.PL

    A Performance Model for Warp Specialization Kernels

    Authors: Zhengyang Liu, Vinod Grover

    Abstract: This paper presents a performance model tailored for warp specialization kernels, focusing on factors such as warp size, tiling size, input matrix size, memory bandwidth, and thread divergence. Our model offers accurate predictions of execution time by leveraging differential equations validated through simulations and experiments. The insights gained from this model not only enhance our understa…

    Submitted 16 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  44. arXiv:2506.11147  [pdf, ps, other]

    cs.CV

    3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks

    Authors: Xiaotang Gai, Jiaxiang Liu, Yichen Li, Zijie Meng, Jian Wu, Zuozhu Liu

    Abstract: Medical Visual Question Answering (Med-VQA) holds significant potential for clinical decision support, yet existing efforts primarily focus on 2D imaging with limited task diversity. This paper presents 3D-RAD, a large-scale dataset designed to advance 3D Med-VQA using radiology CT scans. The 3D-RAD dataset encompasses six diverse VQA tasks: anomaly detection, image observation, medical computatio…

    Submitted 11 June, 2025; originally announced June 2025.

  45. arXiv:2506.10848  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

    Authors: Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Dongrui Liu, Linfeng Zhang

    Abstract: Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. I…

    Submitted 12 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 11 pages, 5 figures

  46. arXiv:2506.10826  [pdf, ps, other]

    cs.RO

    RationalVLA: A Rational Vision-Language-Action Model with Dual System

    Authors: Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Can Cui, Pengxiang Ding, Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li

    Abstract: A fundamental requirement for real-world robotic deployment is the ability to understand and respond to natural language instructions. Existing language-conditioned manipulation tasks typically assume that instructions are perfectly aligned with the environment. This assumption limits robustness and generalization in realistic scenarios where instructions may be ambiguous, irrelevant, or infeasibl…

    Submitted 13 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 14 pages

  47. arXiv:2506.10822  [pdf, ps, other]

    cs.CL

    ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization

    Authors: Zhensheng Jin, Xinze Li, Yifan Ji, Chunyi Peng, Zhenghao Liu, Qi Shi, Yukun Yan, Shuo Wang, Furong Peng, Ge Yu

    Abstract: Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of Large Language Models (LLMs). However, these methods often suffer from overthinking, leading to unnecessarily lengthy or redundant reasoning traces. Existing approaches attempt to mitigate this issue through curating multiple reasoning chains for training LLMs, but their effectiveness is o…

    Submitted 12 June, 2025; originally announced June 2025.

  48. arXiv:2506.10821  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    VideoDeepResearch: Long Video Understanding With Agentic Tool Using

    Authors: Huaying Yuan, Zheng Liu, Junjie Zhou, Hongjin Qian, Ji-Rong Wen, Zhicheng Dou

    Abstract: Long video understanding (LVU) presents a significant challenge for current multi-modal large language models (MLLMs) due to the task's inherent complexity and context window constraint. It is widely assumed that addressing LVU tasks requires foundation MLLMs with extended context windows, strong visual perception capabilities, and proficient domain expertise. In this work, we challenge this commo…

    Submitted 15 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  49. arXiv:2506.10762  [pdf, ps, other]

    cs.HC

    Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction

    Authors: Bao Zhang, Zihan Li, Zhenglei Liu, Huanchen Wang, Yuxin Ma

    Abstract: Text animation, a foundational element in video creation, enables efficient and cost-effective communication, thriving in advertisements, journalism, and social media. However, traditional animation workflows present significant usability barriers for non-professionals, with intricate operational procedures severely hindering creative productivity. To address this, we propose a Large Language Mode…

    Submitted 12 June, 2025; originally announced June 2025.

  50. arXiv:2506.10365  [pdf]

    cs.SE

    AutoGEEval++: A Multi-Level and Multi-Geospatial-Modality Automated Evaluation Framework for Large Language Models in Geospatial Code Generation on Google Earth Engine

    Authors: Shuyang Hou, Zhangxiao Shen, Huayi Wu, Haoyue Jiao, Ziqi Liu, Lutong Xie, Chang Liu, Jianyuan Liang, Yaxian Qing, Xiaopu Zhang, Dehua Peng, Zhipeng Gui, Xuefeng Guan

    Abstract: Geospatial code generation is becoming a key frontier in integrating artificial intelligence with geo-scientific analysis, yet standardised automated evaluation tools for this task remain absent. This study presents AutoGEEval++, an enhanced framework building on AutoGEEval, and the first automated assessment system for large language models (LLMs) generating geospatial code on Google Earth Engine…

    Submitted 12 June, 2025; originally announced June 2025.