Showing 1–50 of 752 results for author: Gao, C

Searching in archive cs.
  1. arXiv:2507.00209  [pdf, ps, other]

    eess.IV cs.AI cs.CV cs.RO

    SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

    Authors: Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri

    Abstract: High-resolution imaging is crucial for enhancing visual clarity and enabling precise computer-assisted guidance in minimally invasive surgery (MIS). Despite the increasing adoption of 4K endoscopic systems, there remains a significant gap in publicly available native 4K datasets tailored specifically for robotic-assisted MIS. We introduce SurgiSR4K, the first publicly accessible surgical imaging a…

    Submitted 30 June, 2025; originally announced July 2025.

  2. arXiv:2506.23674  [pdf, ps, other]

    cs.CV

    Partial Forward Blocking: A Novel Data Pruning Paradigm for Lossless Training Acceleration

    Authors: Dongyue Wu, Zilin Guo, Jialong Zuo, Nong Sang, Changxin Gao

    Abstract: The ever-growing size of training datasets enhances the generalization capability of modern machine learning models but also incurs exorbitant computational costs. Existing data pruning approaches aim to accelerate training by removing those less important samples. However, they often rely on gradients or proxy models, leading to prohibitive additional costs of gradient back-propagation and proxy…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted by ICCV2025

  3. arXiv:2506.23135  [pdf, ps, other]

    cs.CV cs.RO

    RoboScape: Physics-informed Embodied World Model

    Authors: Yu Shang, Xin Zhang, Yinzhou Tang, Lei Jin, Chen Gao, Wei Wu, Yong Li

    Abstract: World models have become indispensable tools for embodied intelligence, serving as powerful simulators capable of generating realistic robotic videos while addressing critical data scarcity challenges. However, current embodied world models exhibit limited physical awareness, particularly in modeling 3D geometry and motion dynamics, resulting in unrealistic video generation for contact-rich roboti…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 17 pages

  4. arXiv:2506.22554  [pdf, ps, other]

    cs.CV cs.AI

    Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

    Authors: Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, Junming Chen, Zhang Chen, Shiyang Cheng, Praveen Chowdary, Joe Chuang, Antony D'Avirro, Jon Daly, Ning Dong, Mark Duppenthaler, Cynthia Gao, Jeff Girard, Martin Gleize, Sahir Gomez, Hongyu Gong, Srivathsan Govindarajan, Brandon Han, Sen He, Denise Hernandez, et al. (59 additional authors not shown)

    Abstract: Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  5. arXiv:2506.17712  [pdf, ps, other]

    cs.CV

    PDC-Net: Pattern Divide-and-Conquer Network for Pelvic Radiation Injury Segmentation

    Authors: Xinyu Xiong, Wuteng Cao, Zihuang Wu, Lei Zhang, Chong Gao, Guanbin Li, Qiyuan Qin

    Abstract: Accurate segmentation of Pelvic Radiation Injury (PRI) from Magnetic Resonance Images (MRI) is crucial for more precise prognosis assessment and the development of personalized treatment plans. However, automated segmentation remains challenging due to factors such as complex organ morphologies and confusing context. To address these challenges, we propose a novel Pattern Divide-and-Conquer Networ…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025

  6. arXiv:2506.17561  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

    Authors: Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao

    Abstract: Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various complex, long-horizon manipulation tasks. However, existing approaches vary significantly in terms of network architectures, planning paradigms, representations, and t…

    Submitted 20 June, 2025; originally announced June 2025.

  7. arXiv:2506.16591  [pdf, ps, other]

    cs.AR eess.SP

    SparseDPD: A Sparse Neural Network-based Digital Predistortion FPGA Accelerator for RF Power Amplifier Linearization

    Authors: Manno Versluis, Yizhuo Wu, Chang Gao

    Abstract: Digital predistortion (DPD) is crucial for linearizing radio frequency (RF) power amplifiers (PAs), improving signal integrity and efficiency in wireless systems. Neural network (NN)-based DPD methods surpass traditional polynomial models but face computational challenges limiting their practical deployment. This paper introduces SparseDPD, an FPGA accelerator employing a spatially sparse phase-no…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted to FPL 2025

  8. arXiv:2506.16495  [pdf, ps, other]

    cs.MM cs.CV

    DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation

    Authors: Changsheng Gao, Zijie Liu, Li Li, Dong Liu, Xiaoyan Sun, Weisi Lin

    Abstract: Like image coding in visual data transmission, feature coding is essential for the distributed deployment of large models by significantly reducing transmission and storage overhead. However, prior studies have mostly targeted task- or model-specific scenarios, leaving the challenge of universal feature coding across diverse large models largely unaddressed. In this paper, we present the first sys…

    Submitted 19 June, 2025; originally announced June 2025.

  9. arXiv:2506.15421  [pdf, ps, other]

    cs.LG cs.AI

    Reward Models in Deep Reinforcement Learning: A Survey

    Authors: Rui Yu, Shenghua Wan, Yucen Wang, Chen-Xiao Gao, Le Gan, Zongzhang Zhang, De-Chuan Zhan

    Abstract: In reinforcement learning (RL), agents continually interact with the environment and use the feedback to refine their behavior. To guide policy optimization, reward models are introduced as proxies of the desired objectives, such that when the agent maximizes the accumulated reward, it also fulfills the task designer's intentions. Recently, significant attention from both academic and industrial r…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: IJCAI 2025 Survey Track (To Appear)

  10. arXiv:2506.14965  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Authors: Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 38 pages, 9 figures. Under review

  11. arXiv:2506.12824  [pdf, ps, other]

    cs.CV

    Learning Unpaired Image Dehazing with Physics-based Rehazy Generation

    Authors: Haoyou Deng, Zhiqiang Li, Feng Zhang, Qingbo Lu, Zisheng Cao, Yuanjie Shao, Shuhang Gu, Changxin Gao, Nong Sang

    Abstract: Overfitting to synthetic training pairs remains a critical challenge in image dehazing, leading to poor generalization capability to real-world scenarios. To address this issue, existing approaches utilize unpaired realistic data for training, employing CycleGAN or contrastive learning frameworks. Despite their progress, these methods often suffer from training instability, resulting in limited de…

    Submitted 15 June, 2025; originally announced June 2025.

  12. arXiv:2506.12737  [pdf, ps, other]

    cs.CV cs.DC

    Cross-architecture universal feature coding via distribution alignment

    Authors: Changsheng Gao, Shan Liu, Feng Wu, Weisi Lin

    Abstract: Feature coding has become increasingly important in scenarios where semantic representations rather than raw pixels are transmitted and stored. However, most existing methods are architecture-specific, targeting either CNNs or Transformers. This design limits their applicability in real-world scenarios where features from both architectures coexist. To address this gap, we introduce a new research…

    Submitted 15 June, 2025; originally announced June 2025.

  13. Energy-Efficient Real-Time Job Mapping and Resource Management in Mobile-Edge Computing

    Authors: Chuanchao Gao, Niraj Kumar, Arvind Easwaran

    Abstract: Mobile-edge computing (MEC) has emerged as a promising paradigm for enabling Internet of Things (IoT) devices to handle computation-intensive jobs. Due to the imperfect parallelization of algorithms for job processing on servers and the impact of IoT device mobility on data communication quality in wireless networks, it is crucial to jointly consider server resource allocation and IoT device mobil…

    Submitted 14 June, 2025; originally announced June 2025.

    Journal ref: 2024 IEEE Real-Time Systems Symposium (RTSS)

  14. arXiv:2506.12165  [pdf, ps, other]

    eess.SP cs.AI

    TCN-DPD: Parameter-Efficient Temporal Convolutional Networks for Wideband Digital Predistortion

    Authors: Huanqiang Duan, Manno Versluis, Qinyu Chen, Leo C. N. de Vreede, Chang Gao

    Abstract: Digital predistortion (DPD) is essential for mitigating nonlinearity in RF power amplifiers, particularly for wideband applications. This paper presents TCN-DPD, a parameter-efficient architecture based on temporal convolutional networks, integrating noncausal dilated convolutions with optimized activation functions. Evaluated on the OpenDPD framework with the DPA_200MHz dataset, TCN-DPD achieves…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE MTT-S International Microwave Symposium (IMS) 2025

  15. arXiv:2506.11172  [pdf, ps, other]

    cs.LG cs.AI

    Collapsing Sequence-Level Data-Policy Coverage via Poisoning Attack in Offline Reinforcement Learning

    Authors: Xue Zhou, Dapeng Man, Chen Xu, Fanyi Zeng, Tao Liu, Huan Wang, Shucheng He, Chaoyang Gao, Wu Yang

    Abstract: Offline reinforcement learning (RL) heavily relies on the coverage of pre-collected data over the target policy's distribution. Existing studies aim to improve data-policy coverage to mitigate distributional shifts, but overlook security risks from insufficient coverage, and the single-step analysis is not consistent with the multi-step decision-making nature of offline RL. To address this, we int…

    Submitted 12 June, 2025; originally announced June 2025.

  16. arXiv:2506.10376  [pdf, ps, other]

    cs.SE cs.HC

    MLLM-Based UI2Code Automation Guided by UI Layout Information

    Authors: Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao

    Abstract: Converting user interfaces into code (UI2Code) is a crucial step in website development, which is time-consuming and labor-intensive. The automation of UI2Code is essential to streamline this task, beneficial for improving the development efficiency. There exist deep learning-based methods for the task; however, they heavily rely on a large amount of labeled training data and struggle with general…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted by the 34th International Symposium on Software Testing and Analysis (ISSTA 2025)

  17. arXiv:2506.10082  [pdf, ps, other]

    cs.CV

    LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

    Authors: Chenjian Gao, Lihe Ding, Xin Cai, Zhanpeng Huang, Zibin Wang, Tianfan Xue

    Abstract: Video editing using diffusion models has achieved remarkable results in generating high-quality edits for videos. However, current methods often rely on large-scale pretraining, limiting flexibility for specific edits. First-frame-guided editing provides control over the first frame, but lacks flexibility over subsequent frames. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) t…

    Submitted 24 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: 12 pages

  18. arXiv:2506.09839  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    OctoNav: Towards Generalist Embodied Navigation

    Authors: Chen Gao, Liankai Jin, Xingyu Peng, Jiazhao Zhang, Yue Deng, Annan Li, He Wang, Si Liu

    Abstract: Embodied navigation stands as a foundational pillar within the broader pursuit of embodied AI. However, previous navigation research is divided into different tasks/capabilities, e.g., ObjNav, ImgNav and VLN, which differ in task objectives and modalities, so datasets and methods are designed individually. In this work, we take steps toward generalist navigation agents, which can follow fre…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 31 pages, 25 figures

  19. arXiv:2506.09385  [pdf, ps, other]

    cs.CV

    ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model

    Authors: Jialong Zuo, Yongtai Deng, Mengdan Tan, Rui Jin, Dongyue Wu, Nong Sang, Liang Pan, Changxin Gao

    Abstract: In real-world scenarios, person re-identification (ReID) expects to identify a person-of-interest via the descriptive query, regardless of whether the query is a single modality or a combination of multiple modalities. However, existing methods and datasets remain constrained to limited modalities, failing to meet this requirement. Therefore, we investigate a new challenging problem called Omni Mul…

    Submitted 11 June, 2025; originally announced June 2025.

  20. arXiv:2506.07412  [pdf, ps, other]

    cs.CV

    Compressed Feature Quality Assessment: Dataset and Baselines

    Authors: Changsheng Gao, Wei Zhou, Guosheng Lin, Weisi Lin

    Abstract: The widespread deployment of large models in resource-constrained environments has underscored the need for efficient transmission of intermediate feature representations. In this context, feature coding, which compresses features into compact bitstreams, becomes a critical component for scenarios involving feature transmission, storage, and reuse. However, this compression process introduces inhe…

    Submitted 9 June, 2025; originally announced June 2025.

  21. arXiv:2506.07390  [pdf, ps, other]

    cs.AI cs.SE

    Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data

    Authors: Xin-Cheng Wen, Yijun Yang, Cuiyun Gao, Yang Xiao, Deheng Ye

    Abstract: Large language models (LLMs) demonstrate considerable proficiency in numerous coding-related tasks; however, their capabilities in detecting software vulnerabilities remain limited. This limitation primarily stems from two factors: (1) the absence of reasoning data related to vulnerabilities, which hinders the models' ability to capture underlying vulnerability patterns; and (2) their focus on lea…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 Findings

  22. arXiv:2506.06677  [pdf, ps, other]

    cs.RO cs.CV

    RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation

    Authors: Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, Si Liu

    Abstract: Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs' strengths in semantic reasoning and long-horizon planning. These System 2 capabilities, characterized by deliberative, goal-directed thinking, remain underexplored due to the limite…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 23 pages, 18 figures

  23. arXiv:2506.06566  [pdf, ps, other]

    eess.AS cs.AI

    AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition

    Authors: Chen Bao, Chuanbing Huo, Qinyu Chen, Chang Gao

    Abstract: This paper proposes AS-ASR, a lightweight aphasia-specific speech recognition framework based on Whisper-tiny, tailored for low-resource deployment on edge devices. Our approach introduces a hybrid training strategy that systematically combines standard and aphasic speech at varying ratios, enabling robust generalization, and a GPT-4-based reference enhancement method that refines noisy aphasic tr…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Under review

  24. arXiv:2506.02713  [pdf, ps, other]

    cs.AI

    Open-Set Living Need Prediction with Large Language Models

    Authors: Xiaochong Lan, Jie Feng, Yizhou Sun, Chen Gao, Jiahuan Lei, Xinlei Shi, Hengliang Luo, Yong Li

    Abstract: Living needs are the needs people generate in their daily lives for survival and well-being. On life service platforms like Meituan, user purchases are driven by living needs, making accurate living need predictions crucial for personalized service recommendations. Traditional approaches treat this prediction as a closed-set classification problem, severely limiting their ability to capture the di…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  25. arXiv:2506.01939  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Authors: Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well understood. In this work, we undertake a pioneering exploration of RLVR through the novel perspective of token entropy patterns, comprehensively analyzing how different tokens influence reasoning perf…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 25 pages, 17 figures, 2 tables

  26. arXiv:2506.00441  [pdf, ps, other]

    cs.IR

    K-order Ranking Preference Optimization for Large Language Models

    Authors: Shihao Cai, Chongming Gao, Yang Zhang, Wentao Shi, Jizhi Zhang, Keqin Bao, Qifan Wang, Fuli Feng

    Abstract: To adapt large language models (LLMs) to ranking tasks, existing list-wise methods, represented by list-wise Direct Preference Optimization (DPO), focus on optimizing partial-order or full-order list ranking consistency for LLMs to enhance their ranking abilities. However, we argue that optimizing top-K ranking consistency could be more appropriate for real-world applications. There are two main r…

    Submitted 31 May, 2025; originally announced June 2025.

  27. arXiv:2505.23838  [pdf, other]

    cs.CL cs.IR

    Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities

    Authors: Yiming Huang, Jiyu Guo, Wenxin Mao, Cuiyun Gao, Peiyi Han, Chuanyi Liu, Qing Ling

    Abstract: Converting natural language (NL) questions into SQL queries, referred to as Text-to-SQL, has emerged as a pivotal technology for facilitating access to relational databases, especially for users without SQL knowledge. Recent progress in large language models (LLMs) has markedly propelled the field of natural language processing (NLP), opening new avenues to improve text-to-SQL systems. This study…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Submitted to ACM Computing Surveys (CSUR). Currently under review

    Report number: CSUR-2024-1154

  28. arXiv:2505.22568  [pdf]

    eess.IV cs.CV

    Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels

    Authors: Aravind R. Krishnan, Thomas Z. Li, Lucas W. Remedios, Michael E. Kim, Chenyu Gao, Gaurav Rudravaram, Elyssa M. McMaster, Adam M. Saunders, Shunxing Bao, Kaiwen Xu, Lianrui Zuo, Kim L. Sandler, Fabien Maldonado, Yuankai Huo, Bennett A. Landman

    Abstract: Reconstruction kernels in computed tomography (CT) affect spatial resolution and noise characteristics, introducing systematic variability in quantitative imaging measurements such as emphysema quantification. Choosing an appropriate kernel is therefore essential for consistent quantitative analysis. We propose a multipath cycleGAN model for CT kernel harmonization, trained on a mixture of paired…

    Submitted 28 May, 2025; originally announced May 2025.

  29. arXiv:2505.22038  [pdf, ps, other]

    cs.CV cs.AI

    Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization

    Authors: Kaiyuan Li, Xiaoyue Chen, Chen Gao, Yong Li, Xinlei Chen

    Abstract: Large Vision-Language Models (LVLMs) have shown impressive performance across multi-modal tasks by encoding images into thousands of tokens. However, the large number of image tokens results in significant computational overhead, and the use of dynamic high-resolution inputs further increases this burden. Previous approaches have attempted to reduce the number of image tokens through token pruning…

    Submitted 28 May, 2025; originally announced May 2025.

  30. arXiv:2505.20218  [pdf, other]

    cs.LG

    Fine-grained List-wise Alignment for Generative Medication Recommendation

    Authors: Chenxiao Fan, Chongming Gao, Wentao Shi, Yaxin Gong, Zihao Zhao, Fuli Feng

    Abstract: Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling dr…

    Submitted 26 May, 2025; originally announced May 2025.

  31. arXiv:2505.17589  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

    Authors: Zhihao Du, Changfeng Gao, Yuxuan Wang, Fan Yu, Tianyu Zhao, Hao Wang, Xiang Lv, Hui Wang, Chongjia Ni, Xian Shi, Keyu An, Guanrou Yang, Yabin Li, Yanni Chen, Zhifu Gao, Qian Chen, Yue Gu, Mengzhe Chen, Yafeng Chen, Shiliang Zhang, Wen Wang, Jieping Ye

    Abstract: In our prior works, we introduced a scalable streaming speech synthesis model, CosyVoice 2, which integrates a large language model (LLM) and a chunk-aware flow matching (FM) model, and achieves low-latency bi-streaming speech synthesis and human-parity quality. Despite these advancements, CosyVoice 2 exhibits limitations in language coverage, domain diversity, data volume, text formats, and post-…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: Preprint, work in progress

  32. arXiv:2505.17134  [pdf, ps, other]

    cs.CL cs.AI

    LongMagpie: A Self-synthesis Method for Generating Large-scale Long-context Instructions

    Authors: Chaochen Gao, Xing Wu, Zijia Lin, Debing Zhang, Songlin Hu

    Abstract: High-quality long-context instruction data is essential for aligning long-context large language models (LLMs). Despite the public release of models like Qwen and Llama, their long-context instruction data remains proprietary. Human annotation is costly and challenging, while template-based synthesis methods limit scale, diversity, and quality. We introduce LongMagpie, a self-synthesis framework t…

    Submitted 2 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  33. arXiv:2505.16483  [pdf, other]

    cs.CL cs.AI

    Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

    Authors: Shuzheng Si, Haozhe Zhao, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Bofei Gao, Kangyang Luo, Wenhao Li, Yufei Huang, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun

    Abstract: Teaching large language models (LLMs) to be faithful in the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to improve the faithfulness of LLMs in both short-form and long-form generation tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data with four diverse tas…

    Submitted 22 May, 2025; originally announced May 2025.

  34. arXiv:2505.16455  [pdf, other]

    cs.AI cs.CY

    Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events

    Authors: Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin

    Abstract: During sudden disaster events, accurately predicting public panic sentiment on social media is crucial for proactive governance and crisis management. Current efforts on this problem face three main challenges: lack of finely annotated data hinders emotion prediction studies, unmodeled risk perception causes prediction inaccuracies, and insufficient interpretability of panic formation mechanisms.…

    Submitted 22 May, 2025; originally announced May 2025.

  35. arXiv:2505.15475  [pdf, ps, other]

    cs.CL cs.AI

    LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models

    Authors: Zhanyue Qin, Yue Ding, Deyuan Liu, Qingbin Liu, Junxian Cai, Xi Chen, Zhiying Tu, Dianhui Chu, Cuiyun Gao, Dianbo Sui

    Abstract: Nowadays, Large Language Models (LLMs) have attracted widespread attention due to their powerful performance. However, due to the unavoidable exposure to socially biased data during training, LLMs tend to exhibit social biases, particularly gender bias. To better explore and quantify the degree of gender bias in LLMs, we propose a pair of datasets named GenBiasEval and GenHintEval, respectively…

    Submitted 21 May, 2025; originally announced May 2025.

  36. arXiv:2505.15179  [pdf, ps, other]

    cs.SE

    RAG or Fine-tuning? A Comparative Study on LCMs-based Code Completion in Industry

    Authors: Chaozheng Wang, Zezhou Yang, Shuzheng Gao, Cuiyun Gao, Ting Peng, Hailiang Huang, Yuetang Deng, Michael Lyu

    Abstract: Code completion, a crucial practice in industrial settings, helps developers improve programming efficiency by automatically suggesting code snippets during development. With the emergence of Large Code Models (LCMs), this field has witnessed significant advancements. Due to the natural differences between open-source and industrial codebases, such as coding patterns and unique internal dependenci…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted in FSE 25 Industry Track

  37. arXiv:2505.13855  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Domain Gating Ensemble Networks for AI-Generated Text Detection

    Authors: Arihant Tripathi, Liam Dugan, Charis Gao, Maggie Huan, Emma Jin, Peter Zhang, David Zhang, Julia Zhao, Chris Callison-Burch

    Abstract: As state-of-the-art language models continue to improve, the need for robust detection of machine-generated text becomes increasingly critical. However, current state-of-the-art machine text detectors struggle to adapt to new unseen domains and generative models. In this paper we present DoGEN (Domain Gating Ensemble Networks), a technique that allows detectors to adapt to unseen domains by ensemb…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Submitted to EMNLP 2025

  38. arXiv:2505.12910  [pdf, ps, other]

    cs.SI cs.AI

    SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs

    Authors: Le Cheng, Peican Zhu, Yangming Guo, Chao Gao, Zhen Wang, Keke Tang

    Abstract: Source detection on graphs has demonstrated high efficacy in identifying rumor origins. Despite advances in machine learning-based methods, many fail to capture intrinsic dynamics of rumor propagation. In this work, we present SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs, which harnesses the recent success of the state space model Mamba, known for…

    Submitted 4 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI25

  39. arXiv:2505.12894  [pdf, ps, other]

    cs.SI cs.AI

    HyperDet: Source Detection in Hypergraphs via Interactive Relationship Construction and Feature-rich Attention Fusion

    Authors: Le Cheng, Peican Zhu, Yangming Guo, Keke Tang, Chao Gao, Zhen Wang

    Abstract: Hypergraphs offer superior modeling capabilities for social networks, particularly in capturing group phenomena that extend beyond pairwise interactions in rumor propagation. Existing approaches in rumor source detection predominantly focus on dyadic interactions, which inadequately address the complexity of more intricate relational structures. In this study, we present a novel approach for Sourc…

    Submitted 4 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI25

  40. arXiv:2505.12340  [pdf, ps, other]

    cs.CV

    DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking

    Authors: Jirong Zha, Yuxuan Fan, Kai Li, Han Li, Chen Gao, Xinlei Chen, Yong Li

    Abstract: State estimation is challenging for 3D object tracking with high maneuverability, as the target's state transition function changes rapidly, irregularly, and is unknown to the estimator. Existing work based on interacting multiple model (IMM) achieves more accurate estimation than single-filter approaches through model combination, aligning appropriate models for different motion modes of the targ…

    Submitted 18 May, 2025; originally announced May 2025.

    Comments: 10 pages

  41. arXiv:2505.10008  [pdf, other]

    cs.SE

    SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion

    Authors: Chaoyang Gao, Xiang Chen, Guangbei Zhang

    Abstract: Context: Software vulnerability assessment (SVA) is critical for identifying, evaluating, and prioritizing security weaknesses in software applications. Objective: Despite the increasing application of large language models (LLMs) in various software engineering tasks, their effectiveness in SVA remains underexplored. Method: To address this gap, we introduce a novel approach SVA-ICL, which levera…

    Submitted 28 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: Accepted by Information and Software Technology

  42. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  43. arXiv:2505.08765  [pdf, other]

    cs.CV cs.AI

    Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

    Authors: Yatai Ji, Zhengqiu Zhu, Yong Zhao, Beidan Liu, Chen Gao, Yihao Zhao, Sihang Qiu, Yue Hu, Quanjun Yin, Yong Li

    Abstract: Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration-exploitation dilemma. To bridge this gap…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  44. arXiv:2505.07968  [pdf, ps, other]

    cs.CL

    Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

    Authors: Weiyi Wu, Xinwen Xu, Chongyang Gao, Xingjian Diao, Siting Li, Lucas A. Salas, Jiang Gui

    Abstract: Large Language Models (LLMs) have great potential in the field of health care, yet they face great challenges in adapting to rapidly evolving medical knowledge. This can lead to outdated or contradictory treatment suggestions. This study investigated how LLMs respond to evolving clinical guidelines, focusing on concept drift and internal inconsistencies. We developed the DriftMedQA benchmark to si…

    Submitted 19 June, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  45. arXiv:2505.06290  [pdf, other]

    cs.LG cs.DM

    UniCO: Towards a Unified Model for Combinatorial Optimization Problems

    Authors: Zefang Zong, Xiaochen Wei, Guozhen Zhang, Chen Gao, Huandong Wang, Yong Li

    Abstract: Combinatorial Optimization (CO) encompasses a wide range of problems that arise in many real-world scenarios. While significant progress has been made in developing learning-based methods for specialized CO problems, a unified model with a single architecture and parameter set for diverse CO problems remains elusive. Such a model would offer substantial advantages in terms of efficiency and conven…

    Submitted 7 May, 2025; originally announced May 2025.

  46. arXiv:2505.06250  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    DeltaDPD: Exploiting Dynamic Temporal Sparsity in Recurrent Neural Networks for Energy-Efficient Wideband Digital Predistortion

    Authors: Yizhuo Wu, Yi Zhu, Kun Qian, Qinyu Chen, Anding Zhu, John Gajadharsing, Leo C. N. de Vreede, Chang Gao

    Abstract: Digital Predistortion (DPD) is a popular technique to enhance signal quality in wideband RF power amplifiers (PAs). With increasing bandwidth and data rates, DPD faces significant energy consumption challenges during deployment, contrasting with its efficiency goals. State-of-the-art DPD models rely on recurrent neural networks (RNN), whose computational complexity hinders system efficiency. This…

    Submitted 29 April, 2025; originally announced May 2025.

    Comments: Accepted to IEEE Microwave and Wireless Technology Letters (MWTL)

  47. arXiv:2505.05901  [pdf, other]

    cs.CV cs.AI

    Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection

    Authors: Hanzhe Liang, Aoran Wang, Jie Zhou, Xin Jin, Can Gao, Jinbao Wang

    Abstract: In this paper, we explore a novel approach to 3D anomaly detection (AD) that goes beyond merely identifying anomalies based on structural characteristics. Our primary perspective is that most anomalies arise from unpredictable defective forces originating from both internal and external sources. To address these anomalies, we seek out opposing forces that can help correct them. Therefore, we intro…

    Submitted 15 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: 26 pages

  48. arXiv:2505.05622  [pdf, ps, other]

    cs.RO cs.AI

    CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory

    Authors: Weichen Zhang, Chen Gao, Shiquan Yu, Ruiying Peng, Baining Zhao, Qian Zhang, Jinqiang Cui, Xinlei Chen, Yong Li

    Abstract: Aerial vision-and-language navigation (VLN), requiring drones to interpret natural language instructions and navigate complex urban environments, emerges as a critical embodied AI challenge that bridges human-robot interaction, 3D spatial reasoning, and real-world deployment. Although existing ground VLN agents achieved notable results in indoor and outdoor settings, they struggle in aerial VLN du…

    Submitted 8 May, 2025; originally announced May 2025.

  49. arXiv:2505.05057  [pdf, other]

    cs.SE

    Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware

    Authors: Yujia Chen, Mingyu Chen, Cuiyun Gao, Zhihan Jiang, Zhongqi Li, Yuchi Ma

    Abstract: Application Programming Interfaces (APIs) are crucial in modern software development. Large Language Models (LLMs) assist in automated code generation but often struggle with API hallucination, including invoking non-existent APIs and misusing existing ones in practical development scenarios. Existing studies resort to Retrieval-Augmented Generation (RAG) methods for mitigating the hallucination i…

    Submitted 20 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by FSE 2025 Industry Track

  50. arXiv:2505.04084  [pdf, other]

    cs.SE cs.AI

    An Empirical Study of OpenAI API Discussions on Stack Overflow

    Authors: Xiang Chen, Jibin Wang, Chaoyang Gao, Xiaolin Ju, Zhanqi Cui

    Abstract: The rapid advancement of large language models (LLMs), represented by OpenAI's GPT series, has significantly impacted various domains such as natural language processing, software development, education, healthcare, finance, and scientific research. However, OpenAI APIs introduce unique challenges that differ from traditional APIs, such as the complexities of prompt engineering, token-based cost m…

    Submitted 6 May, 2025; originally announced May 2025.