Showing 1–50 of 511 results for author: Cheng, S

Searching in archive cs.
  1. arXiv:2507.01484  [pdf, ps, other]

    cs.CV

    What Really Matters for Robust Multi-Sensor HD Map Construction?

    Authors: Xiaoshuai Hao, Yuting Zhao, Yuheng Ji, Luanyuan Dai, Peng Hao, Dingzhe Li, Shuai Cheng, Rong Yin

    Abstract: High-definition (HD) map construction methods are crucial for providing precise and comprehensive static environmental information, which is essential for autonomous driving systems. While Camera-LiDAR fusion techniques have shown promising results by integrating data from both modalities, existing approaches primarily focus on improving model accuracy and often neglect the robustness of perceptio…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by IROS 2025

  2. arXiv:2507.01335  [pdf, ps, other]

    cs.CL cs.AI

    LEDOM: An Open and Fundamental Reverse Language Model

    Authors: Xunjian Yin, Sitao Cheng, Yuxi Xie, Xinyu Hu, Li Lin, Xinyi Wang, Liangming Pan, William Yang Wang, Xiaojun Wan

    Abstract: We introduce LEDOM, the first purely reverse language model, trained autoregressively on 435B tokens with 2B and 7B parameter variants, which processes sequences in reverse temporal order through previous token prediction. For the first time, we present the reverse language model as a potential foundational model across general tasks, accompanied by a set of intriguing examples and insights. Based…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Work in progress

  3. arXiv:2507.00761  [pdf, ps, other]

    cs.LG

    A Probabilistic Approach to Wildfire Spread Prediction Using a Denoising Diffusion Surrogate Model

    Authors: Wenbo Yu, Anirbit Ghosh, Tobias Sebastian Finn, Rossella Arcucci, Marc Bocquet, Sibo Cheng

    Abstract: Thanks to recent advances in generative AI, computers can now simulate realistic and complex natural processes. We apply this capability to predict how wildfires spread, a task made difficult by the unpredictable nature of fire and the variety of environmental conditions it depends on. In this study, we present the first denoising diffusion model for predicting wildfire spread, a new kind of AI fr…

    Submitted 1 July, 2025; originally announced July 2025.

  4. arXiv:2507.00394  [pdf, ps, other]

    cs.LG cs.DC

    HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism

    Authors: Geng Zhang, Shenggan Cheng, Xuanlei Zhao, Ziming Liu, Yang You

    Abstract: As transformer sequence lengths grow, existing pipeline parallelisms incur suboptimal performance due to the quadratic attention computation and the substantial memory overhead. To relieve these challenges, we propose HelixPipe, a novel pipeline parallelism for long sequence transformer training. First, HelixPipe introduces attention parallel partition, which schedules attention computations of di…

    Submitted 30 June, 2025; originally announced July 2025.

  5. arXiv:2506.23351  [pdf, ps, other]

    cs.RO cs.AI cs.LG cs.MA

    Benchmarking Generalizable Bimanual Manipulation: RoboTwin Dual-Arm Collaboration Challenge at CVPR 2025 MEIS Workshop

    Authors: Tianxing Chen, Kaixuan Wang, Zhaohui Yang, Yuhao Zhang, Zanxin Chen, Baijun Chen, Wanxi Dong, Ziyuan Liu, Dong Chen, Tianshuo Yang, Haibao Yu, Xiaokang Yang, Yusen Qin, Zhiqiang Xie, Yao Mu, Ping Luo, Tian Nian, Weiliang Deng, Yiheng Ge, Yibin Liu, Zixuan Li, Dehui Wang, Zhixuan Liang, Haohui Xie, Rijie Zeng, et al. (74 additional authors not shown)

    Abstract: Embodied Artificial Intelligence (Embodied AI) is an emerging frontier in robotics, driven by the need for autonomous systems that can perceive, reason, and act in complex physical environments. While single-arm systems have shown strong task performance, collaborative dual-arm systems are essential for handling more intricate tasks involving rigid, deformable, and tactile-sensitive objects. To ad…

    Submitted 2 July, 2025; v1 submitted 29 June, 2025; originally announced June 2025.

    Comments: Challenge Webpage: https://robotwin-benchmark.github.io/cvpr-2025-challenge/

  6. arXiv:2506.22937  [pdf, ps, other]

    cs.HC

    GamerAstra: Enhancing Video Game Accessibility for Blind and Low-Vision Players through a Multi-Agent AI Framework

    Authors: Tianrun Qiu, Changxin Chen, Sizhe Cheng, Yiming Yang, Yixiao Guo, Zhicong Lu, Yuxin Ma

    Abstract: Blind and low-vision (BLV) players encounter critical challenges in engaging with video games due to the inaccessibility of visual elements, difficulties in navigating interfaces, and limitations in sending interaction input. Moreover, the development of specialized accessibility features typically requires substantial programming effort and is often implemented on a game-by-game basis. To address…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: 19 pages, 9 figures

    ACM Class: H.5.2

  7. arXiv:2506.22554  [pdf, ps, other]

    cs.CV cs.AI

    Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset

    Authors: Vasu Agrawal, Akinniyi Akinyemi, Kathryn Alvero, Morteza Behrooz, Julia Buffalini, Fabio Maria Carlucci, Joy Chen, Junming Chen, Zhang Chen, Shiyang Cheng, Praveen Chowdary, Joe Chuang, Antony D'Avirro, Jon Daly, Ning Dong, Mark Duppenthaler, Cynthia Gao, Jeff Girard, Martin Gleize, Sahir Gomez, Hongyu Gong, Srivathsan Govindarajan, Brandon Han, Sen He, Denise Hernandez, et al. (59 additional authors not shown)

    Abstract: Human communication involves a complex interplay of verbal and nonverbal signals, essential for conveying meaning and achieving interpersonal goals. To develop socially intelligent AI technologies, it is crucial to develop models that can both comprehend and generate dyadic behavioral dynamics. To this end, we introduce the Seamless Interaction Dataset, a large-scale collection of over 4,000 hours…

    Submitted 30 June, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

  8. arXiv:2506.21270  [pdf, ps, other]

    cs.CV

    Video Virtual Try-on with Conditional Diffusion Transformer Inpainter

    Authors: Cheng Zou, Senlin Cheng, Bolei Xu, Dandan Zheng, Xiaobo Li, Jingdong Chen, Ming Yang

    Abstract: Video virtual try-on aims to naturally fit a garment to a target person in consecutive video frames. It is a challenging task: on the one hand, the output video should have good spatial-temporal consistency; on the other hand, the details of the given garment need to be preserved well in all the frames. Naively using image-based try-on methods frame by frame can yield poor results due to severe inc…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 10 pages, 6 figures

  9. arXiv:2506.18245  [pdf, ps, other]

    cs.CR cs.AI cs.SE

    Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection

    Authors: Lei Yu, Zhirong Huang, Hang Yuan, Shiqi Cheng, Li Yang, Fengjun Zhang, Chenjie Shen, Jiajia Ma, Jingyuan Zhang, Junyi Lu, Chun Zuo

    Abstract: Smart contract vulnerability detection remains a major challenge in blockchain security. Existing vulnerability detection methods face two main issues: (1) Existing datasets lack comprehensive coverage and high-quality explanations for preference learning. (2) Large language models (LLMs) often struggle with accurately interpreting specific concepts in smart contract security. Empirical analysis s…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted to ISSTA 2025

  10. arXiv:2506.16703  [pdf, ps, other]

    cs.RO

    VLM-Empowered Multi-Mode System for Efficient and Safe Planetary Navigation

    Authors: Sinuo Cheng, Ruyi Zhou, Wenhao Feng, Huaiguang Yang, Haibo Gao, Zongquan Deng, Liang Ding

    Abstract: The increasingly complex and diverse planetary exploration environment requires a more adaptable and flexible rover navigation strategy. In this study, we propose a VLM-empowered multi-mode system to achieve efficient yet safe autonomous navigation for planetary rovers. A Vision-Language Model (VLM) is used to parse scene information from image inputs to achieve a human-level understanding of terrain…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: accepted by IROS 2025

  11. arXiv:2506.14530  [pdf, ps, other]

    stat.ML cs.AI cs.LG cs.NE math.ST

    Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters

    Authors: Anastasis Kratsios, Tin Sum Cheng, Aurelien Lucchi, Haitz Sáez de Ocáriz Borde

    Abstract: Low-Rank Adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning (PEFT) technique for foundation models. Recent work has highlighted an inherent asymmetry in the initialization of LoRA's low-rank factors, which has been present since its inception and was presumably derived experimentally. This paper focuses on providing a comprehensive theoretical characterization of asy…

    Submitted 17 June, 2025; originally announced June 2025.

  12. C2TE: Coordinated Constrained Task Execution Design for Ordering-Flexible Multi-Vehicle Platoon Merging

    Authors: Bin-Bin Hu, Yanxin Zhou, Henglai Wei, Shuo Cheng, Chen Lv

    Abstract: In this paper, we propose a distributed coordinated constrained task execution (C2TE) algorithm that enables a team of vehicles from different lanes to cooperatively merge into an ordering-flexible platoon maneuvering on the desired lane. Therein, the platoon is flexible in the sense that no specific spatial ordering sequences of vehicles are predetermined. To attain such a flexible platoon,…

    Submitted 16 June, 2025; originally announced June 2025.

    Journal ref: Automatica, 2025

  13. arXiv:2506.11059  [pdf, ps, other]

    cs.SE cs.CL cs.CY cs.LG

    CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs

    Authors: Hanxi Guo, Siyuan Cheng, Kaiyuan Zhang, Guangyu Shen, Xiangyu Zhang

    Abstract: Large language models (LLMs) have become integral to modern software development, producing vast amounts of AI-generated source code. While these models boost programming productivity, their misuse introduces critical risks, including code plagiarism, license violations, and the propagation of insecure programs. As a result, robust detection of AI-generated code is essential. To support the develo…

    Submitted 26 May, 2025; originally announced June 2025.

  14. arXiv:2506.10960  [pdf, other]

    cs.CL cs.AI cs.CR cs.IR cs.LG

    ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

    Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng

    Abstract: Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We prese…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Work in progress

  15. arXiv:2506.10424  [pdf, ps, other]

    cs.CR cs.AI

    SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks

    Authors: Kaiyuan Zhang, Siyuan Cheng, Hanxi Guo, Yuetian Chen, Zian Su, Shengwei An, Yuntao Du, Charles Fleming, Ashish Kundu, Xiangyu Zhang, Ninghui Li

    Abstract: Large language models (LLMs) have achieved remarkable success and are widely adopted for diverse applications. However, fine-tuning these models often involves private or sensitive information, raising critical privacy concerns. In this work, we conduct the first comprehensive study evaluating the vulnerability of fine-tuned LLMs to membership inference attacks (MIAs). Our empirical analysis demon…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted by the 34th USENIX Security Symposium 2025. Code is available at https://github.com/KaiyuanZh/SOFT

  16. arXiv:2506.09718  [pdf, ps, other]

    cs.CV cs.AI

    Non-Contact Health Monitoring During Daily Personal Care Routines

    Authors: Xulin Ma, Jiankai Tang, Zhang Jiang, Songqin Cheng, Yuanchun Shi, Dong LI, Xin Liu, Daniel McDuff, Xiaojing Liu, Yuntao Wang

    Abstract: Remote photoplethysmography (rPPG) enables non-contact, continuous monitoring of physiological signals and offers a practical alternative to traditional health sensing methods. Although rPPG is promising for daily health monitoring, its application in long-term personal care scenarios, such as mirror-facing routines in high-altitude environments, remains challenging due to ambient lighting variati…

    Submitted 11 June, 2025; originally announced June 2025.

  17. arXiv:2506.08528  [pdf, ps, other]

    cs.DC cs.LG cs.OS

    PerfTracker: Online Performance Troubleshooting for Large-scale Model Training in Production

    Authors: Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai

    Abstract: Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and the data intensity of the training process. Existing troubleshooting approaches designed for traditional distributed systems or datacenter networks fall short and can hardly apply to real-world trainin…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  18. arXiv:2506.05007  [pdf, ps, other]

    cs.AR cs.LG

    QiMeng: Fully Automated Hardware and Software Design for Processor Chip

    Authors: Rui Zhang, Yuanbo Wen, Shuyao Cheng, Di Huang, Shaohui Peng, Jiaming Guo, Pengwei Jin, Jiacheng Zhao, Tianrui Ma, Yaoyu Zhu, Yifan Hao, Yongwei Zhao, Shengwen Liang, Ying Wang, Xing Hu, Zidong Du, Huimin Cui, Ling Li, Qi Guo, Yunji Chen

    Abstract: Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three major challenges: the physical constraints of fabrication technologies, the escalating demands for design resources, and the increasing diversity of ecosystems. Automated processor chip…

    Submitted 5 June, 2025; originally announced June 2025.

  19. arXiv:2506.03483  [pdf, ps, other]

    cs.CL

    APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training

    Authors: Jun Rao, Zepeng Lin, Xuebo Liu, Xiaopeng Ke, Lian Lian, Dong Jin, Shengjun Cheng, Jun Yu, Min Zhang

    Abstract: Large Language Models (LLMs) often require domain-specific fine-tuning to address targeted tasks, which risks degrading their general capabilities. Maintaining a balance between domain-specific enhancements and general model utility is a key challenge. This paper proposes a novel approach named APT (Weakness Case Acquisition and Iterative Preference Training) to enhance domain-specific performance…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: ACL2025 Findings

  20. arXiv:2506.01565  [pdf, ps, other]

    cs.CL cs.CV

    Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation

    Authors: Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li

    Abstract: Culture is a rich and dynamic domain that evolves across both geography and time. However, existing studies on cultural understanding with vision-language models (VLMs) primarily emphasize geographic diversity, often overlooking the critical temporal dimensions. To bridge this gap, we introduce Hanfu-Bench, a novel, expert-curated multimodal dataset. Hanfu, a traditional garment spanning ancient C…

    Submitted 17 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: cultural analysis, cultural visual understanding, cultural image transcreation (update dataset license)

  21. arXiv:2506.01562  [pdf, ps, other]

    cs.LG stat.ML

    Unpacking Softmax: How Temperature Drives Representation Collapse, Compression, and Generalization

    Authors: Wojciech Masarczyk, Mateusz Ostaszewski, Tin Sum Cheng, Tomasz Trzciński, Aurelien Lucchi, Razvan Pascanu

    Abstract: The softmax function is a fundamental building block of deep neural networks, commonly used to define output distributions in classification tasks or attention weights in transformer architectures. Despite its widespread use and proven effectiveness, its influence on learning dynamics and learned representations remains poorly understood, limiting our ability to optimize model behavior. In this pa…

    Submitted 2 June, 2025; originally announced June 2025.

  22. arXiv:2506.00471  [pdf, other]

    physics.geo-ph cs.LG physics.comp-ph

    DiffPINN: Generative diffusion-initialized physics-informed neural networks for accelerating seismic wavefield representation

    Authors: Shijun Cheng, Tariq Alkhalifah

    Abstract: Physics-informed neural networks (PINNs) offer a powerful framework for seismic wavefield modeling, yet they typically require time-consuming retraining when applied to different velocity models. Moreover, their training can suffer from slow convergence due to the complexity of the wavefield solution. To address these challenges, we introduce a latent diffusion-based strategy for rapid and effe…

    Submitted 31 May, 2025; originally announced June 2025.

  23. arXiv:2505.24183  [pdf, ps, other]

    cs.LG cs.AR cs.PL

    CodeV-R1: Reasoning-Enhanced Verilog Generation

    Authors: Yaoyu Zhu, Di Huang, Hanqi Lyu, Xiaoyun Zhang, Chongxiao Li, Wenxuan Shi, Yutong Wu, Jianan Mu, Jinghua Wang, Yang Zhao, Pengwei Jin, Shuyao Cheng, Shengwen Liang, Xishan Zhang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) spec…

    Submitted 20 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  24. arXiv:2505.24068  [pdf, ps, other]

    cs.RO

    DiffCoTune: Differentiable Co-Tuning for Cross-domain Robot Control

    Authors: Lokesh Krishna, Sheng Cheng, Junheng Li, Naira Hovakimyan, Quan Nguyen

    Abstract: The deployment of robot controllers is hindered by modeling discrepancies due to necessary simplifications for computational tractability or inaccuracies in data-generating simulators. Such discrepancies typically require ad-hoc tuning to meet the desired performance, thereby ensuring successful transfer to a target domain. We propose a framework for automated, gradient-based tuning to enhance per…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 8 pages, 8 figures

  25. arXiv:2505.21956  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    Cross-modal RAG: Sub-dimensional Retrieval-Augmented Text-to-Image Generation

    Authors: Mengdan Zhu, Senhao Cheng, Guangji Bai, Yifei Zhang, Liang Zhao

    Abstract: Text-to-image generation increasingly demands access to domain-specific, fine-grained, and rapidly evolving knowledge that pretrained models cannot fully capture. Existing Retrieval-Augmented Generation (RAG) methods attempt to address this by retrieving globally relevant images, but they fail when no single image contains all desired elements from a complex user query. We propose Cross-modal RAG,…

    Submitted 28 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  26. arXiv:2505.21775  [pdf, other]

    cs.LG cs.AI math.OC

    DualSchool: How Reliable are LLMs for Optimization Education?

    Authors: Michael Klamkin, Arnaud Deza, Sikai Cheng, Haoruo Zhao, Pascal Van Hentenryck

    Abstract: Consider the following task taught in introductory optimization courses which addresses challenges articulated by the community at the intersection of (generative) AI and OR: generate the dual of a linear program. LLMs, being trained at web-scale, have the conversion process and many instances of Primal to Dual Conversion (P2DC) at their disposal. Students may thus reasonably expect that LLMs woul…

    Submitted 27 May, 2025; originally announced May 2025.

  27. arXiv:2505.21473  [pdf, ps, other]

    cs.CV

    DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

    Authors: Yiheng Liu, Liao Qu, Huichao Zhang, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Xian Li, Shuai Wang, Daniel K. Du, Shu Cheng, Zehuan Yuan, Xinglong Wu

    Abstract: This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively degraded images, DetailFlow enables the generation process to start from the global structure and incrementally refine details. This coarse-to-fine 1D token seq…

    Submitted 27 May, 2025; originally announced May 2025.

  28. arXiv:2505.20622  [pdf, other]

    cs.CL cs.AI cs.LG

    SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation

    Authors: Ting Xu, Zhichao Huang, Jiankai Sun, Shanbo Cheng, Wai Lam

    Abstract: We present Sequential Policy Optimization for Simultaneous Machine Translation (SeqPO-SiMT), a new policy optimization framework that defines the simultaneous machine translation (SiMT) task as a sequential decision making problem, incorporating a tailored reward to enhance translation quality while reducing latency. In contrast to popular Reinforcement Learning from Human Feedback (RLHF) methods,…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted by The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  29. arXiv:2505.19151  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

    Authors: Shenggan Cheng, Yuanxin Wei, Lansong Diao, Yong Liu, Bujiao Chen, Lianghua Huang, Yu Liu, Wenyuan Yu, Jiangsu Du, Wei Lin, Yang You

    Abstract: Leveraging the diffusion transformer (DiT) architecture, models like Sora, CogVideoX and Wan have achieved remarkable progress in text-to-video, image-to-video, and video editing tasks. Despite these advances, diffusion-based video generation remains computationally intensive, especially for high-resolution, long-duration videos. Prior work accelerates its inference by skipping computation, usuall…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 9 pages, 6 figures

  30. arXiv:2505.18686  [pdf, ps, other]

    cs.CV

    WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation

    Authors: Yang Liu, Silin Cheng, Xinwei He, Sebastien Ourselin, Lei Tan, Gen Luo

    Abstract: Weakly supervised referring expression comprehension (WREC) and segmentation (WRES) aim to learn object grounding based on a given expression using weak supervision signals like image-text pairs. While these tasks have traditionally been modeled separately, we argue that they can benefit from joint learning in a multi-task framework. To this end, we propose WeakMCN, a novel multi-task collaborative…

    Submitted 28 May, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR2025

  31. arXiv:2505.17550  [pdf, ps, other]

    cs.CV

    T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models

    Authors: Xiaoyu Ye, Songjie Cheng, Yongtao Wang, Yajiao Xiong, Yishen Li

    Abstract: Recent advances in text-to-video (T2V) diffusion models have significantly enhanced the quality of generated videos. However, their ability to produce explicit or harmful content raises concerns about misuse and potential rights violations. Inspired by the success of unlearning techniques in erasing undesirable concepts from text-to-image (T2I) models, we extend unlearning to T2V models and propos…

    Submitted 30 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  32. arXiv:2505.16972  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

    Authors: Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng

    Abstract: Recent advances in Automatic Speech Recognition (ASR) have been largely fueled by massive speech corpora. However, extending coverage to diverse languages with limited resources remains a formidable challenge. This paper introduces Speech Back-Translation, a scalable pipeline that improves multilingual ASR models by converting large-scale text corpora into synthetic speech via off-the-shelf text-t…

    Submitted 22 May, 2025; originally announced May 2025.

  33. arXiv:2505.16369  [pdf, ps, other]

    cs.SD eess.AS

    X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

    Authors: Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan

    Abstract: We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech, environmental sounds, and music, X-ARES provides two approaches for evaluating audio representations: linear fine-tuning and unparameterized evaluation. The fra…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  34. arXiv:2505.11227  [pdf, ps, other]

    cs.AI cs.LG

    Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs

    Authors: Zhangying Feng, Qianglong Chen, Ning Lu, Yongqian Li, Siqi Cheng, Shuangmu Peng, Duyu Tang, Shengcai Liu, Zhirui Zhang

    Abstract: The development of reasoning capabilities represents a critical frontier in large language models (LLMs) research, where reinforcement learning (RL) and process reward models (PRMs) have emerged as predominant methodological frameworks. Contrary to conventional wisdom, empirical evidence from DeepSeek-R1 demonstrates that pure RL training focused on mathematical problem-solving can progressively e…

    Submitted 16 May, 2025; originally announced May 2025.

  35. arXiv:2505.09450  [pdf, ps, other]

    cs.CV

    MrTrack: Register Mamba for Needle Tracking with Rapid Reciprocating Motion during Ultrasound-Guided Aspiration Biopsy

    Authors: Yuelin Zhang, Qingpeng Ding, Long Lei, Yongxuan Feng, Raymond Shing-Yan Tang, Shing Shin Cheng

    Abstract: Ultrasound-guided fine needle aspiration (FNA) biopsy is a common minimally invasive diagnostic procedure. However, an aspiration needle tracker addressing rapid reciprocating motion is still missing. MrTrack, an aspiration needle tracker with a mamba-based register mechanism, is proposed. MrTrack leverages a Mamba-based register extractor to sequentially distill global context from each historica…

    Submitted 29 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: Early Accepted by MICCAI 2025

  36. arXiv:2505.09370  [pdf, ps, other]

    cs.DS

    A Dynamic Working Set Method for Compressed Sensing

    Authors: Siu-Wing Cheng, Man Ting Wong

    Abstract: We propose a dynamic working set method (DWS) for the problem $\min_{\mathtt{x} \in \mathbb{R}^n} \frac{1}{2}\|\mathtt{Ax}-\mathtt{b}\|^2 + \eta\|\mathtt{x}\|_1$ that arises from compressed sensing. DWS manages the working set while iteratively calling a regression solver to generate progressively better solutions. Our experiments show that DWS is more efficient than other state-of-the-art software i…

    Submitted 5 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  37. arXiv:2505.07360  [pdf, ps, other]

    cs.SE

    BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models

    Authors: Xiuwei Shang, Guoqiang Chen, Shaoyin Cheng, Benlong Wu, Li Hu, Gangyang Li, Weiming Zhang, Nenghai Yu

    Abstract: Binary analysis remains pivotal in software security, offering insights into compiled programs without source code access. As large language models (LLMs) continue to excel in diverse language understanding and generation tasks, their potential in decoding complex binary data structures becomes evident. However, the lack of standardized benchmarks in this domain limits the assessment and compariso…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 23 pages, 5 figures, to be published in IJCAI 2025

  38. arXiv:2505.04519  [pdf, other]

    cs.CL

    Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs

    Authors: Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, Wei Guo, Ziyang Zhang, Miao Rang, Fangcheng Liu, Naifu Zhang, Binghan Li, Yonghan Dong, Xiaojun Meng, Yasheng Wang, Dong Li, Yin Li, Dandan Tu, Can Chen, Youliang Yan, Fisher Yu, Ruiming Tang, Yunhe Wang, Botian Huang, Bo Wang, Boxiao Liu, et al. (49 additional authors not shown)

    Abstract: Sparse large language models (LLMs) with Mixture of Experts (MoE) and close to a trillion parameters are dominating the realm of most capable language models. However, the massive model scale poses significant challenges for the underlying software and hardware systems. In this paper, we aim to uncover a recipe to harness such scale on Ascend NPUs. The key goals are better usage of the computing r…

    Submitted 7 May, 2025; originally announced May 2025.

  39. arXiv:2505.04254  [pdf, other]

    cs.SE

    CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System

    Authors: Li Hu, Guoqiang Chen, Xiuwei Shang, Shaoyin Cheng, Benlong Wu, Gangyang Li, Xu Zhu, Weiming Zhang, Nenghai Yu

    Abstract: With open-source projects growing in size and complexity, manual compilation becomes tedious and error-prone, highlighting the need for automation to improve efficiency and accuracy. However, the complexity of compilation instruction search and error resolution makes automatic compilation challenging. Inspired by the success of LLM-based agents in various fields, we propose CompileAgent, the first…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 12 pages, 4 figures

  40. arXiv:2505.03195  [pdf, other]

    cs.AR

    QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies

    Authors: Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen

    Abstract: Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on sup…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures

  41. arXiv:2504.21803  [pdf, other]

    cs.SE cs.CR

    An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

    Authors: Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

    Abstract: Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code, reverse engineers face significant challenges in understanding binary code due to the lack of intuitive semantic information. Although traditional reverse tools ca…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 38 pages, 9 figures

  42. arXiv:2504.18058  [pdf, other]

    cs.CL cs.AI

    Exploring Personality-Aware Interactions in Salesperson Dialogue Agents

    Authors: Sijia Cheng, Wen-Yu Chang, Yun-Nung Chen

    Abstract: The integration of dialogue agents into the sales domain requires a deep understanding of how these systems interact with users possessing diverse personas. This study explores the influence of user personas, defined using the Myers-Briggs Type Indicator (MBTI), on the interaction quality and performance of sales-oriented dialogue agents. Through large-scale testing and analysis, we assess the pre…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Accepted by IWSDS 2025

  43. arXiv:2504.14669  [pdf, other]

    cs.CL

    Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data

    Authors: Wei Zou, Sen Yang, Yu Bao, Shujian Huang, Jiajun Chen, Shanbo Cheng

    Abstract: The rise of Large Language Models (LLMs) has reshaped machine translation (MT), but multilingual MT still relies heavily on parallel data for supervised fine-tuning (SFT), facing challenges like data scarcity for low-resource languages and catastrophic forgetting. To address these issues, we propose TRANS-ZERO, a self-play framework that leverages only monolingual data and the intrinsic multilingu…

    Submitted 17 May, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: 11 pages, 4 figures, accepted by ACL 2025 as findings

  44. arXiv:2504.14274  [pdf, other]

    cs.AI

    ProtPainter: Draw or Drag Protein via Topology-guided Diffusion

    Authors: Zhengxi Lu, Shizhuo Cheng, Yuru Jiang, Yan Zhang, Min Zhang

    Abstract: Recent advances in protein backbone generation have achieved promising results under structural, functional, or physical constraints. However, existing methods lack the flexibility for precise topology control, limiting navigation of the backbone space. We present ProtPainter, a diffusion-based approach for generating protein backbones conditioned on 3D curves. ProtPainter follows a two-stage proc…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Published as a conference paper at ICLR 2025

  45. arXiv:2504.12234  [pdf, other]

    cs.SE

    MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models

    Authors: Hang Yuan, Lei Yu, Zhirong Huang, Jingyuan Zhang, Junyi Lu, Shiqi Cheng, Li Yang, Fengjun Zhang, Jiajia Ma, Chun Zuo

    Abstract: Smart contract vulnerabilities pose significant security risks to blockchain systems, potentially leading to severe financial losses. Existing methods face several limitations: (1) Program analysis-based approaches rely on predefined patterns, lacking flexibility for new vulnerability types; (2) Deep learning-based methods lack explanations; (3) Large language model-based approaches suffer from hi…

    Submitted 16 April, 2025; originally announced April 2025.

  46. arXiv:2504.11670  [pdf, other]

    quant-ph cs.IT

    Adaptive Error Correction for Entanglement Distillation

    Authors: Sijie Cheng, Narayanan Rengaswamy

    Abstract: Quantum network applications impose a variety of requirements on entanglement resources in terms of rate, fidelity, latency, and more. The repeaters in the quantum network must combine good methods for entanglement generation, effective entanglement distillation, and smart routing protocols to satisfy these application requirements. In this work, we focus on quantum error correction-based entangle…

    Submitted 15 April, 2025; originally announced April 2025.

  47. arXiv:2504.10979  [pdf, other]

    cs.CV

    Deep Learning in Concealed Dense Prediction

    Authors: Pancheng Zhao, Deng-Ping Fan, Shupeng Cheng, Salman Khan, Fahad Shahbaz Khan, David Clifton, Peng Xu, Jufeng Yang

    Abstract: Deep learning is developing rapidly and handling common computer vision tasks well. It is time to pay attention to more complex vision tasks, as model size, knowledge, and reasoning capabilities continue to improve. In this paper, we introduce and review a family of complex tasks, termed Concealed Dense Prediction (CDP), which has great value in agriculture, industry, etc. CDP's intrinsic trait is…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Technique Report

  48. arXiv:2504.09680  [pdf, other]

    cs.LG cs.AI math.OC

    SPOT: Spatio-Temporal Pattern Mining and Optimization for Load Consolidation in Freight Transportation Networks

    Authors: Sikai Cheng, Amira Hijazi, Jeren Konak, Alan Erera, Pascal Van Hentenryck

    Abstract: Freight consolidation has significant potential to reduce transportation costs and mitigate congestion and pollution. An effective load consolidation plan relies on carefully chosen consolidation points to ensure alignment with existing transportation management processes, such as driver scheduling, personnel planning, and terminal operations. This complexity represents a significant challenge whe…

    Submitted 13 April, 2025; originally announced April 2025.

  49. arXiv:2504.07866  [pdf, ps, other]

    cs.CL cs.AI

    Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

    Authors: Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, Peng Guo, Yaoyuan Wang, Xiaojun Meng, Yasheng Wang, Dong Li, Can Chen, Dandan Tu, Yin Li, Fisher Yu, Ruiming Tang, Yunhe Wang, Baojun Wang, Bin Wang, Bo Wang, Boxiao Liu, Changzheng Zhang, Duyu Tang, Fei Mi, Hui Jin, et al. (27 additional authors not shown)

    Abstract: We present Pangu Ultra, a Large Language Model (LLM) with 135 billion parameters and dense Transformer modules trained on Ascend Neural Processing Units (NPUs). Although the field of LLM has been witnessing unprecedented advances in pushing the scale and capability of LLM in recent years, training such a large-scale model still involves significant optimization and system challenges. To stabilize…

    Submitted 11 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: fix conflicts of latex packages

  50. arXiv:2504.03170  [pdf, other]

    cs.LG

    Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm

    Authors: Huong Pham, Samuel Cheng, Tao Hu, Chengbin Deng

    Abstract: Given the growing environmental challenges, accurate monitoring and prediction of changes in water bodies are essential for sustainable management and conservation. The Continuous Monitoring of Land Disturbance (COLD) algorithm provides a valuable tool for real-time analysis of land changes, such as deforestation, urban expansion, agricultural activities, and natural disasters. This capability ena…

    Submitted 4 April, 2025; originally announced April 2025.