Showing 1–50 of 1,047 results for author: Du, Y

Searching in archive cs.

  1. arXiv:2509.26574  [pdf, ps, other]

    cs.AI cond-mat.other cs.CL hep-th quant-ph

    Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

    Authors: Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Lifan Yuan, Ziming Ji, Indranil Das, Junyi Cao, Yufeng Du, Jinchen He, Yifan Su, Jiabin Yu, Yikun Jiang, Yujie Zhang, Chang Liu, Ze-Min Huang, Weizhen Jia, Xinan Chen, Peixue Wu, Yunkai Wang, Juntai Zhou, et al. (39 additional authors not shown)

    Abstract: While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present the CritPt (Complex Research using Integr…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 39 pages, 6 figures, 6 tables

  2. Joyride: Rethinking Linux's network stack design for better performance, security, and reliability

    Authors: Yanlin Du, Ruslan Nikolaev

    Abstract: Contemporary distributed computing workloads, including scientific computation, data mining, and machine learning, increasingly demand OS networking with minimal latency as well as high throughput, security, and reliability. However, Linux's conventional TCP/IP stack becomes increasingly problematic for high-end NICs, particularly those operating at 100 Gbps and beyond. These limitations come ma…

    Submitted 29 September, 2025; originally announced September 2025.

    Journal ref: 3rd Workshop on Kernel Isolation, Safety and Verification (KISV 2025)

  3. arXiv:2509.24957  [pdf, ps, other]

    cs.LG

    Intra-request branch orchestration for efficient LLM reasoning

    Authors: Weifan Jiang, Rana Shahout, Yilun Du, Michael Mitzenmacher, Minlan Yu

    Abstract: Large Language Models (LLMs) increasingly rely on inference-time reasoning algorithms such as chain-of-thought and multi-branch reasoning to improve accuracy on complex tasks. These methods, however, substantially increase token usage and per-request latency. Prior work has largely focused on reducing token usage, often at the expense of accuracy, while overlooking other latency factors. We presen…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 15 pages, 6 figures

  4. arXiv:2509.24816  [pdf, ps, other]

    cs.CL

    KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning

    Authors: Xilin Dang, Kexin Chen, Xiaorui Su, Ayush Noori, Iñaki Arango, Lucas Vittor, Xinyi Long, Yuyang Du, Marinka Zitnik, Pheng Ann Heng

    Abstract: In clinical practice, physicians refrain from making decisions when patient information is insufficient. This behavior, known as abstention, is a critical safety mechanism preventing potentially harmful misdiagnoses. Recent investigations have reported the application of large language models (LLMs) in medical scenarios. However, existing LLMs struggle with the abstentions, frequently providing ov…

    Submitted 29 September, 2025; originally announced September 2025.

  5. arXiv:2509.23972  [pdf, ps, other]

    cs.AR

    AssertFix: Empowering Automated Assertion Fix via Large Language Models

    Authors: Hongqin Lyu, Yunlin Du, Yonghao Wang, Zhiteng Chao, Tiancheng Wang, Huawei Li

    Abstract: Assertion-based verification (ABV) is critical in ensuring that register-transfer level (RTL) designs conform to their functional specifications. SystemVerilog Assertions (SVA) effectively specify design properties, but writing and maintaining them manually is challenging and error-prone. Although recent progress of assertion generation methods leveraging large language models (LLMs) have shown gr…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 6 pages, 6 figures

  6. arXiv:2509.23772  [pdf, ps, other]

    cs.CV stat.AP

    A Modality-Tailored Graph Modeling Framework for Urban Region Representation via Contrastive Learning

    Authors: Yaya Zhao, Kaiqi Zhao, Zixuan Tang, Zhiyuan Liu, Xiaoling Lu, Yalei Du

    Abstract: Graph-based models have emerged as a powerful paradigm for modeling multimodal urban data and learning region representations for various downstream tasks. However, existing approaches face two major limitations. (1) They typically employ identical graph neural network architectures across all modalities, failing to capture modality-specific structures and characteristics. (2) During the fusion st…

    Submitted 28 September, 2025; originally announced September 2025.

  7. arXiv:2509.23674  [pdf, ps, other]

    cs.AR

    AssertGen: Enhancement of LLM-aided Assertion Generation through Cross-Layer Signal Bridging

    Authors: Hongqin Lyu, Yonghao Wang, Yunlin Du, Mingyu Shi, Zhiteng Chao, Wenxing Li, Tiancheng Wang, Huawei Li

    Abstract: Assertion-based verification (ABV) serves as a crucial technique for ensuring that register-transfer level (RTL) designs adhere to their specifications. While Large Language Model (LLM) aided assertion generation approaches have recently achieved remarkable progress, existing methods are still unable to effectively identify the relationship between design specifications and RTL designs, which lead…

    Submitted 28 September, 2025; originally announced September 2025.

    Comments: 6 pages, 7 figures

  8. arXiv:2509.23468  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    Multi-Modal Manipulation via Multi-Modal Policy Consensus

    Authors: Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, Katherine Driggs-Campbell

    Abstract: Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 9 pages, 7 figures

  9. arXiv:2509.23265  [pdf, ps, other]

    cs.LG

    CREPE: Controlling Diffusion with Replica Exchange

    Authors: Jiajun He, Paul Jeha, Peter Potaptchik, Leo Zhang, José Miguel Hernández-Lobato, Yuanqi Du, Saifuddin Syed, Francisco Vargas

    Abstract: Inference-time control of diffusion models aims to steer model outputs to satisfy new constraints without retraining. Previous approaches have mostly relied on heuristic guidance or have been coupled with Sequential Monte Carlo (SMC) for bias correction. In this paper, we propose a flexible alternative based on replica exchange, an algorithm designed initially for sampling problems. We refer to th…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: 29 pages, 14 figures, 3 tables

  10. arXiv:2509.22910  [pdf, ps, other]

    cs.RO

    Good Weights: Proactive, Adaptive Dead Reckoning Fusion for Continuous and Robust Visual SLAM

    Authors: Yanwei Du, Jing-Chen Peng, Patricio A. Vela

    Abstract: Given that Visual SLAM relies on appearance cues for localization and scene understanding, texture-less or visually degraded environments (e.g., plain walls or low lighting) lead to poor pose estimation and track loss. However, robots are typically equipped with sensors that provide some form of dead reckoning odometry with reasonable short-time performance but unreliable long-time performance. Th…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 8 pages, 9 figures, 1 table. Submitted to IEEE Conference

  11. arXiv:2509.21983  [pdf, ps, other]

    cs.RO cs.AI

    Hybrid Diffusion for Simultaneous Symbolic and Continuous Planning

    Authors: Sigmund Hennum Høeg, Aksel Vaaler, Chaoqi Liu, Olav Egeland, Yilun Du

    Abstract: Constructing robots to accomplish long-horizon tasks is a long-standing challenge within artificial intelligence. Approaches using generative methods, particularly Diffusion Models, have gained attention due to their ability to model continuous robotic trajectories for planning and control. However, we show that these models struggle with long-horizon tasks that involve complex decision-making and…

    Submitted 26 September, 2025; originally announced September 2025.

    Comments: 10 pages, 11 figures. This work has been submitted to the IEEE for possible publication. See https://sigmundhh.com/hybrid_diffusion/ for the project website

  12. arXiv:2509.20733  [pdf, ps, other]

    quant-ph cs.LG

    PALQO: Physics-informed Model for Accelerating Large-scale Quantum Optimization

    Authors: Yiming Huang, Yajie Hao, Jing Zhou, Xiao Yuan, Xiaoting Wang, Yuxuan Du

    Abstract: Variational quantum algorithms (VQAs) are leading strategies to reach practical utilities of near-term quantum devices. However, the no-cloning theorem in quantum mechanics precludes standard backpropagation, leading to prohibitive quantum resource costs when applying VQAs to large-scale tasks. To address this challenge, we reformulate the training dynamics of VQAs as a nonlinear partial different…

    Submitted 25 September, 2025; originally announced September 2025.

  13. arXiv:2509.19954  [pdf, ps, other]

    cs.RO

    Robot Trajectron V2: A Probabilistic Shared Control Framework for Navigation

    Authors: Pinhao Song, Yurui Du, Ophelie Saussus, Sofie De Schrijver, Irene Caprara, Peter Janssen, Renaud Detry

    Abstract: We propose a probabilistic shared-control solution for navigation, called Robot Trajectron V2 (RT-V2), that enables accurate intent prediction and safe, effective assistance in human-robot interaction. RT-V2 jointly models a user's long-term behavioral patterns and their noisy, low-dimensional control signals by combining a prior intent model with a posterior update that accounts for real-time use…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 26 pages, 20 figures

  14. arXiv:2509.18655  [pdf, ps, other]

    cs.CL

    Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering

    Authors: Lingwen Deng, Yifei Han, Long Zhang, Yue Du, Bin Li

    Abstract: Parameter-Preserving Knowledge Editing (PPKE) enables updating models with new or corrected information without retraining or parameter adjustment. Recent PPKE approaches based on knowledge graphs (KG) to extend knowledge editing (KE) capabilities to multi-hop question answering (MHQA). However, these methods often lack consistency, leading to knowledge contamination, unstable updates, and retriev…

    Submitted 23 September, 2025; originally announced September 2025.

    Comments: Submitted to ICASSP 2026

  15. arXiv:2509.18585  [pdf, ps, other]

    cs.CL cs.AI

    TsqLoRA: Towards Sensitivity and Quality Low-Rank Adaptation for Efficient Fine-Tuning

    Authors: Yu Chen, Yifei Han, Long Zhang, Yue Du, Bin Li

    Abstract: Fine-tuning large pre-trained models for downstream tasks has become a fundamental approach in natural language processing. Fully fine-tuning all model parameters is computationally expensive and memory-intensive, especially in resource-constrained environments. Existing parameter-efficient fine-tuning methods reduce the number of trainable parameters but typically overlook the varying sensitivity…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: 5 pages, 4 figures, published to ICASSP2026

  16. arXiv:2509.18208  [pdf, ps, other]

    cs.LG cs.AI

    Variational Task Vector Composition

    Authors: Boyuan Zhang, Yingjun Du, Xiantong Zhen, Ling Shao

    Abstract: Task vectors capture how a model changes during fine-tuning by recording the difference between pre-trained and task-specific weights. The composition of task vectors, a key operator in task arithmetic, enables models to integrate knowledge from multiple tasks without incurring additional inference costs. In this paper, we propose variational task vector composition, where composition coefficients…

    Submitted 20 September, 2025; originally announced September 2025.

  17. arXiv:2509.17924  [pdf, ps, other]

    cs.LG q-bio.TO

    Medical priority fusion: achieving dual optimization of sensitivity and interpretability in nipt anomaly detection

    Authors: Xiuqi Ge, Zhibo Yao, Yaosong Du

    Abstract: Clinical machine learning faces a critical dilemma in high-stakes medical applications: algorithms achieving optimal diagnostic performance typically sacrifice the interpretability essential for physician decision-making, while interpretable methods compromise sensitivity in complex scenarios. This paradox becomes particularly acute in non-invasive prenatal testing (NIPT), where missed chromosomal…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: 24 pages, 47 figures, publish to BIBM

  18. arXiv:2509.17918  [pdf, ps, other]

    cs.IR cs.LG

    Shilling Recommender Systems by Generating Side-feature-aware Fake User Profiles

    Authors: Yuanrong Wang, Yingpeng Du

    Abstract: Recommender systems (RS) greatly influence users' consumption decisions, making them attractive targets for malicious shilling attacks that inject fake user profiles to manipulate recommendations. Existing shilling methods can generate effective and stealthy fake profiles when training data only contain rating matrix, but they lack comprehensive solutions for scenarios where side features are pres…

    Submitted 24 September, 2025; v1 submitted 22 September, 2025; originally announced September 2025.

  19. arXiv:2509.17088  [pdf, ps, other]

    cs.CV

    AlignedGen: Aligning Style Across Generated Images

    Authors: Jiexuan Zhang, Yiheng Du, Qian Wang, Weiqi Li, Yu Gu, Jian Zhang

    Abstract: Despite their generative power, diffusion models struggle to maintain style consistency across images conditioned on the same style prompt, hindering their practical deployment in creative workflows. While several training-free methods attempt to solve this, they are constrained to the U-Net architecture, which not only leads to low-quality results and artifacts like object repetition but also ren…

    Submitted 21 September, 2025; originally announced September 2025.

  20. arXiv:2509.17065  [pdf, ps, other]

    cs.CV

    CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner

    Authors: Yao Du, Jiarong Guo, Xiaomeng Li

    Abstract: Echocardiography is a vital non-invasive modality for cardiac assessment, with left ventricular ejection fraction (LVEF) serving as a key indicator of heart function. Existing LVEF estimation methods depend on large-scale annotated video datasets, which are costly and limit adaptability across various clinical settings. Recent vision-language models for echocardiography, such as EchoCLIP, apply im…

    Submitted 21 September, 2025; originally announced September 2025.

    Comments: Accepted by MICCAI 2025

  21. arXiv:2509.17034  [pdf, ps, other]

    cs.LG cs.CV

    Long-Tailed Out-of-Distribution Detection with Refined Separate Class Learning

    Authors: Shuai Feng, Yuxin Ge, Yuntao Du, Mingcai Chen, Chongjun Wang, Lei Feng

    Abstract: Out-of-distribution (OOD) detection is crucial for deploying robust machine learning models. However, when training data follows a long-tailed distribution, the model's ability to accurately detect OOD samples is significantly compromised, due to the confusion between OOD samples and head/tail classes. To distinguish OOD samples from both head and tail classes, the separate class learning (SCL) ap…

    Submitted 25 September, 2025; v1 submitted 21 September, 2025; originally announced September 2025.

  22. arXiv:2509.16839  [pdf, ps, other]

    cs.AI

    Roundtable Policy: Improving Scientific Reasoning and Narratives through Confidence-Weighted Consensus of LLMs

    Authors: Yu Yao, Jiayi Dong, Ju Li, Yang Yang, Yilun Du

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities not only in language generation but also in advancing scientific discovery. A growing body of work has explored ways to improve their reasoning, from self-consistency and chain-of-thought to multi-agent debate. Inspired by the dynamics of scientific committees and the "Society of Mind," we introduce Roundtable Policy, a complem…

    Submitted 20 September, 2025; originally announced September 2025.

    Comments: Equal contribution: Yu Yao and Jiayi Dong. Equal advising: Ju Li, Yang Yang, and Yilun Du. Affiliations: Massachusetts Institute of Technology (Yu Yao, Ju Li), University of California, Los Angeles (Jiayi Dong, Yang Yang), Harvard University (Yilun Du)

  23. arXiv:2509.16629  [pdf, ps, other]

    cs.LG q-bio.QM

    Causality-Induced Positional Encoding for Transformer-Based Representation Learning of Non-Sequential Features

    Authors: Kaichen Xu, Yihang Du, Mianpeng Liu, Zimu Yu, Xiaobo Sun

    Abstract: Positional encoding is essential for supplementing transformer with positional information of tokens. Existing positional encoding methods demand predefined token/feature order, rendering them unsuitable for real-world data with non-sequential yet causally-related features. To address this limitation, we propose CAPE, a novel method that identifies underlying causal structure over non-sequential f…

    Submitted 23 September, 2025; v1 submitted 20 September, 2025; originally announced September 2025.

    Comments: Accepted by NeurIPS 2025

  24. arXiv:2509.16246  [pdf, ps, other]

    cs.PL cs.AR

    VerilogMonkey: Exploring Parallel Scaling for Automated Verilog Code Generation with LLMs

    Authors: Juxin Niu, Yuxin Du, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan

    Abstract: We present VerilogMonkey, an empirical study of parallel scaling for the under-explored task of automated Verilog generation. Parallel scaling improves LLM performance by sampling many outputs in parallel. Across multiple benchmarks and mainstream LLMs, we find that scaling to hundreds of samples is cost-effective in both time and money and, even without any additional enhancements such as post-tr…

    Submitted 17 September, 2025; originally announced September 2025.

  25. arXiv:2509.15193  [pdf, ps, other]

    quant-ph cs.AI

    TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE

    Authors: Yifeng Peng, Xinyi Li, Samuel Yen-Chi Chen, Kaining Zhang, Zhiding Liang, Ying Wang, Yuxuan Du

    Abstract: Variational quantum Eigensolver (VQE) is a leading candidate for harnessing quantum computers to advance quantum chemistry and materials simulations, yet its training efficiency deteriorates rapidly for large Hamiltonians. Two issues underlie this bottleneck: (i) the no-cloning theorem imposes a linear growth in circuit evaluations with the number of parameters per gradient step; and (ii) deeper c…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: Accepted by The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

  26. arXiv:2509.14278  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Beyond Data Privacy: New Privacy Risks for Large Language Models

    Authors: Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding

    Abstract: Large Language Models (LLMs) have achieved remarkable progress in natural language understanding, reasoning, and autonomous decision-making. However, these advancements have also come with significant privacy concerns. While significant research has focused on mitigating the data privacy risks of LLMs during various stages of model training, less attention has been paid to new threats emerging fro…

    Submitted 16 September, 2025; originally announced September 2025.

  27. arXiv:2509.14151  [pdf, ps, other]

    cs.CV

    BEVUDA++: Geometric-aware Unsupervised Domain Adaptation for Multi-View 3D Object Detection

    Authors: Rongyu Zhang, Jiaming Liu, Xiaoqi Li, Xiaowei Chi, Dan Wang, Li Du, Yuan Du, Shanghang Zhang

    Abstract: Vision-centric Bird's Eye View (BEV) perception holds considerable promise for autonomous driving. Recent studies have prioritized efficiency or accuracy enhancements, yet the issue of domain shift has been overlooked, leading to substantial performance degradation upon transfer. We identify major domain gaps in real-world cross-domain scenarios and initiate the first effort to address the Domain…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: Accepted by IEEE TCSVT

  28. arXiv:2509.14002  [pdf, ps, other]

    cs.NI

    RepCaM++: Exploring Transparent Visual Prompt With Inference-Time Re-Parameterization for Neural Video Delivery

    Authors: Rongyu Zhang, Xize Duan, Jiaming Liu, Li Du, Yuan Du, Dan Wang, Shanghang Zhang, Fangxin Wang

    Abstract: Recently, content-aware methods have been employed to reduce bandwidth and enhance the quality of Internet video delivery. These methods involve training distinct content-aware super-resolution (SR) models for each video chunk on the server, subsequently streaming the low-resolution (LR) video chunks with the SR models to the client. Prior research has incorporated additional partial parameters to…

    Submitted 17 September, 2025; originally announced September 2025.

  29. arXiv:2509.13651  [pdf, ps, other]

    cs.LG

    Controllable Pareto Trade-off between Fairness and Accuracy

    Authors: Yongkang Du, Jieyu Zhao, Yijun Yang, Tianyi Zhou

    Abstract: The fairness-accuracy trade-off is a key challenge in NLP tasks. Current work focuses on finding a single "optimal" solution to balance the two objectives, which is limited considering the diverse solutions on the Pareto front. This work intends to provide controllable trade-offs according to the user's preference of the two objectives, which is defined as a reference vector. To achieve this goal,…

    Submitted 16 September, 2025; originally announced September 2025.

  30. arXiv:2509.13210  [pdf, ps, other]

    cs.CV

    Vi-SAFE: A Spatial-Temporal Framework for Efficient Violence Detection in Public Surveillance

    Authors: Ligang Chang, Shengkai Xu, Liangchang Shen, Binhan Xu, Junqiao Wang, Tianyu Shi, Yanhui Du

    Abstract: Violence detection in public surveillance is critical for public safety. This study addresses challenges such as small-scale targets, complex environments, and real-time temporal analysis. We propose Vi-SAFE, a spatial-temporal framework that integrates an enhanced YOLOv8 with a Temporal Segment Network (TSN) for video surveillance. The YOLOv8 model is optimized with GhostNetV3 as a lightweight ba…

    Submitted 16 September, 2025; originally announced September 2025.

    ACM Class: I.2.10; I.4.8

  31. arXiv:2509.10344  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography

    Authors: Yuexi Du, Lihui Chen, Nicha C. Dvornek

    Abstract: Mammography screening is an essential tool for early detection of breast cancer. The speed and accuracy of mammography interpretation have the potential to be improved with deep learning methods. However, the development of a foundation visual language model (VLM) is hindered by limited data and domain differences between natural and medical images. Existing mammography VLMs, adapted from natural…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: Accepted by MICCAI 2025

  32. arXiv:2509.09727  [pdf, ps, other]

    cs.CL cs.CE

    A Role-Aware Multi-Agent Framework for Financial Education Question Answering with LLMs

    Authors: Andy Zhu, Yingjun Du

    Abstract: Question answering (QA) plays a central role in financial education, yet existing large language model (LLM) approaches often fail to capture the nuanced and specialized reasoning required for financial problem-solving. The financial domain demands multistep quantitative reasoning, familiarity with domain-specific terminology, and comprehension of real-world scenarios. We present a multi-agent fra…

    Submitted 10 September, 2025; originally announced September 2025.

    Comments: 8 pages, 6 figures, Under review

  33. arXiv:2509.09174  [pdf, ps, other]

    cs.CL cs.AI cs.SD

    EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

    Authors: Yuhao Zhang, Yuhao Du, Zhanchen Dai, Xiangnan Ma, Kaiqi Kou, Benyou Wang, Haizhou Li

    Abstract: Speech-to-speech large language models (SLLMs) are attracting increasing attention. Derived from text-based large language models (LLMs), SLLMs often exhibit degradation in knowledge and reasoning capabilities. We hypothesize that this limitation arises because current training paradigms for SLLMs fail to bridge the acoustic-semantic gap in the feature representation space. To address this issue,…

    Submitted 11 September, 2025; originally announced September 2025.

  34. arXiv:2509.09090  [pdf, ps, other]

    cs.CV cs.AI

    SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models

    Authors: Hengyu Fang, Yijiang Liu, Yuan Du, Li Du, Huanrui Yang

    Abstract: Vision-Language-Action (VLA) models exhibit unprecedented capabilities for embodied intelligence. However, their extensive computational and memory costs hinder their practical deployment. Existing VLA compression and acceleration approaches conduct quantization or token pruning in an ad-hoc manner but fail to enable both for a holistic efficiency improvement due to an observed incompatibility. Th…

    Submitted 10 September, 2025; originally announced September 2025.

    Comments: 12 pages, 9 figures

  35. arXiv:2509.08833  [pdf, ps, other]

    cs.CY

    Position: The Pitfalls of Over-Alignment: Overly Caution Health-Related Responses From LLMs are Unethical and Dangerous

    Authors: Wenqi Marshall Guo, Yiyang Du, Heidi J. S. Tworek, Shan Du

    Abstract: Large Language Models (LLMs) are usually aligned with "human values/preferences" to prevent harmful output. Discussions around the alignment of Large Language Models (LLMs) generally focus on preventing harmful outputs. However, in this paper, we argue that in health-related queries, over-alignment-leading to overly cautious responses-can itself be harmful, especially for people with anxiety and o…

    Submitted 27 August, 2025; originally announced September 2025.

  36. arXiv:2509.06807  [pdf, ps, other]

    cs.CL

    MoGU V2: Toward a Higher Pareto Frontier Between Model Usability and Security

    Authors: Yanrui Du, Fenglei Fan, Sendong Zhao, Jiawei Cao, Ting Liu, Bing Qin

    Abstract: As Large Language Models (LLMs) increasingly permeate human life, their security has emerged as a critical concern, particularly their ability to maintain harmless responses to malicious instructions. Although extensive methods have improved LLMs' security, they often lead to conservative, rejection-oriented responses that compromise practical usability. This presents a key challenge: how to advan…

    Submitted 8 September, 2025; originally announced September 2025.

  37. arXiv:2509.06796  [pdf, ps, other]

    cs.CR cs.LG

    Imitative Membership Inference Attack

    Authors: Yuntao Du, Yuetian Chen, Hanshen Xiao, Bruno Ribeiro, Ninghui Li

    Abstract: A Membership Inference Attack (MIA) assesses how much a target machine learning model reveals about its training data by determining whether specific query instances were part of the training set. State-of-the-art MIAs rely on training hundreds of shadow models that are independent of the target model, leading to significant computational overhead. In this paper, we introduce Imitative Membership…

    Submitted 8 September, 2025; originally announced September 2025.

    Comments: Code is available at: https://github.com/zealscott/IMIA

  38. arXiv:2509.06795  [pdf, ps, other]

    cs.CL

    Anchoring Refusal Direction: Mitigating Safety Risks in Tuning via Projection Constraint

    Authors: Yanrui Du, Fenglei Fan, Sendong Zhao, Jiawei Cao, Qika Lin, Kai He, Ting Liu, Bing Qin, Mengling Feng

    Abstract: Instruction Fine-Tuning (IFT) has been widely adopted as an effective post-training strategy to enhance various abilities of Large Language Models (LLMs). However, prior studies have shown that IFT can significantly compromise LLMs' safety, particularly their ability to refuse malicious instructions, raising significant concerns. Recent research into the internal mechanisms of LLMs has identified…

    Submitted 8 September, 2025; originally announced September 2025.

  39. PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments

    Authors: Olivier Schipper, Yudi Zhang, Yali Du, Mykola Pechenizkiy, Meng Fang

    Abstract: LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft. It provides an extensible API, multi-round testing, and rul…

    Submitted 7 September, 2025; originally announced September 2025.

    Comments: for the source code, see https://github.com/aialt/PillagerBench

    ACM Class: I.2.11; I.2.6; I.2.8

    Journal ref: 2025 IEEE Conference on Games (CoG), Lisbon, Portugal, 2025, pp. 1-15

  40. arXiv:2509.06119  [pdf, ps, other]

    cs.RO eess.SY

    A Hybrid TDMA/CSMA Protocol for Time-Sensitive Traffic in Robot Applications

    Authors: Shiqi Xu, Lihao Zhang, Yuyang Du, Qun Yang, Soung Chang Liew

    Abstract: Recent progress in robotics has underscored the demand for real-time control in applications such as manufacturing, healthcare, and autonomous systems, where the timely delivery of mission-critical commands under heterogeneous robotic traffic is paramount for operational efficacy and safety. In these scenarios, mission-critical traffic follows a strict deadline-constrained communication pattern: c…

    Submitted 27 September, 2025; v1 submitted 7 September, 2025; originally announced September 2025.

  41. arXiv:2509.05209  [pdf, ps, other]

    cs.CL

    Hunyuan-MT Technical Report

    Authors: Mao Zheng, Zheng Li, Bingxin Qu, Mingyang Song, Yang Du, Mingrui Sun, Di Wang

    Abstract: In this report, we introduce Hunyuan-MT-7B, our first open-source multilingual translation model, which supports bidirectional translation across 33 major languages and places a special emphasis on translation between Mandarin and several ethnic minority languages as well as dialects. Furthermore, to serve and address diverse translation scenarios and enhance model performance at test time, we int…

    Submitted 9 September, 2025; v1 submitted 5 September, 2025; originally announced September 2025.

  42. arXiv:2509.04923  [pdf, ps, other]

    quant-ph cs.AI cs.LG

    Artificial intelligence for representing and characterizing quantum systems

    Authors: Yuxuan Du, Yan Zhu, Yuan-Hang Zhang, Min-Hsiu Hsieh, Patrick Rebentrost, Weibo Gao, Ya-Dong Wu, Jens Eisert, Giulio Chiribella, Dacheng Tao, Barry C. Sanders

    Abstract: Efficient characterization of large-scale quantum systems, especially those produced by quantum analog simulators and megaquop quantum computers, poses a central challenge in quantum science due to the exponential scaling of the Hilbert space with respect to system size. Recent advances in artificial intelligence (AI), with its aptitude for high-dimensional pattern recognition and function approxi…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: 32 pages. Comments are welcome

  43. arXiv:2509.02544  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.HC

    UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

    Authors: Haoming Wang, Haoyang Zou, Huatong Song, Jiazhan Feng, Junjie Fang, Junting Lu, Longxiang Liu, Qinyu Luo, Shihao Liang, Shijue Huang, Wanjun Zhong, Yining Ye, Yujia Qin, Yuwen Xiong, Yuxin Song, Zhiyong Wu, Aoyan Li, Bo Li, Chen Dun, Chong Liu, Daoguang Zan, Fuxing Leng, Hanbin Wang, Hao Yu, Haobin Chen, et al. (87 additional authors not shown)

    Abstract: The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and…

    Submitted 5 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

  44. arXiv:2509.01025  [pdf, ps, other]

    cs.LG

    Any-Order Flexible Length Masked Diffusion

    Authors: Jaeyeon Kim, Lee Cheuk-Kit, Carles Domingo-Enrich, Yilun Du, Sham Kakade, Timothy Ngotiaoco, Sitan Chen, Michael Albergo

    Abstract: Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on non-causal tasks. However, a crucial limitation is that they do not support token insertions and are thus limited to fixed-length generations. To this end, we intr…

    Submitted 7 September, 2025; v1 submitted 31 August, 2025; originally announced September 2025.

    Comments: Preprint

  45. arXiv:2508.21148  [pdf, ps, other]

    cs.CL cs.AI

    A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

    Authors: Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, et al. (78 additional authors not shown)

    Abstract: Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a un…

    Submitted 28 August, 2025; originally announced August 2025.

  46. arXiv:2508.20916  [pdf, ps, other]

    cs.CL

    SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement

    Authors: Yuan Ge, Junxiang Zhang, Xiaoqian Liu, Bei Li, Xiangnan Ma, Chenglong Wang, Kaiyang Ye, Yangfan Du, Linfeng Zhang, Yuxin Huang, Tong Xiao, Zhengtao Yu, JingBo Zhu

    Abstract: Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive S2S LLMs evaluation. First, unlike cascaded approaches that disregard acoustic…

    Submitted 28 August, 2025; originally announced August 2025.

  47. arXiv:2508.19996  [pdf, ps, other]

    cs.CL

    ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning

    Authors: Yiming Du, Yifan Xiang, Bin Liang, Dahua Lin, Kam-Fai Wong, Fei Tan

    Abstract: Fine-tuning multi-turn dialogue systems requires high-quality supervision but often suffers from degraded performance when exposed to low-quality data. Supervision errors in early turns can propagate across subsequent turns, undermining coherence and response quality. Existing methods typically address data quality via static prefiltering, which decouples quality control from training and fails to…

    Submitted 27 August, 2025; originally announced August 2025.

  48. arXiv:2508.18993  [pdf, ps, other]

    cs.SE cs.AI

    GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

    Authors: Ziyi Ni, Huacan Wang, Shuo Zhang, Shuo Lu, Ziyang He, Wang You, Zhenheng Tang, Yuntao Du, Bill Sun, Hongzhang Liu, Sen Hu, Ronghao Chen, Bo Li, Xin Li, Chen Hu, Binxing Jiao, Daxin Jiang, Pin Lyu

    Abstract: Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios. To bridge this gap, we introduce GitTaskBench, a benchmark designed to systematically assess this capability via 54 realistic tasks across 7 modalities and 7 d…

    Submitted 14 September, 2025; v1 submitted 26 August, 2025; originally announced August 2025.

    Comments: Highly practical, Well-motivated, Actionable

  49. SWiFT: Soft-Mask Weight Fine-tuning for Bias Mitigation

    Authors: Junyu Yan, Feng Chen, Yuyang Xue, Yuning Du, Konstantinos Vilouras, Sotirios A. Tsaftaris, Steven McDonagh

    Abstract: Recent studies have shown that Machine Learning (ML) models can exhibit bias in real-world scenarios, posing significant challenges in ethically sensitive domains such as healthcare. Such bias can negatively affect model fairness, model generalization abilities and further risks amplifying social discrimination. There is a need to remove biases from trained models. Existing debiasing approaches of…

    Submitted 4 September, 2025; v1 submitted 26 August, 2025; originally announced August 2025.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2025:015

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 3 (2025)

  50. arXiv:2508.18701  [pdf, ps, other]

    cs.CL

    Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System

    Authors: Yanfan Du, Jun Zhang, Bin Wang, Jin Qiu, Lu Huang, Yuan Ge, Xiaoqian Liu, Tong Xiao, Jingbo Zhu

    Abstract: Recent advances in speech large language models (SLMs) have improved speech recognition and translation in general domains, but accurately generating domain-specific terms or neologisms remains challenging. To address this, we propose Attention2Probability: attention-driven terminology probability estimation for robust speech-to-text system, which is lightweight, flexible, and accurate. Attention2…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: 9 pages, 4 figures, 5 tables