
Showing 1–50 of 852 results for author: Wu, D

Searching in archive cs.
  1. arXiv:2505.10113  [pdf, other]

    cs.CL

    What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs

    Authors: Xinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto

    Abstract: In this paper, we introduce S-MedQA, an English medical question-answering (QA) dataset for benchmarking large language models in fine-grained clinical specialties. We use S-MedQA to check the applicability of a popular hypothesis related to knowledge injection in the knowledge-intense scenario of medical QA, and show that: 1) training on data from a speciality does not necessarily lead to best pe…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.08247  [pdf, ps, other]

    eess.IV cs.CV

    Skeleton-Guided Diffusion Model for Accurate Foot X-ray Synthesis in Hallux Valgus Diagnosis

    Authors: Midi Wan, Pengfei Li, Yizhuo Liang, Di Wu, Yushan Pan, Guangzhen Zhu, Hao Wang

    Abstract: Medical image synthesis plays a crucial role in providing anatomically accurate images for diagnosis and treatment. Hallux valgus, which affects approximately 19% of the global population, requires frequent weight-bearing X-rays for assessment, placing additional strain on both patients and healthcare providers. Existing X-ray models often struggle to balance image fidelity, skeletal consistency,…

    Submitted 13 May, 2025; originally announced May 2025.

  3. arXiv:2505.07172  [pdf, other]

    cs.CV

    Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning

    Authors: Zexian Yang, Dian Li, Dayan Wu, Gang Liu, Weiping Wang

    Abstract: Despite significant advancements in multimodal reasoning tasks, existing Large Vision-Language Models (LVLMs) are prone to producing visually ungrounded responses when interpreting associated images. In contrast, when humans embark on learning new knowledge, they often rely on a set of fundamental pre-study principles: reviewing outlines to grasp core concepts, summarizing key points to guide thei…

    Submitted 11 May, 2025; originally announced May 2025.

  4. arXiv:2505.05736  [pdf]

    q-bio.QM cs.CL cs.CV cs.LG

    Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications

    Authors: Da Wu, Zhanliang Wang, Quan Nguyen, Zhuoran Xu, Kai Wang

    Abstract: The scarcity of high-quality multimodal biomedical data limits the ability to effectively fine-tune pretrained Large Language Models (LLMs) for specialized biomedical tasks. To address this challenge, we introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal large decoder models with domain-specific decision patterns from multimodal biomedical data through pref…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: First Draft

  5. arXiv:2505.04899  [pdf, other]

    cs.CV

    OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging

    Authors: Sifan Song, Siyeop Yoon, Pengfei Jin, Sekeun Kim, Matthew Tivnan, Yujin Oh, Runqi Meng, Ling Chen, Zhiliang Lyu, Dufan Wu, Ning Guo, Xiang Li, Quanzheng Li

    Abstract: Recent advances in representation learning often rely on holistic, black-box embeddings that entangle multiple semantic components, limiting interpretability and generalization. These issues are especially critical in medical imaging. To address these limitations, we propose an Organ-Wise Tokenization (OWT) framework with a Token Group-based Reconstruction (TGR) training paradigm. Unlike conventio…

    Submitted 7 May, 2025; originally announced May 2025.

  6. arXiv:2505.04421  [pdf, other]

    cs.IR

    LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

    Authors: Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu, Xionghang Xie, Shiru Ren, Xiang Sun, Yaocheng Tan, Peng Xu, Yuchao Zheng, Di Wu

    Abstract: Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Rec…

    Submitted 7 May, 2025; originally announced May 2025.

  7. arXiv:2505.03390  [pdf, other]

    cs.LG

    Concept Factorization via Self-Representation and Adaptive Graph Structure Learning

    Authors: Zhengqin Yang, Di Wu, Jia Chen, Xin Luo

    Abstract: Concept Factorization (CF) models have attracted widespread attention due to their excellent performance in data clustering. In recent years, many variant models based on CF have achieved great success in clustering by taking into account the internal geometric manifold structure of the dataset and using graph regularization techniques. However, their clustering performance depends greatly on the…

    Submitted 6 May, 2025; originally announced May 2025.

  8. arXiv:2504.21008  [pdf, other]

    cs.CR cs.AI

    Research on CNN-BiLSTM Network Traffic Anomaly Detection Model Based on MindSpore

    Authors: Qiuyan Xiang, Shuang Wu, Dongze Wu, Yuxin Liu, Zhenkai Qin

    Abstract: With the widespread adoption of the Internet of Things (IoT) and Industrial IoT (IIoT) technologies, network architectures have become increasingly complex, and the volume of traffic has grown substantially. This evolution poses significant challenges to traditional security mechanisms, particularly in detecting high-frequency, diverse, and highly covert network attacks. To address these challenge…

    Submitted 14 April, 2025; originally announced April 2025.

  9. arXiv:2504.19983  [pdf, other]

    cs.LG stat.ML

    Emergence and scaling laws in SGD learning of shallow neural networks

    Authors: Yunwei Ren, Eshaan Nichani, Denny Wu, Jason D. Lee

    Abstract: We study the complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with $P$ neurons on isotropic Gaussian data: $f_*(\boldsymbol{x}) = \sum_{p=1}^P a_p\cdot \sigma(\langle\boldsymbol{x},\boldsymbol{v}_p^*\rangle)$, $\boldsymbol{x} \sim \mathcal{N}(0,\boldsymbol{I}_d)$, where the activation $\sigma:\mathbb{R}\to\mathbb{R}$ is an even function with information exponent…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 100 pages

  10. arXiv:2504.19044  [pdf, other]

    cs.CL

    Calibrating Translation Decoding with Quality Estimation on LLMs

    Authors: Di Wu, Yibin Lei, Christof Monz

    Abstract: Neural machine translation (NMT) systems typically employ maximum a posteriori (MAP) decoding to select the highest-scoring translation from the distribution mass. However, recent evidence highlights the inadequacy of MAP decoding, often resulting in low-quality or even pathological hypotheses -- the decoding objective is not aligned with real-world translation quality. This paper proposes calibra…

    Submitted 10 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

  11. arXiv:2504.18598  [pdf, other]

    cs.CR cs.AI

    BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts

    Authors: Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu

    Abstract: Mixture-of-Experts (MoE) have emerged as a powerful architecture for large language models (LLMs), enabling efficient scaling of model capacity while maintaining manageable computational costs. The key advantage lies in their ability to route different tokens to different "expert" networks within the model, enabling specialization and efficient handling of diverse input. However, the vulnerabili…

    Submitted 28 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  12. arXiv:2504.18335  [pdf, ps, other]

    cs.IT

    Rack-Aware Minimum Storage Partially Cooperative Regenerating Codes with Small Sub-Packetization

    Authors: Hengming Zhao, Dianhua Wu, Minquan Cheng

    Abstract: In the rack-aware model, there are $\bar{n}$ racks each of which has $u$ nodes with the same storage capacity. Assume that there are $h$ failed nodes uniformly distributed in $\bar{h}$ host racks (defined as racks containing failed nodes), each rack containing $h/\bar{h}$ failed nodes where $h$ is divisible by $\bar{h}$. Then together with its internal helper nodes, each host rack downloads recov…

    Submitted 25 April, 2025; originally announced April 2025.

  13. arXiv:2504.18064  [pdf, other]

    cs.RO

    AllTact Fin Ray: A Compliant Robot Gripper with Omni-Directional Tactile Sensing

    Authors: Siwei Liang, Yixuan Guan, Jing Xu, Hongyu Qian, Xiangjun Zhang, Dan Wu, Wenbo Ding, Rui Chen

    Abstract: Tactile sensing plays a crucial role in robot grasping and manipulation by providing essential contact information between the robot and the environment. In this paper, we present AllTact Fin Ray, a novel compliant gripper design with omni-directional and local tactile sensing capabilities. The finger body is unibody-casted using transparent elastic silicone, and a camera positioned at the base of…

    Submitted 25 April, 2025; originally announced April 2025.

  14. arXiv:2504.17996  [pdf, other]

    cs.CV

    Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning

    Authors: Yuanbing Ouyang, Yizhuo Liang, Qingpeng Li, Xinfei Guo, Yiming Luo, Di Wu, Hao Wang, Yushan Pan

    Abstract: Vision Transformers (ViTs) excel in semantic segmentation but demand significant computation, posing challenges for deployment on resource-constrained devices. Existing token pruning methods often overlook fundamental visual data characteristics. This study introduces 'LVTP', a progressive token pruning framework guided by multi-scale Tsallis entropy and low-level visual features with twice cluste…

    Submitted 24 April, 2025; originally announced April 2025.

  15. arXiv:2504.17678  [pdf, other]

    cs.CY

    MindFlow: A Network Traffic Anomaly Detection Model Based on MindSpore

    Authors: Qiuyan Xiang, Shuang Wu, Dongze Wu, Yuxin Liu, Zhenkai Qin

    Abstract: With the wide application of IoT and industrial IoT technologies, the network structure is becoming more and more complex, and the traffic scale is growing rapidly, which makes the traditional security protection mechanism face serious challenges in dealing with high-frequency, diversified, and stealthy cyber-attacks. To address this problem, this study proposes MindFlow, a multi-dimensional dynam…

    Submitted 24 April, 2025; originally announced April 2025.

  16. arXiv:2504.17574  [pdf, other]

    cs.CL cs.CY

    RAGAT-Mind: A Multi-Granular Modeling Approach for Rumor Detection Based on MindSpore

    Authors: Zhenkai Qin, Guifang Yang, Dongze Wu

    Abstract: As false information continues to proliferate across social media platforms, effective rumor detection has emerged as a pressing challenge in natural language processing. This paper proposes RAGAT-Mind, a multi-granular modeling approach for Chinese rumor detection, built upon the MindSpore deep learning framework. The model integrates TextCNN for local semantic extraction, bidirectional GRU for s…

    Submitted 24 April, 2025; originally announced April 2025.

  17. arXiv:2504.17087  [pdf, other]

    cs.AI

    Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments

    Authors: Yuran Li, Jama Hussein Mohamud, Chongren Sun, Di Wu, Benoit Boulet

    Abstract: Large language models (LLMs) are being widely applied across various fields, but as tasks become more complex, evaluating their responses is increasingly challenging. Compared to human evaluators, the use of LLMs to support performance evaluation offers a more efficient alternative. However, most studies focus mainly on aligning LLMs' judgments with human preferences, overlooking the existence of…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures, 6 tables

  18. arXiv:2504.16546  [pdf, other]

    cs.CY cs.HC

    Tinkering Against Scaling

    Authors: Bolun Zhang, Yang Shen, Linzhuo Li, Yu Ji, Di Wu, Tongyu Wu, Lianghao Dai

    Abstract: The ascent of scaling in artificial intelligence research has revolutionized the field over the past decade, yet it presents significant challenges for academic researchers, particularly in computational social science and critical algorithm studies. The dominance of large language models, characterized by their extensive parameters and costly training processes, creates a disparity where only ind…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 43 pages, 4 figures

  19. arXiv:2504.15987  [pdf, other]

    cs.CL cs.CY

    Few-shot Hate Speech Detection Based on the MindSpore Framework

    Authors: Zhenkai Qin, Dongze Wu, Yuxin Liu, Guifang Yang

    Abstract: The proliferation of hate speech on social media poses a significant threat to online communities, requiring effective detection systems. While deep learning models have shown promise, their performance often deteriorates in few-shot or low-resource settings due to reliance on large annotated corpora. To address this, we propose MS-FSLHate, a prompt-enhanced neural framework for few-shot hate spee…

    Submitted 22 April, 2025; originally announced April 2025.

  20. arXiv:2504.14960  [pdf, other]

    cs.LG cs.DC

    MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core

    Authors: Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, Vijay Korthikanti, Evan Wu, Shiqing Fan, Gao Deng, Hongxiao Bai, Jianbin Chang, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, June Yang

    Abstract: Mixture of Experts (MoE) models enhance neural network scalability by dynamically selecting relevant experts per input token, enabling larger model sizes while maintaining manageable computation costs. However, efficient training of large-scale MoE models across thousands of GPUs presents significant challenges due to limitations in existing parallelism strategies. We introduce an end-to-end train…

    Submitted 23 April, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  21. arXiv:2504.13576  [pdf, other]

    cs.LG

    MSTIM: A MindSpore-Based Model for Traffic Flow Prediction

    Authors: Weiqi Qin, Yuxin Liu, Dongze Wu, Zhenkai Qin, Qining Luo

    Abstract: To address the low accuracy and large error fluctuation of traditional traffic flow prediction models when dealing with multi-scale temporal features and dynamic change patterns, this paper proposes MSTIM, a multi-scale time series information modelling model based on the MindSpore framework, which integrates long short-term memory networks (LSTMs), convolutional neural networks (CNNs),…

    Submitted 18 April, 2025; originally announced April 2025.

  22. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  23. arXiv:2504.13110  [pdf, other]

    stat.ML cs.LG

    Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

    Authors: Margalit Glasgow, Denny Wu, Joan Bruna

    Abstract: We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessia…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 70 pages

  24. arXiv:2504.10030  [pdf, other]

    cs.RO cs.AI

    EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control

    Authors: Hanwen Wan, Yifei Chen, Zeyu Wei, Dongrui Li, Zexin Lin, Donghao Wu, Jiu Cheng, Yuxiang Zhang, Xiaoqiang Ji

    Abstract: This paper introduces EmbodiedAgent, a hierarchical framework for heterogeneous multi-robot control. EmbodiedAgent addresses critical limitations of hallucination in impractical tasks. Our approach integrates a next-action prediction paradigm with a structured memory system to decompose tasks into executable robot skills while dynamically validating actions against environmental constraints. We pr…

    Submitted 14 April, 2025; originally announced April 2025.

  25. arXiv:2504.09997  [pdf, other]

    cs.RO cs.AI

    GenTe: Generative Real-world Terrains for General Legged Robot Locomotion Control

    Authors: Hanwen Wan, Mengkang Li, Donghao Wu, Yebin Zhong, Yixuan Deng, Zhenglong Sun, Xiaoqiang Ji

    Abstract: Developing bipedal robots capable of traversing diverse real-world terrains presents a fundamental robotics challenge, as existing methods using predefined height maps and static environments fail to address the complexity of unstructured landscapes. To bridge this gap, we propose GenTe, a framework for generating physically realistic and adaptable terrains to train generalizable locomotion polici…

    Submitted 14 April, 2025; originally announced April 2025.

  26. arXiv:2504.09348   

    stat.ME cs.LG eess.SP

    Graph-Based Prediction Models for Data Debiasing

    Authors: Dongze Wu, Hanyang Jiang, Yao Xie

    Abstract: Bias in data collection, arising from both under-reporting and over-reporting, poses significant challenges in critical applications such as healthcare and public safety. In this work, we introduce Graph-based Over- and Under-reporting Debiasing (GROUD), a novel graph-based optimization framework that debiases reported data by jointly estimating the true incident counts and the associated reportin…

    Submitted 18 April, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

    Comments: We submitted this arXiv version by mistake. We have decided to update the original submission (arXiv:2307.07898) instead of submitting a separate article

  27. arXiv:2504.09221  [pdf, ps, other]

    cs.HC cs.LG

    CMCRD: Cross-Modal Contrastive Representation Distillation for Emotion Recognition

    Authors: Siyuan Kan, Huanyu Wu, Zhenyao Cui, Fan Huang, Xiaolong Xu, Dongrui Wu

    Abstract: Emotion recognition is an important component of affective computing, and also human-machine interaction. Unimodal emotion recognition is convenient, but the accuracy may not be high enough; on the contrary, multi-modal emotion recognition may be more accurate, but it also increases the complexity and cost of the data collection system. This paper considers cross-modal emotion recognition, i.e., u…

    Submitted 12 April, 2025; originally announced April 2025.

  28. arXiv:2504.09213  [pdf, ps, other]

    cs.HC cs.LG cs.NE

    Spiking Neural Network for Intra-cortical Brain Signal Decoding

    Authors: Song Yang, Haotian Fu, Herui Zhang, Peng Zhang, Wei Li, Dongrui Wu

    Abstract: Decoding brain signals accurately and efficiently is crucial for intra-cortical brain-computer interfaces. Traditional decoding approaches based on neural activity vector features suffer from low accuracy, whereas deep learning based approaches have high computational cost. To improve both the decoding accuracy and efficiency, this paper proposes a spiking neural network (SNN) for effective and en…

    Submitted 12 April, 2025; originally announced April 2025.

  29. arXiv:2504.08761  [pdf, other]

    cs.IR

    UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation

    Authors: Yuxuan Chen, Dewen Guo, Sen Mei, Xinze Li, Hao Chen, Yishan Li, Yixuan Wang, Chaoyue Tang, Ruobing Wang, Dingjun Wu, Yukun Yan, Zhenghao Liu, Shi Yu, Zhiyuan Liu, Maosong Sun

    Abstract: Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge. To facilitate researchers in deploying RAG systems, various RAG toolkits have been introduced. However, many existing RAG toolkits lack support for knowledge adaptation tailored to specific application scenarios. To address this limitati…

    Submitted 30 March, 2025; originally announced April 2025.

  30. arXiv:2504.06792  [pdf, other]

    cs.CL cs.LG

    Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations

    Authors: Zican Dong, Han Peng, Peiyu Liu, Wayne Xin Zhao, Dong Wu, Feng Xiao, Zhifeng Wang

    Abstract: Mixture-of-Experts (MoE) models achieve a favorable trade-off between performance and inference efficiency by activating only a subset of experts. However, the memory overhead of storing all experts remains a major limitation, especially in large-scale MoE models such as DeepSeek-R1 (671B). In this study, we investigate domain specialization and expert redundancy in large-scale MoE models and unco…

    Submitted 9 April, 2025; originally announced April 2025.

  31. arXiv:2504.05329  [pdf, other]

    cs.RO

    Ultrasound-Guided Robotic Blood Drawing and In Vivo Studies on Submillimetre Vessels of Rats

    Authors: Shuaiqi Jing, Tianliang Yao, Ke Zhang, Di Wu, Qiulin Wang, Zixi Chen, Ke Chen, Peng Qi

    Abstract: Billions of vascular access procedures are performed annually worldwide, serving as a crucial first step in various clinical diagnostic and therapeutic procedures. For pediatric or elderly individuals, whose vessels are small in size (typically 2 to 3 mm in diameter for adults and less than 1 mm in children), vascular access can be highly challenging. This study presents an image-guided robotic sy…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures. This paper has been accepted by IEEE ICRA 2025

  32. arXiv:2504.02317  [pdf, other]

    cs.LG cs.AI

    Temporal Gaussian Copula For Clinical Multivariate Time Series Data Imputation

    Authors: Ye Su, Hezhe Qiao, Di Wu, Yuwen Chen, Lin Chen

    Abstract: The imputation of the Multivariate time series (MTS) is particularly challenging since the MTS typically contains irregular patterns of missing values due to various factors such as instrument failures, interference from irrelevant data, and privacy regulations. Existing statistical methods and deep learning methods have shown promising results in time series imputation. In this paper, we propose…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Accepted in BIBM2024

  33. arXiv:2504.01990  [pdf, other]

    cs.AI

    Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

    Authors: Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia, et al. (22 additional authors not shown)

    Abstract: The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate…

    Submitted 31 March, 2025; originally announced April 2025.

  34. arXiv:2504.01597  [pdf, other]

    eess.IV cs.CV

    A topology-preserving three-stage framework for fully-connected coronary artery extraction

    Authors: Yuehui Qiu, Dandan Shan, Yining Wang, Pei Dong, Dijia Wu, Xinnian Yang, Qingqi Hong, Dinggang Shen

    Abstract: Coronary artery extraction is a crucial prerequisite for computer-aided diagnosis of coronary artery disease. Accurately extracting the complete coronary tree remains challenging due to several factors, including presence of thin distal vessels, tortuous topological structures, and insufficient contrast. These issues often result in over-segmentation and under-segmentation in current segmentation…

    Submitted 2 April, 2025; originally announced April 2025.

  35. arXiv:2504.01018  [pdf, other]

    cs.CL

    Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization

    Authors: Di Wu, Jia-Chen Gu, Kai-Wei Chang, Nanyun Peng

    Abstract: Selective retrieval improves retrieval-augmented generation (RAG) by reducing distractions from low-quality retrievals and improving efficiency. However, existing approaches under-utilize the inherent knowledge of large language models (LLMs), leading to suboptimal retrieval decisions and degraded generation performance. To bridge this gap, we propose Self-Routing RAG (SR-RAG), a novel framework t…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Work in Progress

  36. arXiv:2504.00726  [pdf, other]

    cs.LG cs.DC

    EMO: Edge Model Overlays to Scale Model Size in Federated Learning

    Authors: Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese

    Abstract: Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlen…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Poster accepted at IEEE ICDCS 2025

  37. arXiv:2504.00408  [pdf, other]

    cs.CY cs.AI cs.HC

    From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions

    Authors: Ruben Weijers, Denton Wu, Hannah Betts, Tamara Jacod, Yuxiang Guan, Vidya Sujaya, Kushal Dev, Toshali Goel, William Delooze, Reihaneh Rabbany, Ying Wu, Jean-François Godbout, Kellin Pelrine

    Abstract: Generative AI has the potential to transform personalization and accessibility of education. However, it raises serious concerns about accuracy and helping students become independent critical thinkers. In this study, we designed a helpful AI "Peer" to help students correct fundamental physics misconceptions related to Newtonian mechanic concepts. In contrast to approaches that seek near-perfect a…

    Submitted 1 April, 2025; originally announced April 2025.

  38. arXiv:2503.24245  [pdf, other]

    cs.CL

    Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation

    Authors: Dun Yuan, Hao Zhou, Di Wu, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang

    Abstract: Large language models (LLMs) have made significant progress in general-purpose natural language processing tasks. However, LLMs are still facing challenges when applied to domain-specific areas like telecommunications, which demands specialized expertise and adaptability to evolving standards. This paper presents a novel framework that combines knowledge graph (KG) and retrieval-augmented generati…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: This work has been accepted to ICC 2025 IEEE International Conference on Communications. copyright 2025 IEEE

  39. arXiv:2503.23537  [pdf, other]

    cs.LG

    Redundant feature screening method for human activity recognition based on attention purification mechanism

    Authors: Hanyu Liu, Xiaoyang Li, Yixuan Jiang, Haotian Tang, Dongchen Wu, Yameng Guo

    Abstract: In the field of sensor-based Human Activity Recognition (HAR), deep neural networks provide advanced technical support. Many studies have proven that recognition accuracy can be improved by increasing the depth or width of the network. However, for wearable devices, the balance between network performance and resource consumption is crucial. With minimum resource consumption as the basic principle…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 12 pages, 7 figures

  40. arXiv:2503.17932  [pdf, other]

    cs.CL cs.AI cs.CR

    STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models

    Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang

    Abstract: Large Language Models (LLMs) have become increasingly vulnerable to jailbreak attacks that circumvent their safety mechanisms. While existing defense methods either suffer from adaptive attacks or require computationally expensive auxiliary models, we present STShield, a lightweight framework for real-time jailbroken judgement. STShield introduces a novel single-token sentinel mechanism that appen…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: 11 pages

  41. arXiv:2503.13241  [pdf, other]

    cs.CV eess.IV

    Sampling Innovation-Based Adaptive Compressive Sensing

    Authors: Zhifu Tian, Tao Hu, Chaoyang Niu, Di Wu, Shu Wang

    Abstract: Scene-aware Adaptive Compressive Sensing (ACS) has attracted significant interest due to its promising capability for efficient and high-fidelity acquisition of scene images. ACS typically prescribes adaptive sampling allocation (ASA) based on previous samples in the absence of ground truth. However, when confronting unknown scenes, existing ACS methods often lack accurate judgment and robust feed…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: CVPR2025 accepted

  42. arXiv:2503.12389  [pdf, other]

    cs.AI

    FedGAI: Federated Style Learning with Cloud-Edge Collaboration for Generative AI in Fashion Design

    Authors: Mingzhu Wu, Jianan Jiang, Xinglin Li, Hanhui Deng, Di Wu

    Abstract: Collaboration can amalgamate diverse ideas, styles, and visual elements, fostering creativity and innovation among different designers. In collaborative design, sketches play a pivotal role as a means of expressing design creativity. However, designers often tend to not openly share these meticulously crafted sketches. This phenomenon of data island in the design area hinders its digital transform…

    Submitted 16 March, 2025; originally announced March 2025.

  43. arXiv:2503.12286  [pdf]

    cs.CL cs.AI q-bio.GN q-bio.QM

    Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes

    Authors: Da Wu, Zhanliang Wang, Quan Nguyen, Kai Wang

    Abstract: Background: Several studies show that large language models (LLMs) struggle with phenotype-driven gene prioritization for rare diseases. These studies typically use Human Phenotype Ontology (HPO) terms to prompt foundation models like GPT and LLaMA to predict candidate genes. However, in real-world settings, foundation models are not optimized for domain-specific tasks like clinical diagnosis, yet…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: 31 pages, 3 figures

  44. arXiv:2503.11496  [pdf, other]

    cs.CV

    Cognitive Disentanglement for Referring Multi-Object Tracking

    Authors: Shaofeng Liang, Runwei Guan, Wangwang Lian, Daizong Liu, Xiaolou Sun, Dongming Wu, Yutao Yue, Weiping Ding, Hui Xiong

    Abstract: As a significant application of multi-source information fusion in intelligent transportation perception systems, Referring Multi-Object Tracking (RMOT) involves localizing and tracking specific objects in video sequences based on language references. However, existing RMOT approaches often treat language descriptions as holistic embeddings and struggle to effectively integrate the rich semantic i…

    Submitted 15 April, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: 26 pages, 11 figures

  45. arXiv:2503.11300  [pdf, other]

    eess.SY cs.RO

    Six-DoF Stewart Platform Motion Simulator Control using Switchable Model Predictive Control

    Authors: Jiangwei Zhao, Zhengjia Xu, Dongsu Wu, Yingrui Cao, Jinpeng Xie

    Abstract: Due to excellent mechanism characteristics of high rigidity, maneuverability and strength-to-weight ratio, 6 Degree-of-Freedom (DoF) Stewart structure is widely adopted to construct flight simulator platforms for replicating motion feelings during training pilots. Unlike conventional serial link manipulator based mechanisms, Upset Prevention and Recovery Training (UPRT) in complex flight status is…

    Submitted 14 March, 2025; originally announced March 2025.

  46. arXiv:2503.11272  [pdf, other]

    stat.ML cs.LG

    When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

    Authors: Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu

    Abstract: Theoretical efforts to prove advantages of Transformers in comparison with classical architectures such as feedforward and recurrent neural networks have mostly focused on representational power. In this work, we take an alternative perspective and prove that even with infinite compute, feedforward and recurrent networks may suffer from larger sample complexity compared to Transformers, as the lat…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 43 pages, 2 figures

  47. arXiv:2503.11074  [pdf, other]

    cs.AI cs.CL

    Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities

    Authors: Xueyang Zhou, Guiyao Tie, Guowen Zhang, Weidong Wang, Zhigang Zuo, Di Wu, Duanfeng Chu, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: The rise of Large Reasoning Models (LRMs) signifies a paradigm shift toward advanced computational reasoning. Yet, this progress disrupts traditional agent frameworks, traditionally anchored by execution-oriented Large Language Models (LLMs). To explore this transformation, we propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving, assessed with th…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 71 pages, 5 figures, 6 tables

  48. arXiv:2503.10635  [pdf, other]

    cs.CV cs.AI cs.LG

    A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

    Authors: Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu, Jiacheng Cui, Zhiqiang Shen

    Abstract: Despite promising performance on open-source large vision-language models (LVLMs), transfer-based targeted attacks often fail against black-box commercial LVLMs. Analyzing failed adversarial perturbations reveals that the learned perturbations typically originate from a uniform distribution and lack clear semantic details, resulting in unintended responses. This critical absence of semantic inform…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Code at: https://github.com/VILA-Lab/M-Attack

  49. arXiv:2503.08726  [pdf, other]

    cs.LG cs.AI eess.SP

    SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework

    Authors: Yubo Peng, Luping Xiang, Kun Yang, Feibo Jiang, Kezhi Wang, Dapeng Oliver Wu

    Abstract: Traditional single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency in bandwidth-constrained environments. Additionally, single-task-oriented sensing systems fail to address users' diverse demands. To overcome these challenges, we propose a semantic-driven integrated multimodal sensing and communication (SI…

    Submitted 10 March, 2025; originally announced March 2025.

  50. arXiv:2503.07877  [pdf, other]

    stat.ML cs.IT cs.LG

    Cost-Aware Optimal Pairwise Pure Exploration

    Authors: Di Wu, Chengshuai Shi, Ruida Zhou, Cong Shen

    Abstract: Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure exploration, with a focus on identifying the pairwise relationships between targeted arm pairs. Moreover…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: AISTATS 2025