
Showing 1–50 of 4,025 results for author: Li, S

Searching in archive cs.
  1. arXiv:2505.09974  [pdf, other]

    cs.CR cs.AI

    Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data

    Authors: Adel ElZemity, Budi Arief, Shujun Li

    Abstract: The integration of large language models (LLMs) into cyber security applications presents significant opportunities, such as enhancing threat analysis and malware detection, but can also introduce critical risks and safety concerns, including personal data leakage and automated generation of new malware. We present a systematic evaluation of safety risks in fine-tuned LLMs for cyber security appli…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09919  [pdf]

    cs.RO eess.SY

    Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots

    Authors: Ziyang Zhou, Yogesh Phalak, Vishrut Deshpande, Ian Walker, Suyi Li

    Abstract: Deployable structures inspired by origami offer lightweight, compact, and reconfigurable solutions for robotic and architectural applications. We present a geometric and mechanical framework for Yoshimura-Ori modules that supports a diverse set of metastable states, including newly identified asymmetric "pop-out" and "hyperfolded" configurations. These states are governed by three parameters -- ti…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.09521  [pdf, ps, other]

    eess.IV cs.CV

    Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net

    Authors: Dongyi He, Shiyang Li, Bin Jiang, He Yan

    Abstract: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain CNNs that fail…

    Submitted 14 May, 2025; originally announced May 2025.

  4. arXiv:2505.09498  [pdf, other]

    cs.CV cs.AI

    Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

    Authors: Bo Zhang, Shuo Li, Runhe Tian, Yang Yang, Jixin Tang, Jinhao Zhou, Lin Ma

    Abstract: In this paper, we introduce Flash-VL 2B, a novel approach to optimizing Vision-Language Models (VLMs) for real-time applications, targeting ultra-low latency and high throughput without sacrificing accuracy. Leveraging advanced architectural enhancements and efficient computational strategies, Flash-VL 2B is designed to maximize throughput by reducing processing time while maintaining competitive…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 18 pages, 7 figures

  5. FACTors: A New Dataset for Studying the Fact-checking Ecosystem

    Authors: Enes Altuncu, Can Başkent, Sanjay Bhattacherjee, Shujun Li, Dwaipayan Roy

    Abstract: Our fight against false information is spearheaded by fact-checkers. They investigate the veracity of claims and document their findings as fact-checking reports. With the rapid increase in the amount of false information circulating online, the use of automation in fact-checking processes aims to strengthen this ecosystem by enhancing scalability. Datasets containing fact-checked claims play a ke…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: Accepted for the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)

  6. arXiv:2505.09295  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Toward Fair Federated Learning under Demographic Disparities and Data Imbalance

    Authors: Qiming Wu, Siqi Li, Doudou Zhou, Nan Liu

    Abstract: Ensuring fairness is critical when applying artificial intelligence to high-stakes domains such as healthcare, where predictive models trained on imbalanced and demographically skewed data risk exacerbating existing disparities. Federated learning (FL) enables privacy-preserving collaboration across institutions, but remains vulnerable to both algorithmic bias and subgroup imbalance, particularly…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2505.08783  [pdf, ps, other]

    cs.LG cs.AI cs.CL math.NA

    CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

    Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar

    Abstract: Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability. In this work, we frame PDE solving as a code generation task and in…

    Submitted 13 May, 2025; originally announced May 2025.

  8. arXiv:2505.08198  [pdf, ps, other]

    stat.ML cs.LG

    SIM-Shapley: A Stable and Computationally Efficient Approach to Shapley Value Approximation

    Authors: Wangxuan Fan, Siqi Li, Doudou Zhou, Yohei Okada, Chuan Hong, Molei Liu, Nan Liu

    Abstract: Explainable artificial intelligence (XAI) is essential for trustworthy machine learning (ML), particularly in high-stakes domains such as healthcare and finance. Shapley value (SV) methods provide a principled framework for feature attribution in complex models but incur high computational costs, limiting their scalability in high-dimensional settings. We propose Stochastic Iterative Momentum for…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 21 pages, 6 figures, 5 tables

  9. arXiv:2505.08163  [pdf, ps, other]

    cs.AI cs.CV

    Decoding Neighborhood Environments with Large Language Models

    Authors: Andrew Cart, Shaohu Zhang, Melanie Escue, Xugui Zhou, Haitao Zhao, Prashanth BusiReddyGari, Beiyu Lin, Shuang Li

    Abstract: Neighborhood environments include physical and environmental conditions such as housing quality, roads, and sidewalks, which significantly influence human health and well-being. Traditional methods for assessing these environments, including field surveys and geographic information systems (GIS), are resource-intensive and make it challenging to evaluate neighborhood environments at scale. Although machin…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 8 pages

  10. arXiv:2505.07968  [pdf, other]

    cs.CL

    Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

    Authors: Weiyi Wu, Xinwen Xu, Chongyang Gao, Xingjian Diao, Siting Li, Lucas A. Salas, Jiang Gui

    Abstract: Large Language Models (LLMs) have great potential in the field of health care, yet they face great challenges in adapting to rapidly evolving medical knowledge. This can lead to outdated or contradictory treatment suggestions. This study investigated how LLMs respond to evolving clinical guidelines, focusing on concept drift and internal inconsistencies. We developed the DriftMedQA benchmark to si…

    Submitted 12 May, 2025; originally announced May 2025.

  11. arXiv:2505.07863  [pdf, ps, other]

    cs.CL

    QoSBERT: An Uncertainty-Aware Approach based on Pre-trained Language Models for Service Quality Prediction

    Authors: Ziliang Wang, Xiaohong Zhang, Ze Shi Li, Meng Yan

    Abstract: Accurate prediction of Quality of Service (QoS) metrics is fundamental for selecting and managing cloud-based services. Traditional QoS models rely on manual feature engineering and yield only point estimates, offering no insight into the confidence of their predictions. In this paper, we propose QoSBERT, the first framework that reformulates QoS prediction as a semantic regression task based on p…

    Submitted 8 May, 2025; originally announced May 2025.

  12. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.

  13. arXiv:2505.07539  [pdf, ps, other]

    cs.CV

    GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

    Authors: Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, Yiyi Liao

    Abstract: Immersive video offers a 6-DoF free-viewing experience, potentially playing a key role in future video technology. Recently, 4D Gaussian Splatting has gained attention as an effective approach for immersive video due to its high rendering efficiency and quality, though maintaining quality with manageable storage remains challenging. To address this, we introduce GIFStream, a novel 4D Gaussian repr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures

  14. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  15. arXiv:2505.06918  [pdf, other]

    eess.IV cs.CV cs.LG

    Uni-AIMS: AI-Powered Microscopy Image Analysis

    Authors: Yanhui Hong, Nan Wang, Zhiyi Xia, Haoyi Tao, Xi Fang, Yiming Li, Jiankun Wang, Peng Jin, Xiaochen Cai, Shengyu Li, Ziqi Chen, Zezhong Zhang, Guolin Ke, Linfeng Zhang

    Abstract: This paper presents a systematic solution for the intelligent recognition and automatic analysis of microscopy images. We developed a data engine that generates high-quality annotated datasets by combining diverse microscopy images collected from experiments, synthetic data generation, and a human-in-the-loop annotation process. To address the unique challenges of microscopy ima…

    Submitted 11 May, 2025; originally announced May 2025.

  16. arXiv:2505.06897  [pdf, other]

    cs.AI

    Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

    Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong

    Abstract: The ultimate goal of artificial intelligence (AI) is to achieve Artificial General Intelligence (AGI). Embodied Artificial Intelligence (EAI), which involves intelligent systems with physical presence and real-time interaction with the environment, has emerged as a key research direction in pursuit of AGI. While advancements in deep learning, reinforcement learning, large-scale language models, an…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 19 pages, 7 figures, 3 tables

  17. arXiv:2505.06843  [pdf, ps, other]

    cs.LG cs.CL

    Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety

    Authors: Zihan Guan, Mengxuan Hu, Ronghang Zhu, Sheng Li, Anil Vullikanti

    Abstract: Recent studies have uncovered a troubling vulnerability in the fine-tuning stage of large language models (LLMs): even fine-tuning on entirely benign datasets can lead to a significant increase in the harmfulness of LLM outputs. Building on this finding, our red teaming study takes this threat one step further by developing a more effective attack. Specifically, we analyze and identify samples wit…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 26 pages, 13 figures

  18. arXiv:2505.06678  [pdf, other]

    cs.NI eess.SP

    Distributionally Robust Contract Theory for Edge AIGC Services in Teleoperation

    Authors: Zijun Zhan, Yaxian Dong, Daniel Mawunyo Doe, Yuqing Hu, Shuai Li, Shaohua Cao, Lei Fan, Zhu Han

    Abstract: Advanced AI-Generated Content (AIGC) technologies have injected new impetus into teleoperation, further enhancing its security and efficiency. Edge AIGC networks have been introduced to meet the stringent low-latency requirements of teleoperation. However, the inherent uncertainty of AIGC service quality and the need to incentivize AIGC service providers (ASPs) make the design of a robust incentiv…

    Submitted 10 May, 2025; originally announced May 2025.

  19. arXiv:2505.06628  [pdf, ps, other]

    cs.RO

    ACORN: Adaptive Contrastive Optimization for Safe and Robust Fine-Grained Robotic Manipulation

    Authors: Zhongquan Zhou, Shuhao Li, Zixian Yue

    Abstract: Embodied AI research has traditionally emphasized performance metrics such as success rate and cumulative reward, overlooking critical robustness and safety considerations that emerge during real-world deployment. In actual environments, agents continuously encounter unpredicted situations and distribution shifts, causing seemingly reliable policies to experience catastrophic failures, particularl…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 6 pages, 4 figures

  20. arXiv:2505.05467  [pdf, other]

    cs.CV cs.AI cs.CL

    StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

    Authors: Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang

    Abstract: We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer comb…

    Submitted 8 May, 2025; originally announced May 2025.

  21. arXiv:2505.05034  [pdf, other]

    cs.LG stat.ML

    Dequantified Diffusion Schrödinger Bridge for Density Ratio Estimation

    Authors: Wei Chen, Shigui Li, Jiacheng Li, Junmei Yang, John Paisley, Delu Zeng

    Abstract: Density ratio estimation is fundamental to tasks involving f-divergences, yet existing methods often fail under significantly different distributions or inadequately overlapping supports, suffering from the density-chasm and the support-chasm problems. Additionally, prior approaches yield divergent time scores near boundaries, leading to instability. We propose…

    Submitted 8 May, 2025; originally announced May 2025.

    Journal ref: ICML 2025: Proceedings of the 42nd International Conference on Machine Learning, 2025

  22. arXiv:2505.04653  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.LG

    Advancing Conversational Diagnostic AI with Multimodal Reasoning

    Authors: Khaled Saab, Jan Freyberg, Chunjong Park, Tim Strother, Yong Cheng, Wei-Hung Weng, David G. T. Barrett, David Stutz, Nenad Tomasev, Anil Palepu, Valentin Liévin, Yash Sharma, Roma Ruparel, Abdullah Ahmed, Elahe Vedadi, Kimberly Kanada, Cian Hughes, Yun Liu, Geoff Brown, Yang Gao, Sean Li, S. Sara Mahdavi, James Manyika, Katherine Chou, Yossi Matias, et al. (11 additional authors not shown)

    Abstract: Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations, but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms permit clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in medical consultation, but the abil…

    Submitted 6 May, 2025; originally announced May 2025.

  23. arXiv:2505.04109  [pdf, other]

    cs.CV

    One2Any: One-Reference 6D Pose Estimation for Any Object

    Authors: Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari

    Abstract: 6D object pose estimation remains challenging for many applications due to dependencies on complete 3D models, multi-view images, or training limited to specific object categories. These requirements make it difficult to generalize to novel objects for which neither 3D models nor multi-view images may be available. To address this, we propose a novel method One2Any that estimates the relative 6-degr…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: accepted by CVPR 2025

    Journal ref: CVPR 2025

  24. arXiv:2505.03853  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.GN

    GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype

    Authors: Changxi Chi, Jun Xia, Jingbo Zhou, Jiabei Cheng, Chang Yu, Stan Z. Li

    Abstract: Predicting genetic perturbations enables the identification of potentially crucial genes prior to wet-lab experiments, significantly improving overall experimental efficiency. Since genes are the foundation of cellular life, building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations. However, current methods fail to fully leverage gene-relat…

    Submitted 5 May, 2025; originally announced May 2025.

  25. Matching Distance and Geometric Distribution Aided Learning Multiview Point Cloud Registration

    Authors: Shiqi Li, Jihua Zhu, Yifan Xie, Naiwen Hu, Di Wang

    Abstract: Multiview point cloud registration plays a crucial role in robotics, automation, and computer vision fields. This paper concentrates on pose graph construction and motion synchronization within multiview registration. Previous methods for pose graph construction often pruned fully connected graphs or constructed sparse graphs using global features aggregated from local descriptors, which may not con…

    Submitted 6 May, 2025; originally announced May 2025.

  26. arXiv:2505.03539  [pdf, other]

    cs.CV cs.RO eess.IV

    Panoramic Out-of-Distribution Segmentation

    Authors: Mengfei Duan, Kailun Yang, Yuheng Zhang, Yihong Cao, Fei Teng, Kai Luo, Jiaming Zhang, Zhiyong Li, Shutao Li

    Abstract: Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to background clutter and pixel distortions. To address these issues, we introdu…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Code and datasets will be available at https://github.com/MengfeiD/PanOoS

  27. arXiv:2505.03507  [pdf, ps, other]

    cs.CV

    Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking

    Authors: Shenglan Li, Rui Yao, Yong Zhou, Hancheng Zhu, Kunyang Sun, Bing Liu, Zhiwen Shao, Jiaqi Zhao

    Abstract: To reduce the reliance on large-scale annotations, self-supervised RGB-T tracking approaches have garnered significant attention. However, the omission of the object region by erroneous pseudo-labels or the introduction of background noise affects the efficiency of modality fusion, while pseudo-label noise triggered by similar object noise can further affect the tracking performance. In this paper,…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted by the 34th International Joint Conference on Artificial Intelligence (IJCAI 2025)

  28. arXiv:2505.03214  [pdf, other]

    cs.SE cs.AI

    DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

    Authors: Qiang Sun, Sirui Li, Tingting Bi, Du Huynh, Mark Reynolds, Yuanyi Luo, Wei Liu

    Abstract: Acquiring structured data from domain-specific, image-based documents such as scanned reports is crucial for many downstream tasks but remains challenging due to document variability. Many of these documents exist as images rather than as machine-readable text, which requires human annotation to train automated extraction systems. We present DocSpiral, the first Human-in-the-Spiral assistive docum…

    Submitted 6 May, 2025; originally announced May 2025.

  29. arXiv:2505.03097  [pdf, other]

    cs.CV

    Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability

    Authors: Lei Wang, Senmao Li, Fei Yang, Jianye Wang, Ziheng Zhang, Yuhan Liu, Yaxing Wang, Jian Yang

    Abstract: Diffusion models focus on constructing basic image structures in early stages, while the refined details, including local features and textures, are generated in later stages. Thus the same network layers are forced to learn both structural and textural information simultaneously, significantly differing from the traditional deep learning architectures (e.g., ResNet or GANs) which capture or…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Accepted to CVPR 2025

  30. arXiv:2505.03054  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    BLAB: Brutally Long Audio Bench

    Authors: Orevaoghene Ahia, Martijn Bartelds, Kabir Ahuja, Hila Gonen, Valentin Hofmann, Siddhant Arora, Shuyue Stella Li, Vishal Puttagunta, Mofetoluwa Adeyemi, Charishma Buchireddy, Ben Walls, Noah Bennett, Shinji Watanabe, Noah A. Smith, Yulia Tsvetkov, Sachin Kumar

    Abstract: Developing large audio language models (LMs) capable of understanding diverse spoken interactions is essential for accommodating the multimodal nature of human communication and can increase the accessibility of language technologies across different user populations. Recent work on audio LMs has primarily evaluated their performance on short audio segments, typically under 30 seconds, with limite…

    Submitted 12 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  31. arXiv:2505.02977  [pdf, ps, other]

    cs.DC cs.DS math.NA

    Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners

    Authors: Tianyu Liang, Chao Chen, Yotam Yaniv, Hengrui Luo, David Tench, Xiaoye S. Li, Aydin Buluc, James Demmel

    Abstract: We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as discretization of a partial differential equation, spectral graph partitioning, and learning problems on graphs. The preconditioner belongs to the family of incom…

    Submitted 5 May, 2025; originally announced May 2025.

  32. arXiv:2505.02744  [pdf, other]

    cs.RO

    Re-purposing a modular origami manipulator into an adaptive physical computer for machine learning and robotic perception

    Authors: Jun Wang, Suyi Li

    Abstract: Physical computing has emerged as a powerful tool for performing intelligent tasks directly in the mechanical domain of functional materials and robots, reducing our reliance on the more traditional CMOS computers. However, no systematic study explains how mechanical design can influence physical computing performance. This study sheds light on this question by repurposing an origami-inspired…

    Submitted 5 May, 2025; originally announced May 2025.

  33. arXiv:2505.02418  [pdf, ps, other]

    cs.IR cs.HC

    SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration

    Authors: Qiang Sun, Tingting Bi, Sirui Li, Eun-Jung Holden, Paul Duuring, Kai Niu, Wei Liu

    Abstract: We present SymbioticRAG, a novel framework that fundamentally reimagines Retrieval-Augmented Generation (RAG) systems by establishing a bidirectional learning relationship between humans and machines. Our approach addresses two critical challenges in current RAG systems: the inherently human-centered nature of relevance determination and users' progression from "unconscious incompetence"…

    Submitted 5 May, 2025; originally announced May 2025.

  34. arXiv:2505.02228  [pdf, other]

    cs.LG cs.AI

    Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning

    Authors: Shangzhe Li, Zhiao Huang, Hao Su

    Abstract: Imitation Learning (IL) has achieved remarkable success across various domains, including robotics, autonomous driving, and healthcare, by enabling agents to learn complex behaviors from expert demonstrations. However, existing IL methods often face instability challenges, particularly when relying on adversarial reward or value formulations in world model frameworks. In this work, we propose a no…

    Submitted 4 May, 2025; originally announced May 2025.

  35. arXiv:2505.02179  [pdf, other]

    cs.CV

    ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

    Authors: Tao Zhu, Qi Yu, Xinru Dong, Shiyu Li, Yue Liu, Jinlong Jiang, Lei Shu

    Abstract: Weakly-supervised video anomaly detection (WS-VAD) using Multiple Instance Learning (MIL) suffers from label ambiguity, hindering discriminative feature learning. We propose ProDisc-VAD, an efficient framework tackling this via two synergistic components. The Prototype Interaction Layer (PIL) provides controlled normality modeling using a small set of learnable prototypes, establishing a robust ba…

    Submitted 4 May, 2025; originally announced May 2025.

  36. arXiv:2505.01749  [pdf, other]

    cs.CR

    Unified Steganography via Implicit Neural Representation

    Authors: Qi Song, Ziyuan Luo, Xiufeng Huang, Sheng Li, Renjie Wan

    Abstract: Digital steganography is the practice of concealing information for encrypted data transmission. Typically, steganography methods embed secret data into cover data to create stega data that incorporates hidden secret data. However, steganography techniques often require designing specific frameworks for each data type, which restricts their generalizability. In this paper, we present U-INR, a novel method for…

    Submitted 3 May, 2025; originally announced May 2025.

  37. arXiv:2505.01660  [pdf, other]

    cs.LG

    Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification

    Authors: Sicong Li, Qianqian Xu, Zhiyong Yang, Zitai Wang, Linchao Zhang, Xiaochun Cao, Qingming Huang

    Abstract: Real-world datasets often follow a long-tailed distribution, making generalization to tail classes difficult. Recent methods resorted to long-tail variants of Sharpness-Aware Minimization (SAM), such as ImbSAM and CC-SAM, to improve generalization by flattening the loss landscape. However, these attempts face a trade-off between computational efficiency and control over the loss landscape. On the…

    Submitted 2 May, 2025; originally announced May 2025.

  38. arXiv:2505.01343  [pdf, other]

    cs.AI

    BalancEdit: Dynamically Balancing the Generality-Locality Trade-off in Multi-modal Model Editing

    Authors: Dongliang Guo, Mengxuan Hu, Zihan Guan, Thomas Hartvigsen, Sheng Li

    Abstract: Large multi-modal models inevitably decay over time as facts change and previously learned information becomes outdated. Traditional approaches such as fine-tuning are often impractical for updating these models due to their size and complexity. Instead, direct knowledge editing within the models presents a more viable solution. Current model editing techniques, however, typically overlook the uni…

    Submitted 2 May, 2025; originally announced May 2025.

    Journal ref: Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025

  39. arXiv:2505.00322  [pdf, other]

    cs.RO cs.AI

    AI2-Active Safety: AI-enabled Interaction-aware Active Safety Analysis with Vehicle Dynamics

    Authors: Keshu Wu, Zihao Li, Sixu Li, Xinyue Ye, Dominique Lord, Yang Zhou

    Abstract: This paper introduces an AI-enabled, interaction-aware active safety analysis framework that accounts for groupwise vehicle interactions. Specifically, the framework employs a bicycle model, augmented with road gradient considerations, to accurately capture vehicle dynamics. In parallel, a hypergraph-based AI model is developed to predict probabilistic trajectories of ambient traffic. By integrating…

    Submitted 1 May, 2025; originally announced May 2025.

  40. arXiv:2505.00063  [pdf, other]

    cs.CL cs.CV

    GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling

    Authors: Siqi Li, Yufan Shen, Xiangnan Chen, Jiayi Chen, Hengwei Ju, Haodong Duan, Song Mao, Hongbin Zhou, Bo Zhang, Pinlong Cai, Licheng Wen, Botian Shi, Yong Liu, Xinyu Cai, Yu Qiao

    Abstract: The rapid advancement of multimodal large language models (MLLMs) has profoundly impacted the document domain, creating a wide array of application scenarios. This progress highlights the need for a comprehensive benchmark to evaluate these models' capabilities across various document-specific tasks. However, existing benchmarks often fail to locate specific model weaknesses or guide systematic im…

    Submitted 30 April, 2025; originally announced May 2025.

  41. arXiv:2504.21718  [pdf, other]

    cs.CV

    VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction

    Authors: Shiying Li, Xingqun Qi, Bingkun Yang, Chen Weile, Zezhao Tian, Muyi Sun, Qifeng Liu, Man Zhang, Zhenan Sun

    Abstract: Generating responsive listener head dynamics with nuanced emotions and expressive reactions is crucial for practical dialogue modeling in various virtual avatar animations. Previous studies mainly focus on the direct short-term production of listener behavior. They overlook the fine-grained control over motion variations and emotional intensity, especially in long-sequence modeling. Moreover, the…

    Submitted 6 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  42. arXiv:2504.21278  [pdf, other]

    cs.MA

    Robust Multi-agent Communication Based on Decentralization-Oriented Adversarial Training

    Authors: Xuyan Ma, Yawen Wang, Junjie Wang, Xiaofei Xie, Boyu Wu, Shoubin Li, Fanjiang Xu, Qing Wang

    Abstract: In typical multi-agent reinforcement learning (MARL) problems, communication is important for agents to share information and make the right decisions. However, due to the complexity of training multi-agent communication, existing methods often fall into the dilemma of local optimization, which leads to the concentration of communication in a limited number of channels and presents an unbalanced s…

    Submitted 29 April, 2025; originally announced April 2025.

  43. arXiv:2504.21226  [pdf, other]

    cs.CV cs.AI

    MemeBLIP2: A novel lightweight multimodal system to detect harmful memes

    Authors: Jiaqi Liu, Ran Tong, Aowei Shen, Shuzheng Li, Changlin Yang, Lisha Xu

    Abstract: Memes often merge visuals with brief text to share humor or opinions, yet some memes contain harmful messages such as hate speech. In this paper, we introduce MemeBLIP2, a lightweight multimodal system that detects harmful memes by combining image and text features effectively. We build on previous studies by adding modules that align image and text representations into a shared space and fuse t…

    Submitted 6 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: 11 pages, 3 figures, manuscript in preparation

  44. arXiv:2504.21055  [pdf, ps, other]

    cs.LG cs.AI

    Modeling and Performance Analysis for Semantic Communications Based on Empirical Results

    Authors: Shuai Ma, Bin Shen, Chuanhui Zhang, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

    Abstract: Due to the black-box characteristics of deep learning based semantic encoders and decoders, finding a tractable method for the performance analysis of semantic communications is a challenging problem. In this paper, we propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR, which can be applied for both image reconstruction tasks and inferenc…

    Submitted 29 April, 2025; originally announced April 2025.

  45. arXiv:2504.21035  [pdf, other]

    cs.CR cs.CL cs.LG

    A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

    Authors: Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh

    Abstract: Sanitizing sensitive text data typically involves removing personally identifiable information (PII) or generating synthetic data under the assumption that these methods adequately protect privacy; however, their effectiveness is often only assessed by measuring the leakage of explicit identifiers but ignoring nuanced textual markers that can lead to re-identification. We challenge the above illus…

    Submitted 2 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  46. arXiv:2504.20969  [pdf, other]

    cs.RO cs.LG

    XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search

    Authors: Yiting Zhang, Shichen Li, Elena Shrestha

    Abstract: Mechanical search (MS) in cluttered environments remains a significant challenge for autonomous manipulators, requiring long-horizon planning and robust state estimation under occlusions and partial observability. In this work, we introduce XPG-RL, a reinforcement learning framework that enables agents to efficiently perform MS tasks through explainable, priority-guided decision-making based on ra…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 13 pages, 5 figures

  47. arXiv:2504.20964  [pdf, other]

    cs.CL cs.AI cs.OS cs.PL cs.SE

    OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification

    Authors: Shangyu Li, Juyong Jiang, Tiancheng Zhao, Jiasi Shen

    Abstract: We introduce OSVBench, a new benchmark for evaluating Large Language Models (LLMs) in generating complete specification code pertaining to operating system kernel verification tasks. The benchmark first formulates the specification generation problem as a program synthesis problem within a confined scope of syntax and semantics by providing LLMs with the programming model. The LLMs are required to…

    Submitted 29 April, 2025; originally announced April 2025.

  48. arXiv:2504.20570  [pdf, other]

    cs.CR

    ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Jin Xie, Ruishi He, Songze Li, Xiaojun Jia, Shouling Ji

    Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a practical solution for adapting large language models (LLMs) to custom datasets with significantly reduced computational cost. When carrying out PEFT under collaborative learning scenarios (e.g., federated learning), it is often required to exchange model updates (or gradients) across parties. These gradients, even with limited dimensions, ca…

    Submitted 29 April, 2025; originally announced April 2025.

  49. arXiv:2504.19529  [pdf, other]

    cs.CV cs.MM

    Adversarial Shallow Watermarking

    Authors: Guobiao Li, Lei Tan, Yuliang Xue, Gaozhi Liu, Zhenxing Qian, Sheng Li, Xinpeng Zhang

    Abstract: Recent advances in digital watermarking make use of deep neural networks for message embedding and extraction. They typically follow the "encoder-noise layer-decoder"-based architecture. By deliberately establishing a differentiable noise layer to simulate the distortion of the watermarked signal, they jointly train the deep encoder and decoder to fit the noise layer to guarantee robustness. As…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 10 pages, 12 figures

  50. arXiv:2504.19449  [pdf, other]

    cs.LG

    R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

    Authors: Zhenyu Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Zhangyang Wang, Steven Li

    Abstract: Large Language Models (LLMs), while demonstrating remarkable capabilities across various applications, present significant challenges during inference due to their substantial model size, especially when deployed on edge devices. Activation sparsity offers a promising solution to reduce computation and memory movement, enabling more efficient inference, particularly for small-batch on-device appli…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: ICLR 2025