Showing 1–50 of 1,126 results for author: Jiang, X

Searching in archive cs.
  1. arXiv:2507.02332  [pdf, ps, other]

    cs.CR

    PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage

    Authors: Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou

    Abstract: This paper investigates privacy jailbreaking in LLMs via steering, focusing on whether manipulating activations can bypass LLM alignment and alter response behaviors to privacy-related queries (e.g., a certain public figure's sexual orientation). We begin by identifying attention heads predictive of refusal behavior for private attributes (e.g., sexual orientation) using lightweight linear probes…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: Preprint

  2. arXiv:2507.02288  [pdf, ps, other]

    cs.CV cs.LG

    Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization

    Authors: De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nannan Wang, Xinbo Gao

    Abstract: Domain Generalization (DG) seeks to develop a versatile model capable of performing effectively on unseen target domains. Notably, recent advances in pre-trained Visual Foundation Models (VFMs), such as CLIP, have demonstrated considerable potential in enhancing the generalization capabilities of deep learning models. Despite the increasing attention toward VFM-based domain prompt tuning within DG…

    Submitted 2 July, 2025; originally announced July 2025.

  3. arXiv:2507.01630  [pdf, ps, other]

    cs.CV cs.AI

    Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss

    Authors: Yuxiao Wang, Yu Lei, Zhenao Wei, Weiying Xue, Xinyu Jiang, Nan Zhuang, Qi Liu

    Abstract: The task of Human-Object conTact (HOT) detection involves identifying the specific areas of the human body that are touching objects. Nevertheless, current models are restricted to just one type of image, often leading to too much segmentation in areas with little interaction, and struggling to maintain category consistency within specific regions. To tackle this issue, a HOT framework, termed \te…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  4. arXiv:2507.00748  [pdf, ps, other]

    cs.CV

    Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning

    Authors: Bob Zhang, Haoran Li, Tao Zhang, Cilin Yan, Jiayin Cai, Xiaolong Jiang, Yanbin Hao

    Abstract: Recently, Multimodal Large Language Models (MLLMs) excel at visual grounding in single-image scenarios with textual references. However, their performance degrades when handling real-world applications involving complex multi-image compositions and multimodal instructions, which reveals limitations in cross-image reasoning and generalization. To address these challenges, we adopt a Reinforcement L…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 11 pages

  5. arXiv:2506.21960  [pdf, ps, other]

    cs.PF

    Redundant Array Computation Elimination

    Authors: Zixuan Wang, Liang Yuan, Xianmeng Jiang, Kun Li, Junmin Xiao, Yunquan Zhang

    Abstract: Redundancy elimination is a key optimization direction, and loop nests are the main optimization target in modern compilers. Previous work on redundancy elimination of array computations in loop nests lacks universality. These approaches either focus on specific computation patterns or fail to recognize redundancies with complex structures. This paper proposes RACE (Redundant Array Computation Eli…

    Submitted 27 June, 2025; originally announced June 2025.

  6. arXiv:2506.21049  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    A Semi-supervised Scalable Unified Framework for E-commerce Query Classification

    Authors: Chunyuan Yuan, Chong Zhang, Zheng Fang, Ming Pang, Xue Jiang, Changping Peng, Zhangang Lin, Ching Law

    Abstract: Query classification, including multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the information between labels cannot be used, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct tr…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025

  7. arXiv:2506.19884  [pdf, ps, other]

    cs.OS cs.AI cs.PF cs.SE

    MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

    Authors: Zhengxiang Huang, Chaoyue Niu, Zhaode Wang, Jiarui Xue, Hanming Zhang, Yugang Wang, Zewei Xin, Xiaotang Jiang, Chengfei Lv, Fan Wu, Guihai Chen

    Abstract: As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, and yet most existing works focus on accelerating the prefill phase, neglecting energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS)…

    Submitted 24 June, 2025; originally announced June 2025.

  8. arXiv:2506.18871  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    OmniGen2: Exploration to Advanced Multimodal Generation

    Authors: Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, Ze Liu, Ziyi Xia, Chaofan Li, Haoge Deng, Jiahao Wang, Kun Luo, Bo Zhang, Defu Lian, Xinlong Wang, Zhongyuan Wang, Tiejun Huang, Zheng Liu

    Abstract: In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables…

    Submitted 25 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  9. arXiv:2506.17125  [pdf, ps, other]

    cs.SE

    Large Language Model Unlearning for Source Code

    Authors: Xue Jiang, Yihong Dong, Zheng Fang, Yingwei Ma, Tangxinyu Wang, Rongyu Cao, Binhua Li, Zhi Jin, Wenpin Jiao, Yongbin Li, Ge Li

    Abstract: LLM4SE has demonstrated significant success, but LLMs' potential memorization of sensitive or outdated training data introduces critical risks to legal compliance, software security, and code quality. LLM unlearning techniques, which can eliminate the influence of undesired data from LLMs in a post-training way, present a promising solution to address these concerns. While recent efforts in LLM un…

    Submitted 20 June, 2025; originally announced June 2025.

  10. arXiv:2506.16273  [pdf, ps, other]

    cs.CV cs.MM

    Fine-grained Image Retrieval via Dual-Vision Adaptation

    Authors: Xin Jiang, Meiqi Cao, Hao Tang, Fei Shen, Zechao Li

    Abstract: Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features. Current leading FGIR solutions typically follow two regimes: enforce pairwise similarity constraints in the semantic embedding space, or incorporate a localization sub-network to fine-tune the entire model. However, such two regimes tend to o…

    Submitted 19 June, 2025; originally announced June 2025.

  11. arXiv:2506.16263  [pdf, ps, other]

    cs.RO cs.AI

    CapsDT: Diffusion-Transformer for Capsule Robot Manipulation

    Authors: Xiting He, Mingwu Su, Xinqi Jiang, Long Bai, Jiewen Lai, Hongliang Ren

    Abstract: Vision-Language-Action (VLA) models have emerged as a prominent research area, showcasing significant potential across a variety of applications. However, their performance in endoscopy robotics, particularly endoscopy capsule robots that perform actions within the digestive system, remains unexplored. The integration of VLA models into endoscopy robots allows more intuitive and efficient interact…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IROS 2025

  12. arXiv:2506.14728  [pdf, ps, other]

    cs.AI

    AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

    Authors: Jiahao Qiu, Xinzhe Juan, Yimin Wang, Ling Yang, Xuan Qi, Tongcheng Zhang, Jiacheng Guo, Yifu Lu, Zixin Yao, Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, Mengdi Wang

    Abstract: While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teache…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 10 pages, 5 figures

  13. arXiv:2506.14243  [pdf, ps, other]

    cs.CV

    Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition

    Authors: Xiaohui Jiang, Haijiang Zhu, Chade Li, Fulin Tang, Ning An

    Abstract: LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) Inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) Representati…

    Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

  14. arXiv:2506.14087  [pdf, ps, other]

    cs.LG

    Multi-Scale Finetuning for Encoder-based Time Series Foundation Models

    Authors: Zhongzheng Qiao, Chenghao Liu, Yiming Zhang, Ming Jin, Quang Pham, Qingsong Wen, P. N. Suganthan, Xudong Jiang, Savitha Ramasamy

    Abstract: Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal per…

    Submitted 16 June, 2025; originally announced June 2025.

  15. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  16. arXiv:2506.10580  [pdf, ps, other]

    cs.GR cs.CV

    Transformer IMU Calibrator: Dynamic On-body IMU Calibration for Inertial Motion Capture

    Authors: Chengxu Zuo, Jiawei Huang, Xiao Jiang, Yuan Yao, Xiangren Shi, Rui Cao, Xinyu Yi, Feng Xu, Shihui Guo, Yipeng Qin

    Abstract: In this paper, we propose a novel dynamic calibration method for sparse inertial motion capture systems, which is the first to break the restrictive absolute static assumption in IMU calibration, i.e., the coordinate drift $R_{G'G}$ and measurement offset $R_{BS}$ remain constant during the entire motion, thereby significantly expanding their application scenarios. Specifically, we achieve real-time estimat…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGGRAPH 2025 (TOG)

  17. MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices

    Authors: Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge device inference presents a promising solution. The primary challenges of edge inference include memory usage and inference speed. This paper introduce…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 7 pages, 5 figures. Published in the Proceedings of the 6th ACM International Conference on Multimedia in Asia Workshops (MMAsia '24 Workshops). The final authenticated version is available at https://dl.acm.org/doi/10.1145/3700410.3702126

  18. arXiv:2506.08795  [pdf, other]

    cs.RO cs.AI

    Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning

    Authors: Kaijie Shi, Wanglong Lu, Hanli Zhao, Vinicius Prado da Fonseca, Ting Zou, Xianta Jiang

    Abstract: Limb loss affects millions globally, impairing physical function and reducing quality of life. Most traditional surface electromyographic (sEMG) and semi-autonomous methods require users to generate myoelectric signals for each control, imposing physically and mentally taxing demands. This study aims to develop a fully autonomous control system that enables a prosthetic hand to automatically grasp…

    Submitted 10 June, 2025; originally announced June 2025.

  19. arXiv:2506.06881  [pdf, other]

    cs.AI

    KnowCoder-V2: Deep Knowledge Analysis

    Authors: Zixuan Li, Wenxuan Liu, Long Bai, Chunmao Zhang, Wei Li, Fenghui Zhang, Quanxin Jin, Ruoyun He, Zhuo Chen, Zhilei Hu, Fei Wang, Bingbing Xu, Xuhui Jiang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: Deep knowledge analysis tasks always involve the systematic extraction and association of knowledge from large volumes of data, followed by logical reasoning to discover insights. However, to solve such complex tasks, existing deep research frameworks face three major challenges: 1) They lack systematic organization and management of knowledge; 2) They operate purely online, making it inefficient…

    Submitted 7 June, 2025; originally announced June 2025.

  20. arXiv:2506.06864  [pdf, ps, other]

    cs.CV cs.AI

    Face recognition on point cloud with cgan-top for denoising

    Authors: Junyu Liu, Jianfeng Ren, Sunhong Liang, Xudong Jiang

    Abstract: Face recognition using 3D point clouds is gaining growing interest, while raw point clouds often contain a significant amount of noise due to imperfect sensors. In this paper, an end-to-end 3D face recognition on a noisy point cloud is proposed, which synergistically integrates the denoising and recognition modules. Specifically, a Conditional Generative Adversarial Network on Three Orthogonal Pla…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Published in ICASSP 2023

  21. arXiv:2506.06205  [pdf, other]

    cs.RO cs.AI

    Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

    Authors: Sheng Chen, Peiyu He, Jiaxin Hu, Ziyang Liu, Yansheng Wang, Tao Xu, Chi Zhang, Chongchong Zhang, Chao An, Shiyu Cai, Duo Cao, Kangping Chen, Shuai Chu, Tianwei Chu, Mingdi Dan, Min Du, Weiwei Fang, Pengyou Fu, Junkai Hu, Xiaowei Jiang, Zhaodi Jiang, Fuxuan Li, Jun Li, Minghui Li, Mingyao Li, et al. (46 additional authors not shown)

    Abstract: Modern robot navigation systems encounter difficulties in diverse and complex indoor environments. Traditional approaches rely on multiple modules with small models or rule-based systems and thus lack adaptability to new environments. To address this, we developed Astra, a comprehensive dual-model architecture, Astra-Global and Astra-Local, for mobile robot navigation. Astra-Global, a multimodal L…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Astra Technical Report

  22. arXiv:2506.06072  [pdf, ps, other]

    cs.RO cs.LG

    BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

    Authors: Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

    Abstract: We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action seque…

    Submitted 10 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  23. arXiv:2506.05791  [pdf, ps, other]

    cs.LG math.OC

    Exploiting Similarity for Computation and Communication-Efficient Decentralized Optimization

    Authors: Yuki Takezawa, Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

    Abstract: Reducing communication complexity is critical for efficient decentralized optimization. The proximal decentralized optimization (PDO) framework is particularly appealing, as methods within this framework can exploit functional similarity among nodes to reduce communication rounds. Specifically, when local functions at different nodes are similar, these methods achieve faster convergence with fewer…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  24. arXiv:2506.04619  [pdf, ps, other]

    cs.CV

    Deep Learning Reforms Image Matching: A Survey and Outlook

    Authors: Shihua Zhang, Zizhuo Li, Kaining Zhang, Yifan Lu, Yuxin Deng, Linfeng Tang, Xingyu Jiang, Jiayi Ma

    Abstract: Image matching, which establishes correspondences between two-view images to recover 3D structure and camera geometry, serves as a cornerstone in computer vision and underpins a wide range of applications, including visual localization, 3D reconstruction, and simultaneous localization and mapping (SLAM). Traditional pipelines composed of "detector-descriptor, feature matcher, outlier filter, and…

    Submitted 5 June, 2025; originally announced June 2025.

  25. arXiv:2506.04235  [pdf, ps, other]

    q-bio.QM cs.AI cs.CE cs.LG q-bio.BM

    Benchmark for Antibody Binding Affinity Maturation and Design

    Authors: Xinyan Zhao, Yi-Ching Tang, Akshita Singh, Victor J Cantu, KwanHo An, Junseok Lee, Adam E Stogsdill, Ashwin Kumar Ramesh, Zhiqiang An, Xiaoqian Jiang, Yejin Kim

    Abstract: We introduce AbBiBench (Antibody Binding Benchmarking), a benchmarking framework for antibody binding affinity maturation and design. Unlike existing antibody evaluation strategies that rely on antibody alone and its similarity to natural ones (e.g., amino acid identity rate, structural RMSD), AbBiBench considers an antibody-antigen (Ab-Ag) complex as a functional unit and evaluates the potential…

    Submitted 23 May, 2025; originally announced June 2025.

  26. arXiv:2506.03827  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising

    Authors: Zhenhui Liu, Chunyuan Yuan, Ming Pang, Zheng Fang, Li Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo, Jingping Shao

    Abstract: Retrieval systems primarily address the challenge of matching user queries with the most relevant advertisements, playing a crucial role in e-commerce search advertising. The diversity of user needs and expressions often produces massive long-tail queries that cannot be matched with merchant bidwords or product titles, which results in some advertisements not being recalled, ultimately harming use…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGIR 2025

  27. arXiv:2506.02079  [pdf, ps, other]

    cs.LG cs.AI cs.CV stat.ML

    Robust Federated Learning against Noisy Clients via Masked Optimization

    Authors: Xuefeng Jiang, Tian Wen, Zhiqin Yang, Lvhua Wu, Yufeng Chen, Sheng Sun, Yuwei Wang, Min Liu

    Abstract: In recent years, federated learning (FL) has made significant advances in privacy-sensitive applications. However, it can be hard to ensure that FL participants provide well-annotated data for training. The corresponding annotations from different clients often contain complex label noise at varying levels. This label noise issue has a substantial impact on the performance of the trained models, an…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Under review

  28. arXiv:2506.01934  [pdf, ps, other]

    cs.AI

    RoboEgo System Card: An Omnimodal Model with Native Full Duplexity

    Authors: Yiqun Yao, Xiang Li, Xin Jiang, Xuezhi Fang, Naitong Yu, Aixin Sun, Yequan Wang

    Abstract: Humans naturally process real-world multimodal information in a full-duplex manner. In artificial intelligence, replicating this capability is essential for advancing model development and deployment, particularly in embodied contexts. The development of multimodal models faces two primary challenges: (1) effectively handling more than three modalities, such as vision, audio, and text; and (2) deli…

    Submitted 2 June, 2025; originally announced June 2025.

  29. arXiv:2506.01776  [pdf, ps, other]

    cs.CL cs.AI

    MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation

    Authors: Yile Liu, Ziwei Ma, Xiu Jiang, Jinglu Hu, Jing Chang, Liang Li

    Abstract: With the rapid adoption of large language models (LLMs) in natural language processing, the ability to follow instructions has emerged as a key metric for evaluating their practical utility. However, existing evaluation methods often focus on single-language scenarios, overlooking the challenges and differences present in multilingual and cross-lingual contexts. To address this gap, we introduce M…

    Submitted 2 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Main Conference

  30. arXiv:2506.00989  [pdf, other]

    cs.AI

    Boosting Bot Detection via Heterophily-Aware Representation Learning and Prototype-Guided Cluster Discovery

    Authors: Buyun He, Xiaorui Jiang, Qi Wu, Hao Liu, Yingguang Yang, Yong Liao

    Abstract: Detecting social media bots is essential for maintaining the security and trustworthiness of social networks. While contemporary graph-based detection methods demonstrate promising results, their practical application is limited by label reliance and poor generalization capability across diverse communities. Generative Graph Self-Supervised Learning (GSL) presents a promising paradigm to overcome…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: KDD 2025

  31. arXiv:2506.00103  [pdf, ps, other]

    cs.CL

    Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards

    Authors: Ruipeng Jia, Yunyi Yang, Yongbo Gai, Kai Luo, Shihao Huang, Jianhe Lin, Xiaoxi Jiang, Guanjun Jiang

    Abstract: Reinforcement learning with verifiable rewards (RLVR) has enabled large language models (LLMs) to achieve remarkable breakthroughs in reasoning tasks with objective ground-truth answers, such as mathematics and code generation. However, a significant gap remains for non-verifiable tasks, like creative writing and open-ended dialogue, where quality assessment is inherently subjective and lacks defi…

    Submitted 11 June, 2025; v1 submitted 30 May, 2025; originally announced June 2025.

  32. arXiv:2505.24261  [pdf, other]

    cs.LG stat.ML

    Taming Hyperparameter Sensitivity in Data Attribution: Practical Selection Without Costly Retraining

    Authors: Weiyi Wang, Junwei Deng, Yuzheng Hu, Shiyuan Zhang, Xirui Jiang, Runting Zhang, Han Zhao, Jiaqi W. Ma

    Abstract: Data attribution methods, which quantify the influence of individual training data points on a machine learning model, have gained increasing popularity in data-centric applications in modern AI. Despite a recent surge of new methods developed in this space, the impact of hyperparameter tuning in these methods remains under-explored. In this work, we present the first large-scale empirical study t…

    Submitted 30 May, 2025; originally announced May 2025.

  33. arXiv:2505.24108  [pdf, ps, other]

    cs.CV cs.LG

    Federated Foundation Model for GI Endoscopy Images

    Authors: Alina Devkota, Annahita Amireskandari, Joel Palko, Shyam Thakkar, Donald Adjeroh, Xiajun Jiang, Binod Bhattarai, Prashnna K. Gyawali

    Abstract: Gastrointestinal (GI) endoscopy is essential in identifying GI tract abnormalities in order to detect diseases in their early stages and improve patient outcomes. Although deep learning has shown success in supporting GI diagnostics and decision-making, these models require curated datasets with labels that are expensive to acquire. Foundation models offer a promising solution by learning general-…

    Submitted 5 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: 11 pages, 11 figures, submitted to BHI2025

    ACM Class: I.2.10; I.4; I.5

  34. arXiv:2505.23379  [pdf, ps, other]

    eess.AS cs.SD

    Vision-Integrated High-Quality Neural Speech Coding

    Authors: Yao Guo, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Xiao-Hang Jiang, Zhen-Hua Ling

    Abstract: This paper proposes a novel vision-integrated neural speech codec (VNSC), which aims to enhance speech coding quality by leveraging visual modality information. In VNSC, the image analysis-synthesis module extracts visual features from lip images, while the feature fusion module facilitates interaction between the image analysis-synthesis module and the speech coding module, transmitting visual in…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  35. arXiv:2505.23184  [pdf, ps, other]

    cs.LG cs.SE

    Two Is Better Than One: Rotations Scale LoRAs

    Authors: Hongcan Guo, Guoshun Nan, Yuan Yang, Diyang Zhang, Haotian Li, Zhican Chen, Qinchuan Zhou, Yuhan Ran, Xinye Cao, Sicong Leng, Xiaofeng Tao, Xudong Jiang

    Abstract: Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) facilitates large language models (LLMs) to efficiently adapt to diverse tasks. However, traditional gating mechanisms that route inputs to the best experts may fundamentally hinder LLMs' scalability, leading to poor generalization and underfitting issues. We identify that the root cause lies in the restricted expressiveness of exis…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 27 pages, 16 figures

    MSC Class: 68T50

    ACM Class: I.2.6

  36. arXiv:2505.22323  [pdf, other]

    cs.CL cs.SE

    Advancing Expert Specialization for Better MoE

    Authors: Hongcan Guo, Haolang Lu, Guoshun Nan, Bolun Chu, Jialin Zhuang, Yuan Yang, Wenhao Che, Sicong Leng, Qimei Cui, Xudong Jiang

    Abstract: Mixture-of-Experts (MoE) models enable efficient scaling of large language models (LLMs) by activating only a subset of experts per input. However, we observe that the commonly used auxiliary load balancing loss often leads to expert overlap and overly uniform routing, which hinders expert specialization and degrades overall performance during post-training. To address this, we propose a simple ye…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 33 pages, 6 figures

    MSC Class: 68T07

    ACM Class: I.2.7

  37. arXiv:2505.22266  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

    Authors: Jialin Yan, Yu Cheng, Zhaoxia Yin, Xinpeng Zhang, Shilin Wang, Tanfeng Sun, Xinghao Jiang

    Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, providing diverse cover signals for covert communication. Driven by advances in deep learning, current audio steganography schemes are mainly based on encoding-decoding network architectures. While these methods greatly improve the security of audio…

    Submitted 5 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  38. arXiv:2505.21025  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    Text-Queried Audio Source Separation via Hierarchical Modeling

    Authors: Xinlei Yin, Xiulian Peng, Xue Jiang, Zhiwei Xiong, Yan Lu

    Abstract: Target audio source separation with natural language queries presents a promising paradigm for extracting arbitrary audio events through arbitrary text descriptions. Existing methods mainly face two challenges, the difficulty in jointly modeling acoustic-textual alignment and semantic-aware separation within a blindly-learned single-stage architecture, and the reliance on large-scale accurately-la…

    Submitted 27 May, 2025; originally announced May 2025.

  39. arXiv:2505.20966  [pdf, ps, other]

    cs.CL cs.IR

    Personalized Query Auto-Completion for Long and Short-Term Interests with Adaptive Detoxification Generation

    Authors: Zhibo Wang, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li

    Abstract: Query auto-completion (QAC) plays a crucial role in modern search systems. However, in real-world applications, there are two pressing challenges that still need to be addressed. First, there is a need for hierarchical personalized representations for users. Previous approaches have typically used users' search behavior as a single, overall representation, which proves inadequate in more nuanced g…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: KDD 2025

  40. arXiv:2505.20777  [pdf, ps, other]

    cs.CV

    TACO: Think-Answer Consistency for Optimized Long-Chain Reasoning and Efficient Data Learning via Reinforcement Learning in LVLMs

    Authors: Zhehan Kan, Yanlin Liu, Kun Yin, Xinghua Jiang, Xin Li, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun, Qingmin Liao, Wenming Yang

    Abstract: DeepSeek R1 has significantly advanced complex reasoning for large language models (LLMs). While recent methods have attempted to replicate R1's reasoning capabilities in multimodal settings, they face limitations, including inconsistencies between reasoning and final answers, model instability and crashes during long-chain exploration, and low data learning efficiency. To address these challenges…

    Submitted 27 May, 2025; originally announced May 2025.

  41. arXiv:2505.20600  [pdf, ps, other]

    cs.DC cs.AI cs.LG

    InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling

    Authors: Xiaoxiao Jiang, Suyi Li, Lingyun Yang, Tianyu Feng, Zhipeng Di, Weiyi Lu, Guoxuan Zhu, Xiu Lin, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: Generative image editing using diffusion models has become a prevalent application in today's AI cloud services. In production environments, image editing typically involves a mask that specifies the regions of an image template to be edited. The use of masks provides direct control over the editing process and introduces sparsity in the model inference. In this paper, we present InstGenIE, a syst…

    Submitted 26 May, 2025; originally announced May 2025.

  42. arXiv:2505.20286  [pdf, ps, other]

    cs.AI

    Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

    Authors: Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang

    Abstract: Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinder their adaptability, scalability, and generalization across domains. In this work, we introduce Alita--a generalist agent designed with the principle of "Simplicity is the ulti…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 3 figures

  43. arXiv:2505.20246  [pdf, ps, other]

    cs.AI cs.CL

    On Path to Multimodal Historical Reasoning: HistBench and HistAgent

    Authors: Jiahao Qiu, Fulian Xiao, Yimin Wang, Yuchen Mao, Yijia Chen, Xinzhe Juan, Shu Zhang, Siran Wang, Xuan Qi, Tongcheng Zhang, Zixin Yao, Jiacheng Guo, Yifu Lu, Charles Argon, Jundi Cui, Daixin Chen, Junran Zhou, Shuyao Zhou, Zhanpeng Zhou, Ling Yang, Shilong Liu, Hongru Wang, Kaixuan Huang, Xun Jiang, Yuming Cao, et al. (74 additional authors not shown)

    Abstract: Recent advances in large language models (LLMs) have led to remarkable progress across domains, yet their capabilities in the humanities, particularly history, remain underexplored. Historical reasoning poses unique challenges for AI, involving multimodal source interpretation, temporal inference, and cross-linguistic analysis. While general-purpose agents perform well on many existing benchmarks,…

    Submitted 19 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 17 pages, 7 figures

  44. arXiv:2505.19990  [pdf, ps, other]

    cs.CV

    Progressive Scaling Visual Object Tracking

    Authors: Jack Hong, Shilin Yan, Zehao Xiao, Jiayin Cai, Xiaolong Jiang, Yao Hu, Henghui Ding

    Abstract: In this work, we propose a progressive scaling training strategy for visual object tracking, systematically analyzing the influence of training data volume, model size, and input resolution on tracking performance. Our empirical study reveals that while scaling each factor leads to significant improvements in tracking accuracy, naive training suffers from suboptimal optimization and limited iterat…

    Submitted 28 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  45. arXiv:2505.19547  [pdf, ps, other]

    cs.LG cs.AI

    STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization

    Authors: Haoyu Zhang, Wentao Zhang, Hao Miao, Xinke Jiang, Yuchen Fang, Yifan Zhang

    Abstract: Spatio-Temporal Graph Neural Networks (STGNNs) have emerged as a powerful tool for modeling dynamic graph-structured data across diverse domains. However, they often fail to generalize in Spatio-Temporal Out-of-Distribution (STOOD) scenarios, where both temporal dynamics and spatial structures evolve beyond the training distribution. To address this problem, we propose an innovative Spatio-Tempora…

    Submitted 27 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  46. arXiv:2505.18992  [pdf, ps, other]

    cs.CV

    VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes

    Authors: Tianchen Deng, Wenhua Wu, Junjie He, Yue Pan, Xirui Jiang, Shenghai Yuan, Danwei Wang, Hesheng Wang, Weidong Chen

    Abstract: 3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGBD SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based…

    Submitted 25 May, 2025; originally announced May 2025.

  47. arXiv:2505.18979  [pdf, other]

    cs.LG

    GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization

    Authors: Zixuan Chen, Hao Lin, Ke Xu, Xinghao Jiang, Tanfeng Sun

    Abstract: Text-to-image (T2I) generation models can inadvertently produce not-safe-for-work (NSFW) content, prompting the integration of text and image safety filters. Recent advances employ large language models (LLMs) for semantic-level detection, rendering traditional token-level perturbation attacks largely ineffective. However, our evaluation shows that existing jailbreak methods are ineffective agains…

    Submitted 25 May, 2025; originally announced May 2025.

  48. arXiv:2505.18399  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Taming Diffusion for Dataset Distillation with High Representativeness

    Authors: Lin Zhao, Yushu Wu, Xinru Jiang, Jianyang Gu, Yanzhi Wang, Xiaolin Xu, Pu Zhao, Xue Lin

    Abstract: Recent deep learning models demand larger datasets, driving the need for dataset distillation to create compact, cost-efficient datasets while maintaining performance. Due to the powerful image generation capability of diffusion, it has been introduced to this field for generating distilled images. In this paper, we systematically investigate issues present in current diffusion-based dataset disti…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: The paper is accepted by ICML 2025

  49. arXiv:2505.17829  [pdf, other]

    cs.CL

    Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Mathematical reasoning through Chain-of-Thought (CoT) has emerged as a powerful capability of Large Language Models (LLMs), which can be further enhanced through Test-Time Scaling (TTS) methods like Beam Search and DVTS. However, these methods, despite improving accuracy by allocating more computational resources during inference, often suffer from path homogenization and inefficient use of interm…

    Submitted 23 May, 2025; originally announced May 2025.

  50. arXiv:2505.17266  [pdf, other]

    cs.CL cs.AI

    Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning

    Authors: Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Xiaojun Wu, Honghao Liu, Hui Xiong, Jian Guo

    Abstract: A practical approach to activate long chain-of-thoughts reasoning ability in pre-trained large language models is to perform supervised fine-tuning on instruction datasets synthesized by strong Large Reasoning Models such as DeepSeek-R1, offering a cost-effective alternative to reinforcement learning. However, large-scale instruction sets with more than 100k samples incur significant training over…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.