Showing 1–50 of 1,110 results for author: Jiao, X

  1. arXiv:2507.01735  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

    Authors: Kai Chen, Ruiyuan Gao, Lanqing Hong, Hang Xu, Xu Jia, Holger Caesar, Dengxin Dai, Bingbing Liu, Dzmitry Tsishkou, Songcen Xu, Chunjing Xu, Qiang Xu, Huchuan Lu, Dit-Yan Yeung

    Abstract: In this paper, we present details of the 1st W-CODA workshop, held in conjunction with ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. Five speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: ECCV 2024. Workshop page: https://coda-dataset.github.io/w-coda2024/

  2. arXiv:2507.01367  [pdf, ps, other]

    cs.CV

    3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation

    Authors: Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao

    Abstract: Physical adversarial attack methods expose the vulnerabilities of deep neural networks and pose a significant threat to safety-critical scenarios such as autonomous driving. Camouflage-based physical attacks are a more promising approach than patch-based attacks, offering stronger adversarial effectiveness in complex physical environments. However, most prior work relies on mesh priors of…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  3. arXiv:2506.22494  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    DriveBLIP2: Attention-Guided Explanation Generation for Complex Driving Scenarios

    Authors: Shihong Ling, Yue Wan, Xiaowei Jia, Na Du

    Abstract: This paper introduces a new framework, DriveBLIP2, built upon the BLIP2-OPT architecture, to generate accurate and contextually relevant explanations for emerging driving scenarios. While existing vision-language models perform well in general tasks, they encounter difficulties in understanding complex, multi-object environments, particularly in real-time applications such as autonomous driving, w…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025. 7 pages, 3 figures

  4. arXiv:2506.21618  [pdf, ps, other]

    cs.CL cs.AI

    TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge

    Authors: Zhiyuan Zhang, Xiaosong Jia, Guanyu Chen, Qifeng Li, Junchi Yan

    Abstract: In this technical report, we introduce TrajTok, a trajectory tokenizer for discrete next-token-prediction based behavior generation models, which combines data-driven and rule-based methods with better coverage, symmetry and robustness, along with a spatial-aware label smoothing method for cross-entropy loss. We adopt the tokenizer and loss for the SMART model and reach superior performance with…

    Submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.19266  [pdf]

    q-bio.NC cs.CV eess.IV

    Convergent and divergent connectivity patterns of the arcuate fasciculus in macaques and humans

    Authors: Jiahao Huang, Ruifeng Li, Wenwen Yu, Anan Li, Xiangning Li, Mingchao Yan, Lei Xie, Qingrun Zeng, Xueyan Jia, Shuxin Wang, Ronghui Ju, Feng Chen, Qingming Luo, Hui Gong, Andrew Zalesky, Xiaoquan Yang, Yuanjing Feng, Zheng Wang

    Abstract: The organization and connectivity of the arcuate fasciculus (AF) in nonhuman primates remain contentious, especially concerning how its anatomy diverges from that of humans. Here, we combined cross-scale single-neuron tracing, using viral-based genetic labeling and fluorescence micro-optical sectioning tomography in macaques (n = 4; age 3–11 years), with whole-brain tractography from 11.7T dif…

    Submitted 2 July, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

    Comments: 34 pages, 6 figures

  6. arXiv:2506.17622  [pdf, ps, other]

    cs.CR

    SoK: Stablecoin Designs, Risks, and the Stablecoin LEGO

    Authors: Shengchen Ling, Yuefeng Du, Yajin Zhou, Lei Wu, Cong Wang, Xiaohua Jia, Houmin Yan

    Abstract: Stablecoins have become significant assets in modern finance, with a market capitalization exceeding USD 246 billion (May 2025). Yet, despite their systemic importance, a comprehensive and risk-oriented understanding of crucial aspects like their design trade-offs, security dynamics, and interdependent failure pathways often remains underdeveloped. This SoK confronts this gap through a large-scale…

    Submitted 21 June, 2025; originally announced June 2025.

  7. arXiv:2506.17450  [pdf, ps, other]

    cs.GR cs.CV

    BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

    Authors: Jiacheng Chen, Ramin Mehran, Xuhui Jia, Saining Xie, Sanghyun Woo

    Abstract: We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a gener…

    Submitted 25 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: Project page: https://blenderfusion.github.io

  8. arXiv:2506.16576  [pdf, ps, other]

    physics.chem-ph

    Accelerating Correlated Wave Function Calculations with Hierarchical Matrix Compression of the Two-Electron Integrals

    Authors: Hongji Gao, Xiangmin Jiao, Benjamin G. Levine

    Abstract: Leveraging matrix sparsity has proven a fruitful strategy for accelerating quantum chemical calculations. Here we present the hierarchical SOS-MP2 algorithm, which uses hierarchical matrix ($\mathcal{H}^{2}$) compression of the electron repulsion integral (ERI) tensor to reduce both time and space complexity. This approach is based on the atomic orbital Laplace transform MP2 calculations, leveragi…

    Submitted 19 June, 2025; originally announced June 2025.

  9. SimSpark: Interactive Simulation of Social Media Behaviors

    Authors: Ziyue Lin, Yi Shan, Lin Gao, Xinghua Jia, Siming Chen

    Abstract: Understanding user behaviors on social media has garnered significant scholarly attention, enhancing our comprehension of how virtual platforms impact society and empowering decision-makers. Simulating social media behaviors provides a robust tool for capturing the patterns of social media behaviors, testing hypotheses, and predicting the effects of various interventions, ultimately contributing t…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 32 pages, 7 figures

    Journal ref: Proc. ACM Hum.-Comput. Interact. 9, 2, Article CSCW168 (April 2025), 32 pages

  10. arXiv:2506.12301  [pdf, ps, other]

    cs.LG cs.AI

    Unveiling Confirmation Bias in Chain-of-Thought Reasoning

    Authors: Yue Wan, Xiaowei Jia, Xiang Lorraine Li

    Abstract: Chain-of-thought (CoT) prompting has been widely adopted to enhance the reasoning capabilities of large language models (LLMs). However, the effectiveness of CoT reasoning is inconsistent across tasks with different reasoning types. This work presents a novel perspective to understand CoT behavior through the lens of confirmation bias in cognitive psychology. Specifically, we examine how…

    Submitted 13 June, 2025; originally announced June 2025.

    Journal ref: ACL 2025 Findings

  11. arXiv:2506.09981  [pdf, ps, other]

    cs.CV cs.RO

    ReSim: Reliable World Simulation for Autonomous Driving

    Authors: Jiazhi Yang, Kashyap Chitta, Shenyuan Gao, Long Chen, Yuqian Shao, Xiaosong Jia, Hongyang Li, Andreas Geiger, Xiangyu Yue, Li Chen

    Abstract: How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work,…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Project page: https://opendrivelab.com/ReSim

  12. arXiv:2506.08473  [pdf, ps, other]

    cs.LG

    AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

    Authors: Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan

    Abstract: Large language models (LLMs) are vulnerable to safety risks during fine-tuning, where small amounts of malicious or harmless data can compromise safeguards. In this paper, building on the concept of alignment direction, defined by the weight difference between aligned and unaligned models, we observe that perturbations along this direction preserve model safety. In contrast, perturbations alon…

    Submitted 10 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.07672  [pdf, ps, other]

    cs.AI

    MCPWorld: A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents

    Authors: Yunhe Yan, Shihe Wang, Jiajun Du, Yexuan Yang, Yuxuan Shan, Qichen Qiu, Xianqing Jia, Xinge Wang, Xin Yuan, Xu Han, Mao Qin, Yinxiao Chen, Chen Peng, Shangguang Wang, Mengwei Xu

    Abstract: (M)LLM-powered computer use agents (CUA) are emerging as a transformative technique to automate human-computer interaction. However, existing CUA benchmarks predominantly target GUI agents, whose evaluation methods are susceptible to UI changes and ignore function interactions exposed by application APIs, e.g., Model Context Protocol (MCP). To this end, we propose MCPWorld, the first automatic CUA…

    Submitted 9 June, 2025; originally announced June 2025.

  14. arXiv:2506.07454  [pdf, ps, other]

    cs.RO cs.AI

    Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

    Authors: Jared Strader, Aaron Ray, Jacob Arkin, Mason B. Peterson, Yun Chang, Nathan Hughes, Christopher Bradley, Yi Xuan Jia, Carlos Nieto-Granda, Rajat Talak, Chuchu Fan, Luca Carlone, Jonathan P. How, Nicholas Roy

    Abstract: In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, vi…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 figures

  15. arXiv:2506.06660  [pdf, ps, other]

    stat.CO

    Efficient Mirror-type Kernels for the Metropolis-Hastings Algorithm

    Authors: Nuo Guan, Xiyun Jiao

    Abstract: We propose a new Metropolis-Hastings (MH) kernel by introducing the Mirror move into the Metropolis-adjusted Langevin algorithm (MALA). This new kernel uses the strength of one kernel to overcome the shortcoming of the other, and generates proposals that are distant from the current position, but still within the high-density region of the target distribution. The resulting algorithm can be much m…

    Submitted 7 June, 2025; originally announced June 2025.

  16. arXiv:2506.06599  [pdf, ps, other]

    cs.LG stat.ML

    Direct Prediction Set Minimization via Bilevel Conformal Classifier Training

    Authors: Yuanjie Shi, Hooman Shahrokhi, Xuesong Jia, Xiongzhi Chen, Janardhan Rao Doppa, Yan Yan

    Abstract: Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subsets of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets, which makes them less useful in practice. This paper considers the problem of integrating con…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Accepted for Publication at International Conference on Machine Learning (ICML), 2025

  17. arXiv:2506.06072  [pdf, ps, other]

    cs.RO cs.LG

    BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

    Authors: Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov

    Abstract: We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action seque…

    Submitted 10 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  18. arXiv:2506.05401  [pdf, ps, other]

    cs.CR cs.CV

    Robust Anti-Backdoor Instruction Tuning in LVLMs

    Authors: Yuan Xun, Siyuan Liang, Xiaojun Jia, Xinwei Liu, Xiaochun Cao

    Abstract: Large visual language models (LVLMs) have demonstrated excellent instruction-following capabilities, yet remain vulnerable to stealthy backdoor attacks when finetuned using contaminated data. Existing backdoor defense techniques are usually developed for single-modal visual or language models under fully parameter-adjustable settings or rely on supervisory knowledge during training. However, in re…

    Submitted 3 June, 2025; originally announced June 2025.

  19. arXiv:2506.05055  [pdf, ps, other]

    hep-ex

    Study of $f_1(1420)$ and $\eta(1405)$ in the decay $J/\psi \to \gamma\pi^{0}\pi^{0}\pi^{0}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (650 additional authors not shown)

    Abstract: A partial-wave analysis is performed on the decay $J/\psi \to \gamma\pi^{0}\pi^{0}\pi^{0}$ within the $\pi^{0}\pi^{0}\pi^{0}$ invariant-mass region below 1.6 GeV$/c^{2}$, using $(10.09 \pm 0.04)\times 10^{9}$ $J/\psi$ events collected with the BESIII detector. Significant isospin-violating decays of $\eta(1405)$ and $f_1(1420)$ into $f_0(980)\pi^{0}$ are observed. For the first time, three axial-vectors, $f_1(1285)$,…

    Submitted 7 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  20. arXiv:2506.02555  [pdf, other]

    cs.CV

    SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence

    Authors: Zhitao Zeng, Zhu Zhuo, Xiaojun Jia, Erli Zhang, Junde Wu, Jiaan Zhang, Yuxuan Wang, Chang Han Low, Jian Jiang, Zilong Zheng, Xiaochun Cao, Yutong Ban, Qi Dou, Yang Liu, Yueming Jin

    Abstract: Foundation models have achieved transformative success across biomedical domains by enabling holistic understanding of multimodal data. However, their application in surgery remains underexplored. Surgical intelligence presents unique challenges, requiring surgical visual perception, temporal analysis, and reasoning. Existing general-purpose vision-language models fail to address these needs due…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 29 pages, 5 figures

    MSC Class: 68T45

    ACM Class: I.2.10

  21. arXiv:2505.22013  [pdf, other]

    cs.SD eess.AS

    Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge

    Authors: Shangkun Huang, Yuxuan Du, Jingwen Yang, Dejun Zhang, Xupeng Jia, Jing Deng, Jintao Kang, Rong Zheng

    Abstract: This paper presents the system developed to address the MISP 2025 Challenge. For the diarization system, we proposed a hybrid approach combining a WavLM end-to-end segmentation method with a traditional multi-module clustering technique to adaptively select the appropriate model for handling varying degrees of overlapping speech. For the automatic speech recognition (ASR) system, we proposed an AS…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  22. arXiv:2505.21773  [pdf, ps, other]

    math.OC

    Assessing EV Charging Impacts on Power Distribution Systems: A Unified Co-Simulation Framework

    Authors: Mohammadreza Iranpour, Mohammad Rasoul Narimani, Xudong Jia

    Abstract: The growing adoption of electric vehicles (EVs) is expected to significantly increase demand on electric power distribution systems, many of which are already nearing capacity. To address this, the paper presents a comprehensive framework for analyzing the impact of large-scale EV integration on distribution networks. Using the open-source simulator OpenDSS, the framework builds detailed, scalable…

    Submitted 27 May, 2025; originally announced May 2025.

  23. arXiv:2505.21499  [pdf, ps, other]

    cs.CR cs.AI

    AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

    Authors: Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, Qing Wang

    Abstract: Vision-Language Model (VLM) based Web Agents represent a significant step towards automating complex tasks by simulating human-like interaction with websites. However, their deployment in uncontrolled web environments introduces significant security vulnerabilities. Existing research on adversarial environmental injection attacks often relies on unrealistic assumptions, such as direct HTML manipul…

    Submitted 27 May, 2025; originally announced May 2025.

  24. arXiv:2505.21494  [pdf, ps, other]

    cs.CV

    Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

    Authors: Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu

    Abstract: Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features, such as CLIP's [CLS] token, between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly f…

    Submitted 27 May, 2025; originally announced May 2025.

  25. arXiv:2505.20469  [pdf, other]

    cs.CV cs.AI

    CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

    Authors: Lei Tian, Xiaomin Li, Liqian Ma, Hefei Huang, Zirui Zheng, Hao Yin, Taiqing Li, Huchuan Lu, Xu Jia

    Abstract: Recent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, methods that rely on 2D priors are prone to a critical challenge: cross-view semantic inconsistencies induced by occlusion, image blur, and view-dependent variations.…

    Submitted 26 May, 2025; originally announced May 2025.

  26. arXiv:2505.19139  [pdf, ps, other]

    cs.CV

    The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework

    Authors: Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong

    Abstract: Our research reveals a new privacy risk associated with the vision-language model (VLM) agentic framework: the ability to infer sensitive attributes (e.g., age and health information) and even abstract ones (e.g., personality and social traits) from a set of personal images, which we term "image private attribute profiling." This threat is particularly severe given that modern apps can easily acce…

    Submitted 25 May, 2025; originally announced May 2025.

  27. arXiv:2505.18954  [pdf, ps, other]

    cs.AR

    Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity

    Authors: Cenlin Duan, Jianlei Yang, Yikun Wang, Yiou Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao

    Abstract: Processing-in-memory (PIM) is a transformative architectural paradigm designed to overcome the Von Neumann bottleneck. Among PIM architectures, digital SRAM-PIM emerges as a promising solution, offering significant advantages by directly integrating digital logic within the SRAM array. However, rigid crossbar architecture and full array activation pose challenges in efficiently utilizing tradition…

    Submitted 12 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

    Comments: This paper is accepted by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

  28. arXiv:2505.18652  [pdf, ps, other]

    cs.CV

    Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU

    Authors: Yicheng Lin, Yunlong Jiang, Xujia Jiao, Bin Han

    Abstract: Robust long-term visual localization in complex industrial environments is critical for mobile robotic systems. Existing approaches face limitations: handcrafted features are illumination-sensitive, learned features are computationally intensive, and semantic- or marker-based methods are environmentally constrained. Handcrafted and learned features share similar representations but differ function…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 8 pages, 6 figures

  29. arXiv:2505.18355  [pdf, ps, other]

    cs.LG

    X-MethaneWet: A Cross-scale Global Wetland Methane Emission Benchmark Dataset for Advancing Science Discovery with AI

    Authors: Yiming Sun, Shuo Chen, Shengyu Chen, Chonghao Qiu, Licheng Liu, Youmi Oh, Sparkle L. Malone, Gavin McNicol, Qianlai Zhuang, Chris Smith, Yiqun Xie, Xiaowei Jia

    Abstract: Methane (CH$_4$) is the second most important greenhouse gas after carbon dioxide and plays a crucial role in climate change due to its high global warming potential. Accurately modeling CH$_4$ fluxes across the globe and at fine temporal scales is essential for understanding its spatial and temporal variability and developing effective mitigation strategies. In this work, we introduce the first-of…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 8 pages, 8 figures, 3 tables

  30. arXiv:2505.18004  [pdf, ps, other]

    hep-ex

    Measurement of branching fractions of $\Lambda_{c}^{+}$ decays to $\Sigma^{+}\eta$ and $\Sigma^{+}\eta'$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

    Abstract: By analyzing $e^+e^-$ collision data taken at center-of-mass energies $\sqrt{s} = 4.600$ to $4.699$ GeV with the BESIII detector at the BEPCII collider, corresponding to an integrated luminosity of $4.5~\mathrm{fb}^{-1}$, we study the hadronic decays $\Lambda_{c}^{+} \rightarrow \Sigma^{+}\eta$ and $\Lambda_{c}^{+} \rightarrow \Sigma^{+}\eta'$ using the single-tag method. The branching fraction ratio of…

    Submitted 23 May, 2025; originally announced May 2025.

  31. arXiv:2505.16394  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

    Authors: Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, Junchi Yan

    Abstract: Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem owing to its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently, Model-based Reinforcement Learning (MBRL) has demonstrated promising results in n…

    Submitted 22 May, 2025; originally announced May 2025.

  32. arXiv:2505.16278  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

    Authors: Zhenjie Yang, Yilin Chai, Xiaosong Jia, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, Junchi Yan

    Abstract: End-to-end autonomous driving (E2E-AD) demands effective processing of multi-view sensory data and robust handling of diverse and complex driving scenarios, particularly rare maneuvers such as aggressive turns. The recent success of the Mixture-of-Experts (MoE) architecture in Large Language Models (LLMs) demonstrates that specialization of parameters enables strong scalability. In this work, we propose D…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Project Page: https://thinklab-sjtu.github.io/DriveMoE/

  33. arXiv:2505.16211  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

    Authors: Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhizheng Wu , et al. (6 additional authors not shown)

    Abstract: The rapid advancement and expanding applications of Audio Large Language Models (ALLMs) demand a rigorous understanding of their trustworthiness. However, systematic research on evaluating these models, particularly concerning risks unique to the audio modality, remains largely unexplored. Existing evaluation frameworks primarily focus on the text modality or address only a restricted set of safet…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Technical Report

  34. Test of local realism via entangled $\Lambda\bar{\Lambda}$ system

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (597 additional authors not shown)

    Abstract: The non-locality of quantum correlations is a fundamental feature of quantum theory. The Bell inequality serves as a benchmark for distinguishing between predictions made by quantum theory and local hidden variable theory (LHVT). Recent advancements in photon-entanglement experiments have addressed potential loopholes and have observed significant violations of variants of the Bell inequality. However…

    Submitted 20 May, 2025; originally announced May 2025.

    Journal ref: Nat Commun 16, 4948 (2025)

  35. arXiv:2505.14898  [pdf, ps, other]

    cs.CR

    Topology-aware Detection and Localization of Distributed Denial-of-Service Attacks in Network-on-Chips

    Authors: Hansika Weerasena, Xiaoguo Jia, Prabhat Mishra

    Abstract: Network-on-Chip (NoC) enables on-chip communication between diverse cores in modern System-on-Chip (SoC) designs. With its shared communication fabric, NoC has become a focal point for various security threats, especially in heterogeneous and high-performance computing platforms. Among these attacks, Distributed Denial of Service (DDoS) attacks occur when multiple malicious entities collaborate to…

    Submitted 20 May, 2025; originally announced May 2025.

  36. arXiv:2505.14103  [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

    Authors: Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang

    Abstract: Jailbreak attacks on large audio-language models (LALMs) have been studied recently, but they achieve suboptimal effectiveness, applicability, and practicability, particularly because they assume that the adversary can fully manipulate user prompts. In this work, we first conduct an extensive experiment showing that advanced text jailbreak attacks cannot be easily ported to end-to-end LALMs via text-to-speech (TT…

    Submitted 20 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  37. arXiv:2505.13794  [pdf, ps, other]

    cs.AI

    LLM-based Evaluation Policy Extraction for Ecological Modeling

    Authors: Qi Cheng, Licheng Liu, Qing Zhu, Runlong Yu, Zhenong Jin, Yiqun Xie, Xiaowei Jia

    Abstract: Evaluating ecological time series is critical for benchmarking model performance in many important applications, including predicting greenhouse gas fluxes, capturing carbon-nitrogen dynamics, and monitoring hydrological cycles. Traditional numerical metrics (e.g., R-squared, root mean square error) have been widely used to quantify the similarity between modeled and observed ecosystem variables,…

    Submitted 19 May, 2025; originally announced May 2025.

  38. arXiv:2505.13222  [pdf, ps, other]

    hep-ex

    Partial Wave Analysis of $e^{+}e^{-} \rightarrow \pi^{+}\pi^{-}J/\psi$ and Cross Section Measurement of $e^{+}e^{-} \rightarrow \pi^{\pm}Z_{c}(3900)^{\mp}$ from 4.1271 to 4.3583 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

    Abstract: Based on 12.0 $\mathrm{fb^{-1}}$ of $e^{+}e^{-}$ collision data samples collected by the BESIII detector at center-of-mass energies from 4.1271 to 4.3583 GeV, a partial wave analysis is performed for the process $e^{+}e^{-} \rightarrow \pi^{+}\pi^{-}J/\psi$. The cross sections for the subprocesses $e^{+}e^{-} \rightarrow \pi^{+}Z_{c}(3900)^{-} + \mathrm{c.c.} \rightarrow \pi^{+}\pi^{-}J/\psi$,…

    Submitted 19 May, 2025; originally announced May 2025.

  39. arXiv:2505.12082  [pdf, other]

    cs.CL cs.LG

    Model Merging in Pre-training of Large Language Models

    Authors: Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Deyi Liu, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Xun Zhou, Siyuan Qiao, Liang Xiang , et al. (1 additional authors not shown)

    Abstract: Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to…

    Submitted 22 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

  40. arXiv:2505.11548  [pdf, other]

    cs.CR cs.AI

    One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems

    Authors: Zhiyuan Chang, Mingyang Li, Xiaojun Jia, Junjie Wang, Yuekai Huang, Ziyou Jiang, Yang Liu, Qing Wang

    Abstract: Large Language Models (LLMs) enhanced with Retrieval-Augmented Generation (RAG) have shown improved performance in generating accurate responses. However, the dependence on external knowledge bases introduces potential security vulnerabilities, particularly when these knowledge bases are publicly accessible and modifiable. While previous studies have exposed knowledge poisoning risks in RAG system…

    Submitted 19 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

    Comments: 14 pages, 4 figures

  41. arXiv:2505.07258  [pdf, ps, other]

    cs.CL cs.AI

    No Query, No Access

    Authors: Wenqiang Wang, Siyuan Liang, Yangshijie Zhang, Xiaojun Jia, Hao Lin, Xiaochun Cao

    Abstract: Textual adversarial attacks mislead NLP models, including Large Language Models (LLMs), by subtly modifying text. While effective, existing attacks often require knowledge of the victim model, extensive queries, or access to training data, limiting real-world feasibility. To overcome these constraints, we introduce the Victim Data-based Adversarial Attack (VDBA), which operates using only…

    Submitted 12 May, 2025; originally announced May 2025.

  42. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  43. arXiv:2505.06266  [pdf, other]

    cs.LG cs.AI

    Knowledge Guided Encoder-Decoder Framework: Integrating Multiple Physical Models for Agricultural Ecosystem Modeling

    Authors: Qi Cheng, Licheng Liu, Yao Zhang, Mu Hong, Shiyuan Luo, Zhenong Jin, Yiqun Xie, Xiaowei Jia

    Abstract: Agricultural monitoring is critical for ensuring food security, maintaining sustainable farming practices, informing policies on mitigating food shortage, and managing greenhouse gas emissions. Traditional process-based physical models are often designed and implemented for specific situations, and their parameters could also be highly uncertain. In contrast, data-driven models often use black-box…

    Submitted 12 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  44. arXiv:2505.05888  [pdf, ps, other]

    hep-ex

    Measurement of the phase between strong and electromagnetic amplitudes in the decay $J/\psi \to \phi\eta$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (647 additional authors not shown)

    Abstract: The first direct measurement of the relative phase between the strong and electromagnetic amplitudes for a $J/\psi$ decaying into a vector-pseudoscalar final state is performed using 26 energy points of $e^+e^-$ annihilation data between $3.00\ \text{GeV}$ and $3.12\ \text{GeV}$. The data sets were collected by the BESIII detector with a total integrated luminosity of 452 pb$^{-1}$. By investigating the…

    Submitted 9 May, 2025; originally announced May 2025.

  45. arXiv:2505.03180  [pdf, other]

    hep-ex

    Observation of resonant contribution to the $e^+e^-\to \Omega^{-}\bar{\Omega}^{+}$ around 4.2 GeV and evidence of $\psi(3770)\to \Omega^{-}\bar{\Omega}^{+}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (625 additional authors not shown)

    Abstract: Using $e^+e^-$ collision data corresponding to a total integrated luminosity of 22.7 fb$^{-1}$, collected at center-of-mass energies between 3.7 and 4.7 GeV with the BESIII detector, we present a measurement of energy-dependent cross sections and effective form factors for the process $e^+e^-\to \Omega^{-}\bar{\Omega}^+$. By conducting a fit to the cross sections of $e^+e^-\to \Omega^{-}\bar{\Omega}^+$ considering the…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 9 pages, 3 figures

  46. arXiv:2505.02862  [pdf, ps, other]

    cs.CL cs.AI

    Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs

    Authors: Haoming Yang, Ke Ma, Xiaojun Jia, Yingfei Sun, Qianqian Xu, Qingming Huang

    Abstract: Despite the remarkable performance of Large Language Models (LLMs), they remain vulnerable to jailbreak attacks, which can compromise their safety mechanisms. Existing studies often rely on brute-force optimization or manual design, failing to uncover potential risks in real-world scenarios. To address this, we propose a novel jailbreak attack framework, ICRT, inspired by heuristics and biases in…

    Submitted 27 June, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  47. arXiv:2505.02152  [pdf, other]

    cs.RO

    Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

    Authors: Cunxin Fan, Xiaosong Jia, Yihang Sun, Yixiao Wang, Jianglan Wei, Ziyang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding

    Abstract: Vision-Language-Action (VLA) models have shown great promise for generalist robotic manipulation in the physical world. However, existing models are restricted to robot observations and text-only instructions, lacking the flexibility of interleaved multimodal instructions enabled by recent advances in foundation models in the digital world. In this paper, we present Interleave-VLA, the first frame…

    Submitted 4 May, 2025; originally announced May 2025.

  48. Multi-Scale Graph Learning for Anti-Sparse Downscaling

    Authors: Yingda Fan, Runlong Yu, Janet R. Barclay, Alison P. Appling, Yiming Sun, Yiqun Xie, Xiaowei Jia

    Abstract: Water temperature can vary substantially even across short distances within the same sub-watershed. Accurate prediction of stream water temperature at fine spatial resolutions (i.e., fine scales, $\leq$ 1 km) enables precise interventions to maintain water quality and protect aquatic habitats. Although spatiotemporal models have made substantial progress in spatially coarse time series modeling, c…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: AAAI-25, Multi-scale deep learning approach for spatial downscaling of geospatial data with sparse observations

    MSC Class: 68T05; 68U05

    ACM Class: I.2.6; I.2.10

    Journal ref: AAAI-25, pages 27969-27977, 2025

  49. arXiv:2504.20570  [pdf, other]

    cs.CR

    ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Jin Xie, Ruishi He, Songze Li, Xiaojun Jia, Shouling Ji

    Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a practical solution for adapting large language models (LLMs) to custom datasets with significantly reduced computational cost. When carrying out PEFT under collaborative learning scenarios (e.g., federated learning), it is often required to exchange model updates (or gradients) across parties. These gradients, even with limited dimensions, ca…

    Submitted 29 April, 2025; originally announced April 2025.

  50. arXiv:2504.20439  [pdf]

    eess.SP

    A High-Resolution Transmission Line Model with De-embedding Structure for Ultralow Contact Resistivity Extraction

    Authors: Xuanyu Jia, Hongxu Liao, Ming Li

    Abstract: In this article, we present a contact resistivity extraction method calibrated using a de-embedding structure, called the High-Resolution Transmission Line Model (HR-TLM). HR-TLM has an infrastructure similar to that of the Refined TLM (RTLM) or Refined-Ladder TLM (R-LTLM), but is optimized for calibration methods. Its advantage lies in maintaining extraction accuracy for low $\rho_c$ while significantly reducing t…

    Submitted 29 April, 2025; originally announced April 2025.