Showing 151–200 of 7,791 results for author: Xu, Y

  1. arXiv:2506.04956  [pdf, ps, other]

    cs.CV

    FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

    Authors: Huihan Wang, Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying nois…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: This paper has been early accepted by MICCAI 2025

  2. arXiv:2506.04941  [pdf, ps, other]

    cs.RO

    ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning

    Authors: Zhao Jin, Zhengping Che, Zhen Zhao, Kun Wu, Yuheng Zhang, Yinuo Zhao, Zehui Liu, Qiang Zhang, Xiaozhu Ju, Jing Tian, Yousong Xue, Jian Tang

    Abstract: Robot learning increasingly relies on simulation to advance complex abilities such as dexterous manipulations and precise interactions, necessitating high-quality digital assets to bridge the sim-to-real gap. However, existing open-source articulated-object datasets for simulation are limited by insufficient visual realism and low physical fidelity, which hinder their utility for training models mas…

    Submitted 5 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2506.04924  [pdf, ps, other]

    cs.LG

    Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion

    Authors: Han Wang, Ruoyun He, Guoguang Lao, Ting Liu, Hejiao Luo, Changqi Qin, Hongying Luo, Junmin Huang, Zihan Wei, Lu Chen, Yongzhi Xu, Ziqian Bi, Junhao Song, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Huafeng Liu, Junfeng Hao, Chunjie Tian

    Abstract: Early identification of high-risk ICU patients is crucial for directing limited medical resources. We introduce ALFIA (Adaptive Layer Fusion with Intelligent Attention), a modular, attention-based architecture that jointly trains LoRA (Low-Rank Adaptation) adapters and an adaptive layer-weighting mechanism to fuse multi-layer semantic features from a BERT backbone. Trained on our rigorous cw-24 (C…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: 21 pages, 6 figures

  4. arXiv:2506.04850  [pdf, ps, other]

    astro-ph.HE astro-ph.IM

    The Chinese Pulsar Timing Array data release I. Single pulsar noise analysis

    Authors: Siyuan Chen, Heng Xu, Yanjun Guo, Bojun Wang, R. Nicolas Caballero, Jinchen Jiang, Jiangwei Xu, Zihan Xue, Kejia Lee, Jianping Yuan, Yonghua Xu, Jingbo Wang, Longfei Hao, Jintao Luo, Jinlin Han, Peng Jiang, Zhiqiang Shen, Min Wang, Na Wang, Renxin Xu, Xiangping Wu, Lei Qian, Xin Guan, Menglin Huang, Chun Sun, et al. (1 additional author not shown)

    Abstract: The Chinese Pulsar Timing Array (CPTA) has collected observations from 57 millisecond pulsars using the Five-hundred-meter Aperture Spherical Radio Telescope (FAST) for close to three years, for the purpose of searching for gravitational waves (GWs). To robustly search for ultra-low-frequency GWs, pulsar timing arrays (PTAs) need to use models to describe the noise from the individual pulsars. We…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 17 pages, 4 figures, 10 tables

  5. arXiv:2506.04660  [pdf]

    cs.CE cond-mat.mtrl-sci

    Adaptive recycled plastic architecture: Vacuum-Sealed Chainmail Structures Through Computational Design

    Authors: Yi Xu, Farzin Lotfi-Jam, Mustafa Faruki

    Abstract: The construction industry is a major consumer of raw materials, accounting for nearly half of global material usage annually, while generating significant waste that poses sustainability challenges. This paper explores the untapped potential of recycled plastics as a primary construction material, leveraging their lightweight, flexible, and customizable properties for advanced applications in modu…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted manuscript. Published in International Journal of Architectural Computing, April 2025

    ACM Class: J.6; I.2.10

  6. arXiv:2506.04592  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification

    Authors: Chengwu Liu, Ye Yuan, Yichun Yin, Yan Xu, Xin Xu, Zaoyu Chen, Yasheng Wang, Lifeng Shang, Qun Liu, Ming Zhang

    Abstract: Chain-of-Thought (CoT) prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs). However, to mitigate hallucinations in CoT that are notoriously difficult to detect, current methods such as process reward models (PRMs) or self-consistency operate as opaque boxes and do not provide checkable evidence for their judgments, possibly limiting their eff…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted in ACL 2025

  7. arXiv:2506.03673  [pdf, ps, other]

    cs.AI

    Reason from Future: Reverse Thought Chain Enhances LLM Reasoning

    Authors: Yinlong Xu, Yanzhao Zheng, Shuoshuo Sun, Shuaihan Huang, Baohua Dong, Hangcheng Zhu, Ruohui Huang, Gang Yu, Hongxia Xu, Jian Wu

    Abstract: It has been demonstrated that carefully designed reasoning paradigms, like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), can enhance the reasoning capabilities of small language models by detailed thinking and extensive thought searching, but unbounded branching factors in the searching space create prohibitive reasoning consumption. However, these methods fall into the trap of local optimum reason…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025 findings

  8. arXiv:2506.03509  [pdf, ps, other]

    cond-mat.supr-con cond-mat.str-el

    Multiband superconductivity in the topological Kramers nodal-line semimetals

    Authors: Tian Shang, Jianzhou Zhao, Keqi Xia, Lun-Hui Hu, Yang Xu, Qingfeng Zhan, Dariusz Jakub Gawryluk, Toni Shiroka

    Abstract: Recent band-structure calculations predict that the ruthenium-based ternary silicides are three-dimensional Kramers nodal line semimetals. Among them, NbRuSi and TaRuSi show bulk superconductivity (SC) below $T_c \sim 3$ K and 4 K, as well as spontaneous magnetic fields. The latter indicates the breaking of time-reversal symmetry and, thus, unconventional SC in both compounds. Previous temperature…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 9 pages, 8 figures

    Journal ref: Phys. Rev. B 111, 214516 (2025)

  9. arXiv:2506.03474  [pdf, ps, other]

    cs.LG cs.AI cs.AR

    CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design

    Authors: Yifeng Xiao, Yurong Xu, Ning Yan, Masood Mortazavi, Pierluigi Nuzzo

    Abstract: Simulation-based design space exploration (DSE) aims to efficiently optimize high-dimensional structured designs under complex constraints and expensive evaluation costs. Existing approaches, including heuristic and multi-step reinforcement learning (RL) methods, struggle to balance sampling efficiency and constraint satisfaction due to sparse, delayed feedback, and large hybrid action spaces. In…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Preprint. 10 pages + appendix. Submitted to NeurIPS 2025

    ACM Class: I.2.6; C.3

  10. arXiv:2506.03450  [pdf, ps, other]

    cs.NE

    SENMAP: Multi-objective data-flow mapping and synthesis for hybrid scalable neuromorphic systems

    Authors: Prithvish V Nembhani, Oliver Rhodes, Guangzhi Tang, Alexandra F Dobrita, Yingfu Xu, Kanishkan Vadivel, Kevin Shidqi, Paul Detterer, Mario Konijnenburg, Gert-Jan van Schaik, Manolis Sifalakis, Zaid Al-Ars, Amirreza Yousefzadeh

    Abstract: This paper introduces SENMap, a mapping and synthesis tool for scalable, energy-efficient neuromorphic computing architecture frameworks. SENECA is a flexible architectural design optimized for executing edge AI SNN/ANN inference applications efficiently. To speed up the silicon tape-out and chip design for SENECA, an accurate emulator, SENSIM, was designed. While SENSIM supports direct mapping of…

    Submitted 16 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: IJCNN conference, Italy, 2025, accepted, 30 June - 5 July

  11. arXiv:2506.03408  [pdf, ps, other]

    cs.CL cs.CV

    Trajectory Prediction Meets Large Language Models: A Survey

    Authors: Yi Xu, Ruining Yang, Yitian Zhang, Yizhou Wang, Jianglin Lu, Mingyuan Zhang, Lili Su, Yun Fu

    Abstract: Recent advances in large language models (LLMs) have sparked growing interest in integrating language-driven techniques into trajectory prediction. By leveraging their semantic and reasoning capabilities, LLMs are reshaping how autonomous systems perceive, model, and predict trajectories. This survey provides a comprehensive overview of this emerging field, categorizing recent work into five direc…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 16 pages, GitHub: https://github.com/colorfulfuture/Awesome-Trajectory-Motion-Prediction-Papers

  12. arXiv:2506.02969  [pdf, ps, other]

    hep-ex

    Measurement of the branching fractions of the Cabibbo-favored decays $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ and $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ and search for $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (660 additional authors not shown)

    Abstract: Based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of about 4.5 fb$^{-1}$ collected at center-of-mass energies between 4599.53 MeV and 4698.82 MeV with the BESIII detector, the absolute branching fraction of the Cabibbo-favored decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is measured to be $(3.12\pm0.46\pm0.15)\times10^{-3}$. Combined with a previous measurement from the BESIII…

    Submitted 3 June, 2025; originally announced June 2025.

  13. arXiv:2506.02521  [pdf, ps, other]

    hep-ex

    Improved Measurements of $D^+ \to ηe^+ν_e$ and $D^+ \to ημ^+ν_μ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (682 additional authors not shown)

    Abstract: Using 20.3 fb$^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector, we measure the branching fractions of $D^+\to ηe^+ν_e$ and $D^+\to ημ^+ν_μ$ to be $(9.75\pm0.29\pm0.28)\times10^{-4}$ and $(9.08\pm0.35\pm0.23)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. From a simultaneous fit to t…

    Submitted 3 June, 2025; originally announced June 2025.

  14. arXiv:2506.02498  [pdf, ps, other]

    cond-mat.str-el

    Electronic structures and magnetism in van der Waals flat-band material Ni$_{3}$GeTe$_{2}$

    Authors: Yuanji Xu, Xintao Jin, Haoyuan Tang, Fuyang Tian

    Abstract: The study of magnetism in two-dimensional materials has garnered significant interest, driven by fundamental investigations into low-dimensional magnetic phenomena and their potential for applications in spintronic devices. Through dynamical mean-field theory calculations, we demonstrate that Ni$_{3}$GeTe$_{2}$ exhibits flat-band characteristics resulting from the geometric frustration of its laye…

    Submitted 21 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

  15. arXiv:2506.02020  [pdf, other]

    cs.CV cs.LG

    Improve Multi-Modal Embedding Learning via Explicit Hard Negative Gradient Amplifying

    Authors: Youze Xue, Dian Li, Gang Liu

    Abstract: With the rapid advancement of multi-modal large language models (MLLMs) in recent years, the foundational Contrastive Language-Image Pretraining (CLIP) framework has been successfully extended to MLLMs, enabling more powerful and universal multi-modal embeddings for a wide range of retrieval tasks. Despite these developments, the core contrastive learning paradigm remains largely unchanged from CL…

    Submitted 28 May, 2025; originally announced June 2025.

  16. arXiv:2506.01968  [pdf, ps, other]

    cs.LG cs.AI cs.NE

    Efficient ANN-SNN Conversion with Error Compensation Learning

    Authors: Chang Liu, Jiangrong Shen, Xuming Ran, Mingkun Xu, Qi Xu, Yi Xu, Gang Pan

    Abstract: Artificial neural networks (ANNs) have demonstrated outstanding performance in numerous tasks, but deployment in resource-constrained environments remains a challenge due to their high computational and memory requirements. Spiking neural networks (SNNs) operate through discrete spike events and offer superior energy efficiency, providing a bio-inspired alternative. However, current ANN-to-SNN con…

    Submitted 12 May, 2025; originally announced June 2025.

  17. arXiv:2506.01829  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    CiteEval: Principle-Driven Citation Evaluation for Source Attribution

    Authors: Yumo Xu, Peng Qi, Jifan Chen, Kunlun Liu, Rujun Han, Lan Liu, Bonan Min, Vittorio Castelli, Arshit Gupta, Zhiguo Wang

    Abstract: Citation quality is crucial in information-seeking systems, directly influencing trust and the effectiveness of information access. Current evaluation frameworks, both human and automatic, mainly rely on Natural Language Inference (NLI) to assess binary or ternary supportiveness from cited sources, which we argue is a suboptimal proxy for citation evaluation. In this work we introduce CiteEval, a…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: ACL 2025

  18. arXiv:2506.01725  [pdf, ps, other]

    cs.CV

    VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking

    Authors: Desen Meng, Rui Huang, Zhilin Dai, Xinhao Li, Yifan Xu, Jun Zhang, Zhenpeng Huang, Meng Zhang, Lingshu Zhang, Yi Liu, Limin Wang

    Abstract: While recent advances in reinforcement learning have significantly enhanced reasoning capabilities in large language models (LLMs), these techniques remain underexplored in multi-modal LLMs for video captioning. This paper presents the first systematic investigation of GRPO-based RL post-training for video MLLMs, with the goal of enhancing video MLLMs' capability of describing actions in videos. S…

    Submitted 2 June, 2025; originally announced June 2025.

  19. arXiv:2506.01293  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation

    Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Min Zhang, Wen Zhang, Huajun Chen

    Abstract: Multi-modal large language models (MLLMs) incorporate heterogeneous modalities into LLMs, enabling a comprehensive understanding of diverse scenarios and objects. Despite the proliferation of evaluation benchmarks and leaderboards for MLLMs, they predominantly overlook the critical capacity of MLLMs to comprehend world knowledge with structured abstractions that appear in visual form. To address t…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Work in progress

  20. arXiv:2506.00688  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Existing Large Language Model Unlearning Evaluations Are Inconclusive

    Authors: Zhili Feng, Yixuan Even Xu, Alexander Robey, Robert Kirk, Xander Davies, Yarin Gal, Avi Schwarzschild, J. Zico Kolter

    Abstract: Machine unlearning aims to remove sensitive or undesired data from large language models. However, recent studies suggest that unlearning is often shallow, claiming that removed knowledge can easily be recovered. In this work, we critically examine standard unlearning evaluation practices and uncover key limitations that shake our trust in those findings. First, we show that some evaluations intro…

    Submitted 31 May, 2025; originally announced June 2025.

  21. arXiv:2506.00677  [pdf, ps, other]

    cs.CR cs.ET physics.app-ph

    Review of Blockchain-Based Approaches to Spent Fuel Management in Nuclear Power Plants

    Authors: Yuxiang Xu, Wenjuan Yu, Yuqian Wan, Zhongming Zhang

    Abstract: This study addresses critical challenges in managing the transportation of spent nuclear fuel, including inadequate data transparency, stringent confidentiality requirements, and a lack of trust among collaborating parties, issues prevalent in traditional centralized management systems. Given the high risks involved, balancing data confidentiality with regulatory transparency is imperative. To ove…

    Submitted 31 May, 2025; originally announced June 2025.

  22. arXiv:2506.00569  [pdf, ps, other]

    cs.LG

    AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs

    Authors: Nicholas E. Corrado, Julian Katz-Samuels, Adithya Devraj, Hyokun Yun, Chao Zhang, Yi Xu, Yi Pan, Bing Yin, Trishul Chilimbi

    Abstract: When aligning large language models (LLMs), their performance on various tasks (such as being helpful, harmless, and honest) depends heavily on the composition of their training data. However, selecting a data mixture that achieves strong performance across all tasks is challenging. Existing approaches rely on large ablation studies, heuristics, or human intuition, but these can be prohibitively e…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: ACL 2025, Main Conference

  23. arXiv:2506.00225  [pdf, other]

    cs.RO cs.CV

    Understanding while Exploring: Semantics-driven Active Mapping

    Authors: Liyan Chen, Huangying Zhan, Hairong Yin, Yi Xu, Philippos Mordohai

    Abstract: Effective robotic autonomy in unknown environments demands proactive exploration and precise understanding of both geometry and semantics. In this paper, we propose ActiveSGM, an active semantic mapping framework designed to predict the informativeness of potential observations before execution. Built upon a 3D Gaussian Splatting (3DGS) mapping backbone, our approach employs semantic and geometric…

    Submitted 30 May, 2025; originally announced June 2025.

  24. arXiv:2505.24823  [pdf, other]

    cs.LG cs.AI cs.CL

    PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models

    Authors: Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo

    Abstract: Large language models (LLMs) have rapidly advanced and are increasingly capable of tackling complex scientific problems, including those in physics. Despite this progress, current LLMs often fail to emulate the concise, principle-based reasoning characteristic of human experts, instead generating lengthy and opaque solutions. This discrepancy highlights a crucial gap in their ability to apply core…

    Submitted 30 May, 2025; originally announced May 2025.

  25. arXiv:2505.24466  [pdf, ps, other]

    cs.CV

    SA-Person: Text-Based Person Retrieval with Scene-aware Re-ranking

    Authors: Yingjia Xu, Jinlin Wu, Zhen Chen, Daming Gao, Yang Yang, Zhen Lei, Min Cao

    Abstract: Text-based person retrieval aims to identify a target individual from a gallery of images based on a natural language description. It presents a significant challenge due to the complexity of real-world scenes and the ambiguity of appearance-related descriptions. Existing methods primarily emphasize appearance-based cross-modal retrieval, often neglecting the contextual information embedded within…

    Submitted 26 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 22 pages, 7 figures. Under review

  26. arXiv:2505.24307  [pdf, ps, other]

    cs.IT eess.SP

    Multi-Waveguide Pinching Antennas for ISAC

    Authors: Weihao Mao, Yang Lu, Yanqing Xu, Bo Ai, Octavia A. Dobre, Dusit Niyato

    Abstract: Recently, a novel flexible-antenna technology, called pinching antennas, has attracted growing academic interest. By inserting discrete dielectric materials, pinching antennas can be activated at arbitrary points along waveguides, allowing for flexible customization of large-scale path loss. This paper investigates a multi-waveguide pinching-antenna integrated sensing and communications (ISAC) sys…

    Submitted 30 May, 2025; originally announced May 2025.

  27. arXiv:2505.24221  [pdf, ps, other]

    cs.DB

    FOCUS: Boosting Schema-aware Access for KV Stores via Hierarchical Data Management

    Authors: Zhen Liu, Wenzhe Zhu, Yongkun Li, Yinlong Xu

    Abstract: Persistent key-value (KV) stores are critical infrastructure for data-intensive applications. Leveraging high-performance Non-Volatile Memory (NVM) to enhance KV stores has gained traction. However, previous work has primarily focused on optimizing KV stores themselves, without adequately addressing their integration into applications. Consequently, existing applications, represented by NewSQL dat…

    Submitted 30 May, 2025; originally announced May 2025.

  28. arXiv:2505.24175  [pdf, ps, other]

    astro-ph.IM astro-ph.GA

    Photometric redshift estimation for emission line galaxies of DESI Legacy Imaging Surveys by CNN-MLP

    Authors: Shirui Wei, Changhua Li, Yanxia Zhang, Chenzhou Cui, Chao Tang, Jingyi Zhang, Yongheng Zhao, Xuebing Wu, Yihan Tao, Dongwei Fan, Shanshan Li, Yunfei Xu, Maoyuan Huang, Xingyu Yang, Zihan Kang, Jinghang Shi

    Abstract: Emission Line Galaxies (ELGs) are crucial for cosmological studies, particularly in understanding the large-scale structure of the Universe and the role of dark energy. ELGs form an essential component of the target catalogue for the Dark Energy Spectroscopic Instrument (DESI), a major astronomical survey. However, the accurate selection of ELGs for such surveys is challenging due to the inherent…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 15 pages, 10 figures, 8 tables, accepted for publication in PASA

  29. arXiv:2505.24173  [pdf, ps, other]

    cs.CV

    DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?

    Authors: Tianhong Zhou, Yin Xu, Yingtao Zhu, Chuxi Xiao, Haiyang Bian, Lei Wei, Xuegong Zhang

    Abstract: Vision-language models (VLMs) exhibit strong zero-shot generalization on natural images and show early promise in interpretable medical image analysis. However, existing benchmarks do not systematically evaluate whether these models truly reason like human clinicians or merely imitate superficial patterns. To address this gap, we propose DrVD-Bench, the first multimodal benchmark for clinical visu…

    Submitted 29 May, 2025; originally announced May 2025.

  30. arXiv:2505.23871  [pdf, ps, other]

    cs.LG cs.AI

    ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning

    Authors: Zeyuan Liu, Zhihe Yang, Jiawei Xu, Rui Yang, Jiafei Lyu, Baoxiang Wang, Yunjian Xu, Xiu Li

    Abstract: Real-world datasets collected from sensors or human inputs are prone to noise and errors, posing significant challenges for applying offline reinforcement learning (RL). While existing methods have made progress in addressing corrupted actions and rewards, they remain insufficient for handling corruption in high-dimensional state spaces and for cases where multiple elements in the dataset are corr…

    Submitted 4 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  31. arXiv:2505.23866  [pdf, ps, other]

    cs.LG cs.AI

    Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization

    Authors: Chengli Tan, Yubo Zhou, Haishan Ye, Guang Dai, Junmin Liu, Zengjie Song, Jiangshe Zhang, Zixiang Zhao, Yunda Hao, Yong Xu

    Abstract: Deep neural networks have been increasingly used in safety-critical applications such as medical diagnosis and autonomous driving. However, many studies suggest that they are prone to being poorly calibrated and have a propensity for overconfidence, which may have disastrous consequences. In this paper, unlike standard training such as stochastic gradient descent, we show that the recently propose…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 16 pages

  32. arXiv:2505.23826  [pdf, ps, other]

    cs.SI

    FinRipple: Aligning Large Language Models with Financial Market for Event Ripple Effect Awareness

    Authors: Yuanjian Xu, Jianing Hao, Kunsheng Tang, Jingnan Chen, Anxian Liu, Peng Liu, Guang Zhang

    Abstract: Financial markets exhibit complex dynamics where localized events trigger ripple effects across entities. Previous event studies, constrained by static single-company analyses and simplistic assumptions, fail to capture these ripple effects. While large language models (LLMs) offer emergent reasoning capabilities, their direct application falters due to structural market unawareness and limited ca…

    Submitted 28 May, 2025; originally announced May 2025.

  33. arXiv:2505.23803  [pdf, ps, other]

    cs.CR cs.AI

    MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection

    Authors: Yinuo Xue, Eric Spero, Yun Sing Koh, Giovanni Russello

    Abstract: Phishing email detection faces critical challenges from evolving adversarial tactics and heterogeneous attack patterns. Traditional detection methods, such as rule-based filters and denylists, often struggle to keep pace with these evolving tactics, leading to false negatives and compromised security. While machine learning approaches have improved detection accuracy, they still face challenges ad…

    Submitted 26 May, 2025; originally announced May 2025.

  34. arXiv:2505.23561  [pdf, ps, other]

    cs.CR

    Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models

    Authors: Zenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou, Lichao Sun

    Abstract: Model merging for Large Language Models (LLMs) directly fuses the parameters of different models finetuned on various tasks, creating a unified model for multi-domain tasks. However, due to potential vulnerabilities in models available on open-source platforms, model merging is susceptible to backdoor attacks. In this paper, we propose Merge Hijacking, the first backdoor attack targeting model mer…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: This paper is accepted by ACL 2025 main conference

  35. arXiv:2505.23466  [pdf, ps, other]

    cond-mat.str-el

    Importance of pressure-dependent electronic interactions and magnetic order on pressure-driven insulator-metal transitions in MnO and NiO

    Authors: Bei-Lei Liu, Yue-Chao Wang, Yuan-Ji Xu, Xingyu Gao, Hai-Feng Liu, Hai-Feng Song

    Abstract: The pressure-driven insulator-metal transition is a crucial topic in condensed matter physics. However, even for the prototypical strongly correlated system, NiO, the critical pressure for transition remains debated. In this work, we evaluated the electronic interactions over a wide range of pressures based on our developed doubly-screened Coulomb correction method and investigated the effects of…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 10 pages, 9 figures

  36. arXiv:2505.23227  [pdf, ps, other]

    cond-mat.soft

    Polymer-modulated evaporation flow enables scalable self-assembly of highly aligned nanowires

    Authors: Liyiming Tao, Zechao Jiang, Shiyuan Hu, Lin Du, Qiuting Zhang, Jiajia Zhou, Masao Doi, Xiaojun Wu, Xingkun Man, Ye Xu

    Abstract: Highly aligned nanowire networks are essential for enabling anisotropic optical, electrical, and sensing functionalities in next-generation devices. However, achieving such alignment typically requires complex fabrication methods or high-energy processing. Here, we present a simple and scalable self-assembly strategy that uses a viscosity-enhancing polymer additive to modulate fluid flows during s…

    Submitted 29 May, 2025; originally announced May 2025.

  37. arXiv:2505.23038  [pdf, ps, other]

    cs.CL

    EL4NER: Ensemble Learning for Named Entity Recognition via Multiple Small-Parameter Large Language Models

    Authors: Yuzhen Xiao, Jiahe Song, Yongxin Xu, Ruizhe Zhang, Yiqi Xiao, Xin Lu, Runchuan Zhu, Bowen Jiang, Junfeng Zhao

    Abstract: In-Context Learning (ICL) technique based on Large Language Models (LLMs) has gained prominence in Named Entity Recognition (NER) tasks for its lower computing resource consumption, less manual labeling overhead, and stronger generalizability. Nevertheless, most ICL-based NER methods depend on large-parameter LLMs: the open-source models demand substantial computational resources for deployment an…

    Submitted 28 May, 2025; originally announced May 2025.

  38. arXiv:2505.22999  [pdf, ps, other]

    cs.GT

    Online Selection with Uncertain Disruption

    Authors: Yihua Xu, Süleyman Kerimov, Sebastian Perez-Salazar

    Abstract: In numerous online selection problems, decision-makers (DMs) must allocate on the fly limited resources to customers with uncertain values. The DM faces the tension between allocating resources to currently observed values and saving them for potentially better, unobserved values in the future. Addressing this tension becomes more demanding if an uncertain disruption occurs while serving customers…

    Submitted 28 May, 2025; originally announced May 2025.

  39. arXiv:2505.22633  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Spatial Knowledge Graph-Guided Multimodal Synthesis

    Authors: Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang

    Abstract: Recent advances in multimodal large language models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To address this challenge, multimodal data synthesis offers a promising solution. Yet, ensuring that synthesized data adhere to spatial common sense is a non-trivial task. In this work, we introduce SKG2Data, a novel mu…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Ongoing work

  40. arXiv:2505.22290  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling

    Authors: Fanzeng Xia, Yidong Luo, Tinko Sebastian Bartels, Yaqi Xu, Tongxin Li

    Abstract: Recent research has highlighted that Large Language Models (LLMs), even when trained to generate extended long reasoning steps, still face significant challenges on hard reasoning problems. However, much of the existing literature relies on direct prompting with simple in-context learning examples for evaluation, which largely overlooks advanced techniques to elicit LLMs' deliberate reasoning befo…

    Submitted 28 May, 2025; originally announced May 2025.

  41. arXiv:2505.22167  [pdf, other]

    cs.CV

    Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers

    Authors: Weilun Feng, Chuanguang Yang, Haotong Qin, Xiangqi Li, Yu Wang, Zhulin An, Libo Huang, Boyu Diao, Zixiang Zhao, Yongjun Xu, Michele Magno

    Abstract: Diffusion transformers (DiT) have demonstrated exceptional performance in video generation. However, their large number of parameters and high computational complexity limit their deployment on edge devices. Quantization can reduce storage requirements and accelerate inference by lowering the bit-width of model parameters. Yet, existing quantization methods for image generation models do not gener…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML2025

  42. arXiv:2505.22140  [pdf, other]

    hep-ex

    Search for a dark baryon in the $Ξ^-\rightarrowπ^-+{\rm invisible}$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (697 additional authors not shown)

    Abstract: A search for a dark baryon is performed for the first time in the two-body decay $Ξ^-\rightarrowπ^-+{\rm invisible}$ using $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected at a center-of-mass energy of $\sqrt{s}=3.097\,\mbox{GeV}$ with the BESIII detector at the BEPCII collider. No significant signal is observed, and the 90% (95%) confidence level upper limits on the branching fraction…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures, 1 table

  43. arXiv:2505.21906  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge

    Authors: Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu

    Abstract: Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, despite leveraging powerful pre-trained Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key capabilities during fine-tuning as the model adapts to specific robotic tasks. We argue that a generalizable VLA model should retain and expand upon the VLM's core competencies:…

    Submitted 29 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Project page: https://chatvla-2.github.io/

  44. arXiv:2505.21822  [pdf, ps, other]

    physics.optics eess.IV

    Compressive Fourier-Domain Intensity Coupling (C-FOCUS) enables near-millimeter deep imaging in the intact mouse brain in vivo

    Authors: Renzhi He, Yucheng Li, Brianna Urbina, Jiandi Wan, Yi Xue

    Abstract: Two-photon microscopy is a powerful tool for in vivo imaging, but its imaging depth is typically limited to a few hundred microns due to tissue scattering, even with existing scattering correction techniques. Moreover, most active scattering correction methods are restricted to small regions by the optical memory effect. Here, we introduce compressive Fourier-domain intensity coupling for scatteri…

    Submitted 27 May, 2025; originally announced May 2025.

  45. arXiv:2505.21568  [pdf, ps, other]

    cs.SD cs.AI cs.CR eess.AS

    VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents

    Authors: Haiyun Li, Zhiyong Wu, Xiaofeng Xie, Jingran Xie, Yaoxun Xu, Hanyang Peng

    Abstract: Voice cloning (VC)-resistant watermarking is an emerging technique for tracing and preventing unauthorized cloning. Existing methods effectively trace traditional VC models by training them on watermarked audio but fail in zero-shot VC scenarios, where models synthesize audio from an audio prompt without training. To address this, we propose VoiceMark, the first zero-shot VC-resistant watermarking…

    Submitted 30 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  46. arXiv:2505.21527  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining

    Authors: Jianheng Zhuo, Yifan Yang, Yiwen Shao, Yong Xu, Dong Yu, Kai Yu, Xie Chen

    Abstract: Automatic speech recognition (ASR) has made remarkable progress but heavily relies on large-scale labeled data, which is scarce for low-resource languages like Vietnamese. While existing systems such as Whisper, USM, and MMS achieve promising performance, their efficacy remains inadequate in terms of training costs, latency, and accessibility. To address these issues, we propose VietASR, a novel A…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  47. arXiv:2505.21070  [pdf, ps, other]

    cs.CV

    Minute-Long Videos with Dual Parallelisms

    Authors: Zeqing Wang, Bowen Zheng, Xingyi Yang, Zhenxiong Tan, Yuecong Xu, Xinchao Wang

    Abstract: Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy, termed DualParal. The core idea is that, instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs.…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: The code is available at https://github.com/DualParal-Project/DualParal

  48. arXiv:2505.21050  [pdf, other]

    cs.CV

    Advancing high-fidelity 3D and Texture Generation with 2.5D latents

    Authors: Xin Yang, Jiantao Lin, Yingjie Xu, Haodong Li, Yingcong Chen

    Abstract: Despite the availability of large-scale 3D datasets and advancements in 3D generative models, the complexity and uneven quality of 3D geometry and texture data continue to hinder the performance of 3D generation techniques. In most existing approaches, 3D geometry and texture are generated in separate stages using different models and non-unified representations, frequently leading to unsatisfacto…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  49. arXiv:2505.21049  [pdf, ps, other]

    cs.CV

    Robust Video-Based Pothole Detection and Area Estimation for Intelligent Vehicles with Depth Map and Kalman Smoothing

    Authors: Dehao Wang, Haohang Zhu, Yiwen Xu, Kaiqi Liu

    Abstract: Road potholes pose a serious threat to driving safety and comfort, making their detection and assessment a critical task in fields such as autonomous driving. When driving vehicles, the operators usually avoid large potholes and approach smaller ones at reduced speeds to ensure safety. Therefore, accurately estimating pothole area is of vital importance. Most existing vision-based methods rely on…

    Submitted 27 May, 2025; originally announced May 2025.

  50. arXiv:2505.20747  [pdf, ps, other]

    eess.SY

    On Kernel Design for Regularized Volterra Series Identification of Wiener-Hammerstein Systems

    Authors: Yu Xu, Biqiang Mu, Tianshi Chen

    Abstract: There have been increasing interests on the Volterra series identification with the kernel-based regularization method. The major difficulties are on the kernel design and efficiency of the corresponding implementation. In this paper, we first assume that the underlying system to be identified is the Wiener-Hammerstein (WH) system with polynomial nonlinearity. We then show how to design kernels wi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 17 pages, 7 figures