
Showing 201–250 of 9,119 results for author: Wang, C

  1. arXiv:2506.01204  [pdf, ps, other]

    quant-ph cond-mat.str-el physics.comp-ph

    Quantum-Classical Embedding via Ghost Gutzwiller Approximation for Enhanced Simulations of Correlated Electron Systems

    Authors: I-Chi Chen, Aleksei Khindanov, Carlos Salazar, Humberto Munoz Barona, Feng Zhang, Cai-Zhuang Wang, Thomas Iadecola, Nicola Lanatà, Yong-Xin Yao

    Abstract: Simulating correlated materials on present-day quantum hardware remains challenging due to limited quantum resources. Quantum embedding methods offer a promising route by reducing computational complexity through the mapping of bulk systems onto effective impurity models, allowing more feasible simulations on pre- and early-fault-tolerant quantum devices. This work develops a quantum-classical emb…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 15 pages, 5 figures

  2. arXiv:2506.01015  [pdf, ps, other]

    cs.CV

    AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting

    Authors: Yuyuan Liu, Yuanhong Chen, Chong Wang, Junlin Han, Junde Wu, Can Peng, Jingkun Chen, Yu Tian, Gustavo Carneiro

    Abstract: Segment Anything Model 2 (SAM2) exhibits strong generalisation for promptable segmentation in video clips; however, its integration with the audio modality remains underexplored. Existing approaches mainly follow two directions: (1) injecting adapters into the image encoder to receive audio signals, which incurs efficiency costs during prompt engineering, and (2) leveraging additional foundation m…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 18 pages, 18 Figures and 7 tables

  3. arXiv:2506.00907  [pdf, ps, other]

    physics.comp-ph physics.flu-dyn

    Lattice Boltzmann Boundary Conditions for Flow, Convection-Diffusion and MHD Simulations

    Authors: Jun Li, Wai Hong Ronald Chan, Zhe Feng, Chenglei Wang

    Abstract: A general derivation is proposed for several boundary conditions arising in lattice Boltzmann simulations of various physical problems. Pair-wise moment-conservations are proposed to enforce the boundary conditions with given macroscopic quantities, including the velocity and pressure boundary conditions in flow simulations, a given concentration in convection-diffusion (CD) simulations, as wel…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 1. General derivation of LBM schemes for Dirichlet, Neumann and Robin-like boundaries; 2. Simple interpolation or extrapolation scheme for arbitrary boundary-to-grid distances; 3. Boundary schemes compatible in fully coupled simulations of multi-physics; 4. Boundary schemes valid for both static and moving boundaries

  4. arXiv:2506.00828  [pdf, ps, other]

    cs.IR cs.LG

    Breaker: Removing Shortcut Cues with User Clustering for Single-slot Recommendation System

    Authors: Chao Wang, Yue Zheng, Yujing Zhang, Yan Feng, Zhe Wang, Xiaowei Shi, An You, Yu Chen

    Abstract: In a single-slot recommendation system, users are only exposed to one item at a time, and the system cannot collect user feedback on multiple items simultaneously. Therefore, only pointwise modeling solutions can be adopted, focusing solely on modeling the likelihood of clicks or conversions for items by users to learn user-item preferences, without the ability to capture the ranking information a…

    Submitted 1 June, 2025; originally announced June 2025.

  5. arXiv:2506.00736  [pdf, ps, other]

    eess.AS cs.SD

    IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling

    Authors: Kuan-Po Huang, Shu-wen Yang, Huy Phan, Bo-Ru Lu, Byeonggeun Kim, Sashank Macha, Qingming Tang, Shalini Ghosh, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

    Abstract: Text-to-audio generation synthesizes realistic sounds or music given a natural language prompt. Diffusion-based frameworks, including the Tango and the AudioLDM series, represent the state-of-the-art in text-to-audio generation. Despite achieving high audio fidelity, they incur significant inference latency due to the slow diffusion sampling process. MAGNET, a mask-based model operating on discret…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025. Project website: https://audio-impact.github.io/

  6. arXiv:2506.00714  [pdf, ps, other]

    cs.SE cs.AI

    An LLM Agent for Functional Bug Detection in Network Protocols

    Authors: Mingwei Zheng, Chengpeng Wang, Xuwei Liu, Jinyao Guo, Shiwei Feng, Xiangyu Zhang

    Abstract: Functional correctness is critical for ensuring the reliability and security of network protocol implementations. Functional bugs, instances where implementations diverge from behaviors specified in RFC documents, can lead to severe consequences, including faulty routing, authentication bypasses, and service disruptions. Detecting these bugs requires deep semantic analysis across specification doc…

    Submitted 31 May, 2025; originally announced June 2025.

  7. arXiv:2506.00536  [pdf, ps, other]

    cs.CL cs.AI

    Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing

    Authors: Changyue Wang, Weihang Su, Qingyao Ai, Yujia Zhou, Yiqun Liu

    Abstract: Knowledge editing aims to efficiently update Large Language Models (LLMs) by modifying specific knowledge without retraining the entire model. Among knowledge editing approaches, in-context editing (ICE) offers a lightweight solution by injecting new knowledge directly into the input context, leaving model parameters unchanged. However, existing ICE approaches do not explicitly separate the newly…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2506.00271  [pdf, ps, other]

    eess.IV

    Adaptive Voxelization for Transform coding of 3D Gaussian splatting data

    Authors: Chenjunjie Wang, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Cheng Chang

    Abstract: We present a novel compression framework for 3D Gaussian splatting (3DGS) data that leverages transform coding tools originally developed for point clouds. Contrary to existing 3DGS compression methods, our approach can produce compressed 3DGS models at multiple bitrates in a computationally efficient way. Point cloud voxelization is a discretization technique that point cloud codecs use to improv…

    Submitted 30 May, 2025; originally announced June 2025.

  9. arXiv:2506.00085  [pdf, ps, other]

    cs.CL cs.AI

    COSMIC: Generalized Refusal Direction Identification in LLM Activations

    Authors: Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang

    Abstract: Large Language Models (LLMs) encode behaviors such as refusal within their activation space, yet identifying these behaviors remains a significant challenge. Existing methods often rely on predefined refusal templates detectable in output tokens or require manual analysis. We introduce COSMIC (Cosine Similarity Metrics for Inversion of Concepts), an automated framework for direction selec…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: 9 pages, Accepted to ACL 2025 Findings

  10. arXiv:2505.24804  [pdf, ps, other]

    eess.SP cs.IT

    Coordinated Beamforming for RIS-Empowered ISAC Systems over Secure Low-Altitude Networks

    Authors: Chunjie Wang, Xuhui Zhang, Wenchao Liu, Jinke Ren, Huijun Xing, Shuqiang Wang, Yanyan Shen

    Abstract: Emerging as a cornerstone for next-generation wireless networks, integrated sensing and communication (ISAC) systems demand innovative solutions to balance spectral efficiency and sensing accuracy. In this paper, we propose a coordinated beamforming framework for a reconfigurable intelligent surface (RIS)-empowered ISAC system, where the active precoding at the dual-functional base station (DFBS)…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: This manuscript has been submitted to the IEEE

  11. arXiv:2505.24779  [pdf, ps, other]

    cs.LG

    EVA-MILP: Towards Standardized Evaluation of MILP Instance Generation

    Authors: Yidong Luo, Chenguang Wang, Jiahao Yang, Fanzeng Xia, Tianshu Yu

    Abstract: Mixed-Integer Linear Programming (MILP) is fundamental to solving complex decision-making problems. The proliferation of MILP instance generation methods, driven by machine learning's demand for diverse optimization datasets and the limitations of static benchmarks, has significantly outpaced standardized evaluation techniques. Consequently, assessing the fidelity and utility of synthetic MILP ins…

    Submitted 3 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: The code is available at https://github.com/anonymous-neurips-submission-2025/EVA-MILP

  12. arXiv:2505.24586  [pdf, ps, other]

    astro-ph.HE

    All-sky search for individual Primordial Black Hole bursts with LHAASO

    Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen , et al. (293 additional authors not shown)

    Abstract: Primordial Black Holes (PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$ g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for…

    Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 8 pages, 2 figures

  13. arXiv:2505.24351  [pdf, ps, other]

    eess.IV cs.CV

    A Novel Coronary Artery Registration Method Based on Super-pixel Particle Swarm Optimization

    Authors: Peng Qi, Wenxi Qu, Tianliang Yao, Haonan Ma, Dylan Wintle, Yinyi Lai, Giorgos Papanastasiou, Chengjia Wang

    Abstract: Percutaneous Coronary Intervention (PCI) is a minimally invasive procedure that improves coronary blood flow and treats coronary artery disease. Although PCI typically requires 2D X-ray angiography (XRA) to guide catheter placement in real time, computed tomography angiography (CTA) may substantially improve PCI by providing precise information on 3D vascular anatomy and status. To leverage real-t…

    Submitted 30 May, 2025; originally announced May 2025.

  14. arXiv:2505.23713  [pdf, ps, other]

    cs.CL

    SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

    Authors: Zixiang Xu, Yanbo Wang, Yue Huang, Jiayi Ye, Haomin Zhuang, Zirui Song, Lang Gao, Chenxi Wang, Zhaorun Chen, Yujun Zhou, Sixian Li, Wang Pan, Yue Zhao, Jieyu Zhao, Xiangliang Zhang, Xiuying Chen

    Abstract: Large language models (LLMs) are increasingly applied to socially grounded tasks, such as online community moderation, media content analysis, and social reasoning games. Success in these contexts depends on a model's social reasoning ability - the capacity to interpret social contexts, infer others' mental states, and assess the truthfulness of presented information. However, there is currently n…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Code available at https://github.com/xzx34/SocialMaze

  15. arXiv:2505.23686  [pdf, ps, other]

    cs.AI cs.MA

    ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork

    Authors: Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone

    Abstract: Developing AI agents capable of collaborating with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches typically adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time,…

    Submitted 29 May, 2025; originally announced May 2025.

    ACM Class: I.2.11; I.2.1; I.2.6; I.2.8

  16. arXiv:2505.23535  [pdf, ps, other]

    stat.ME

    Robust Estimation of Double Autoregressive Models via Normal Mixture QMLE

    Authors: Zhao Chen, Chen Shi, Christina Dan Wang

    Abstract: This paper investigates the estimation of the double autoregressive (DAR) model in the presence of skewed and heavy-tailed innovations. We propose a novel Normal Mixture Quasi-Maximum Likelihood Estimation (NM-QMLE) method to address the limitations of conventional quasi-maximum likelihood estimation (QMLE) under non-Gaussian conditions. By incorporating a normal mixture distribution into the quas…

    Submitted 29 May, 2025; originally announced May 2025.

  17. arXiv:2505.23530  [pdf, ps, other]

    hep-ex

    Measurement of the Lund plane for light- and beauty-quark jets

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An , et al. (1133 additional authors not shown)

    Abstract: The substructure of jets in quantum chromodynamics (QCD) has garnered significant attention with the advent of infrared- and collinear-safe clustering algorithms and observables. A key question emerging from these studies is how in-jet emissions at soft and hard energy scales, across collinear and wide angles relative to the emitter, differ with the mass of the emitting parton. The Lund jet plane…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2025-010.html (LHCb public pages)

    Report number: LHCb-PAPER-2025-010,CERN-EP-2025-093

  18. arXiv:2505.23471  [pdf, ps, other]

    cs.SE

    Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency

    Authors: Jun Yang, Cheng-Chi Wang, Bogdan Alexandru Stoica, Kexin Pei

    Abstract: Large Language Models (LLMs) have been increasingly used to optimize code efficiency. Evaluating their effectiveness and further suggesting optimization opportunities often rely on high-quality tests to demonstrate the performance bottlenecks presented in the program. However, existing approaches rely on a limited set of hand-curated inputs or LLM-generated uninteresting length-stressing tests, fa…

    Submitted 17 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: 30 pages, 3 figures

    ACM Class: D.2.5

  19. arXiv:2505.23381  [pdf, ps, other]

    cs.AI

    AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning

    Authors: Bowen Ping, Minnan Luo, Zhuohang Dang, Chenxi Wang, Chengyou Jia

    Abstract: Geometry problem solving presents distinctive challenges in artificial intelligence, requiring exceptional multimodal comprehension and rigorous mathematical reasoning capabilities. Existing approaches typically fall into two categories: neural-based and symbolic-based methods, both of which exhibit limitations in reliability and interpretability. To address this challenge, we propose AutoGPS, a n…

    Submitted 29 May, 2025; originally announced May 2025.

  20. arXiv:2505.23214  [pdf, ps, other]

    cs.CV cs.AI

    SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection

    Authors: Wenhao Xu, Shuchen Zheng, Changwei Wang, Zherui Zhang, Chuan Ren, Rongtao Xu, Shibiao Xu

    Abstract: Infrared small target detection (ISTD) is vital for long-range surveillance in military, maritime, and early warning applications. ISTD is challenged by targets occupying less than 0.15% of the image and low distinguishability from complex backgrounds. Existing deep learning methods often suffer from information loss during downsampling and inefficient global context modeling. This paper presents…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Information Fusion 2025

  21. arXiv:2505.23134  [pdf, ps, other]

    cs.CV cs.AI

    Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing

    Authors: Tongtong Su, Chengyu Wang, Jun Huang, Dongming Lu

    Abstract: Appearance editing according to user needs is a pivotal task in video editing. Existing text-guided methods often lead to ambiguities regarding user intentions and restrict fine-grained control over editing specific aspects of objects. To overcome these limitations, this paper introduces a novel approach named Zero-to-Hero, which focuses on reference-based video editing that disentangles the edi…

    Submitted 29 May, 2025; originally announced May 2025.

  22. arXiv:2505.22973  [pdf, ps, other]

    cs.LG cs.AI

    EquiReg: Equivariance Regularized Diffusion for Inverse Problems

    Authors: Bahareh Tolooshams, Aditi Chandrashekar, Rayhan Zirvi, Abbas Mammadov, Jiachen Yao, Chuwei Wang, Anima Anandkumar

    Abstract: Diffusion models represent the state-of-the-art for solving inverse problems such as image restoration tasks. In the Bayesian framework, diffusion-based inverse solvers incorporate a likelihood term to guide the prior sampling process, generating data consistent with the posterior distribution. However, due to the intractability of the likelihood term, many current methods rely on isotropic Gaussi…

    Submitted 28 May, 2025; originally announced May 2025.

  23. arXiv:2505.22856  [pdf]

    cond-mat.mtrl-sci

    Nanoscale quantum imaging of field-free deterministic switching of a chiral antiferromagnet

    Authors: Jingcheng Zhou, Senlei Li, Chuangtang Wang, Hanshang Jin, Stelo Xu, Zelong Xiong, Carson Jacobsen, Kenji Watanabe, Takashi Taniguchi, Valentin Taufour, Liuyan Zhao, Hua Chen, Chunhui Rita Du, Hailong Wang

    Abstract: Recently, unconventional spin-orbit torques (SOTs) with tunable spin generation open new pathways for designing novel magnetization control for cutting-edge spintronics innovations. A leading research thrust is to develop field-free deterministic magnetization switching for implementing scalable and energy favorable magnetic recording and storage applications, which have been demonstrated in conve…

    Submitted 28 May, 2025; originally announced May 2025.

  24. arXiv:2505.22521  [pdf]

    cs.LG cs.AI

    Evaluating Supervised Learning Models for Fraud Detection: A Comparative Study of Classical and Deep Architectures on Imbalanced Transaction Data

    Authors: Chao Wang, Chuanhao Nie, Yunbo Liu

    Abstract: Fraud detection remains a critical task in high-stakes domains such as finance and e-commerce, where undetected fraudulent transactions can lead to significant economic losses. In this study, we systematically compare the performance of four supervised learning models - Logistic Regression, Random Forest, Light Gradient Boosting Machine (LightGBM), and a Gated Recurrent Unit (GRU) network - on a l…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 5 pages. Chao Wang, Chuanhao Nie, and Yunbo Liu contributed equally to this work. Corresponding author: Yunbo Liu ([email protected]). Submitted to the 3rd International Conference on Management Innovation and Economy Development (MIED 2025), Chongqing, China

  25. arXiv:2505.22453  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

    Authors: Lai Wei, Yuting Li, Chen Wang, Yue Wang, Linghe Kong, Weiran Huang, Lichao Sun

    Abstract: Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL). However, these supervised methods require expensive and manually annotated multi-modal data--an ultimately unsustainable resource. While recent efforts have explored unsupervised post-training, their methods are complex and difficult to ite…

    Submitted 28 May, 2025; originally announced May 2025.

  26. arXiv:2505.22438  [pdf, ps, other]

    cs.IT cs.AI cs.CV cs.LG eess.IV

    Synonymous Variational Inference for Perceptual Image Compression

    Authors: Zijian Liang, Kai Niu, Changshuo Wang, Jin Xu, Ping Zhang

    Abstract: Recent contributions of semantic information theory reveal the set-element relationship between semantic and syntactic information, represented as synonymous relationships. In this paper, we propose a synonymous variational inference (SVI) method based on this synonymity viewpoint to re-analyze the perceptual image compression problem. It takes perceptual similarity as a typical synonymous criteri…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 31 pages, 20 figures. This paper is accepted by Proceedings of the 42nd International Conference on Machine Learning (ICML 2025) Poster

  27. arXiv:2505.22334  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

    Authors: Lai Wei, Yuting Li, Kaipeng Zheng, Chen Wang, Yue Wang, Linghe Kong, Lichao Sun, Weiran Huang

    Abstract: Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While "aha moment" patterns--where models exhibit self-correction through reflection--are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs…

    Submitted 28 May, 2025; originally announced May 2025.

  28. arXiv:2505.22312  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Skywork Open Reasoner 1 Technical Report

    Authors: Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, Xiaoyu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu, Yahui Zhou

    Abstract: The success of DeepSeek-R1 underscores the significant role of reinforcement learning (RL) in enhancing the reasoning capabilities of large language models (LLMs). In this work, we present Skywork-OR1, an effective and scalable RL implementation for long Chain-of-Thought (CoT) models. Building on the DeepSeek-R1-Distill model series, our RL approach achieves notable performance gains, increasing a…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  29. arXiv:2505.22140  [pdf, other]

    hep-ex

    Search for a dark baryon in the $\Xi^-\rightarrow\pi^-+{\rm invisible}$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (697 additional authors not shown)

    Abstract: A search for a dark baryon is performed for the first time in the two-body decay $\Xi^-\rightarrow\pi^-+{\rm invisible}$ using $(10.087\pm0.044)\times10^{9}$ $J/\psi$ events collected at a center-of-mass energy of $\sqrt{s}=3.097\,\mbox{GeV}$ with the BESIII detector at the BEPCII collider. No significant signal is observed, and the 90% (95%) confidence level upper limits on the branching fraction…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures, 1 table

  30. arXiv:2505.22063  [pdf, ps, other]

    cs.SD eess.AS

    Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR

    Authors: Mingchen Shao, Xinfa Zhu, Chengyou Wang, Bingshen Mu, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Despite remarkable achievements, automatic speech recognition (ASR) in low-resource scenarios still faces two challenges: high-quality data scarcity and high computational demands. This paper proposes EThai-ASR, the first to apply large language models (LLMs) to Thai ASR and create an efficient LLM-based ASR system. EThai-ASR comprises a speech encoder, a connection module and a Thai LLM decoder.…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by INTERSPEECH 2025

  31. arXiv:2505.21974  [pdf, ps, other]

    cs.LG

    BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

    Authors: Yu-Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, Ping-Chun Hsieh

    Abstract: Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs have shown promising empirical results given their favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optim…

    Submitted 29 May, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: ICLR 2025. Project page and code at https://hungyuheng.github.io/BOFormer/

  32. arXiv:2505.21919  [pdf, ps, other]

    cs.ET cs.AI cs.DC

    Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference

    Authors: Yue Zhu, Hao Yu, Chen Wang, Zhuoran Liu, Eun Kyung Lee

    Abstract: The increasing adoption of large language models (LLMs) with extended context windows necessitates efficient Key-Value Cache (KVC) management to optimize inference performance. Inference workloads like Retrieval-Augmented Generation (RAG) and agents exhibit high cache reusability, making efficient caching critical to reducing redundancy and improving speed. We analyze real-world KVC access pattern…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted at IEEE Cloud 2025 as WIP paper. The final version will appear in IEEE Xplore

  33. arXiv:2505.21902  [pdf, other]

    physics.ins-det

    Rise Time and Charge Collection Efficiency of Graphene-Optimized 4H-SiC PIN Detector

    Authors: Zhenyu Jiang, Xuemei Lu, Congcong Wang, Yingjie Huang, Xiaoshen Kang, Suyu Xiao, Xiyuan Zhang, Xin Shi

    Abstract: Silicon carbide detectors exhibit good detection performance and are being considered for detection applications. However, the presence of the detector's surface electrode limits the application of low-penetration particle detectors, photodetectors and heavy-ion detection. A graphene-optimized 4H-SiC detector has been fabricated to expand the application of SiC detectors. Its electrical properties and…

    Submitted 27 May, 2025; originally announced May 2025.

  34. arXiv:2505.21502  [pdf, ps, other]

    cs.CV

    Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis

    Authors: Yipengjing Sun, Chenyang Wang, Shunyuan Zheng, Zonglin Li, Shengping Zhang, Xiangyang Ji

    Abstract: We propose GRGS, a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. Unlike existing methods that rely on per-character optimization or ignore physical constraints, GRGS adopts a feed-forward, fully supervised strategy that projects geometry, material, and illumination cues from multi-view 2D observations into 3D Gau…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project Webpage: https://sypj-98.github.io/grgs/

  35. arXiv:2505.21231  [pdf, other]

    cs.CV

    Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning

    Authors: Lintao Xu, Yinghao Wang, Chaohui Wang

    Abstract: Occlusion Boundary Estimation (OBE) identifies boundaries arising from both inter-object occlusions and self-occlusion within individual objects, distinguishing intrinsic object edges from occlusion-induced contours to improve scene understanding and 3D reconstruction capacity. This is closely related to Monocular Depth Estimation (MDE), which infers depth from a single image, as occlusion boundar…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 7 pages, 4 tables, 4 figures

  36. arXiv:2505.21216  [pdf, ps, other]

    eess.SP

    CiUAV: A Multi-Task 3D Indoor Localization System for UAVs based on Channel State Information

    Authors: Cunyi Yin, Chenwei Wang, Jing Chen, Hao Jiang, Xiren Miao, Shaocong Zheng Zhenghua Chen Senior, Hong Yan

    Abstract: Accurate indoor positioning for unmanned aerial vehicles (UAVs) is critical for logistics, surveillance, and emergency response applications, particularly in GPS-denied environments. Existing indoor localization methods, including optical tracking, ultra-wideband, and Bluetooth-based systems, face cost, accuracy, and robustness trade-offs, limiting their practicality for UAV navigation. This paper…

    Submitted 27 May, 2025; originally announced May 2025.

  37. arXiv:2505.20967  [pdf, ps, other]

    cs.CV

    RF4D:Neural Radar Fields for Novel View Synthesis in Outdoor Dynamic Scenes

    Authors: Jiarui Zhang, Zhihao Li, Chong Wang, Bihan Wen

    Abstract: Neural fields (NFs) have demonstrated remarkable performance in scene reconstruction, powering various tasks such as novel view synthesis. However, existing NF methods relying on RGB or LiDAR inputs often exhibit severe fragility to adverse weather, particularly when applied in outdoor scenarios like autonomous driving. In contrast, millimeter-wave radar is inherently robust to environmental chang…

    Submitted 31 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  38. arXiv:2505.20888  [pdf, ps, other]

    cs.CL cs.AI

    EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models

    Authors: Chengyu Wang, Junbing Yan, Wenrui Cai, Yuanhao Yue, Jun Huang

    Abstract: In this paper, we present EasyDistill, a comprehensive toolkit designed for effective black-box and white-box knowledge distillation (KD) of large language models (LLMs). Our framework offers versatile functionalities, including data synthesis, supervised fine-tuning, ranking optimization, and reinforcement learning techniques specifically tailored for KD scenarios. The toolkit accommodates KD fun…

    Submitted 27 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  39. arXiv:2505.20771  [pdf, ps, other]

    cs.IR cs.AI

    Bridging the Gap: Self-Optimized Fine-Tuning for LLM-based Recommender Systems

    Authors: Heng Tang, Feng Liu, Xinbo Chen, Jiawei Chen, Bohao Wang, Changwang Zhang, Jun Wang, Yuegang Sun, Bingde Hu, Can Wang

    Abstract: Recent years have witnessed extensive exploration of Large Language Models (LLMs) in the field of Recommender Systems (RS). There are currently two commonly used strategies to enable LLMs to have recommendation capabilities: 1) The "Guidance-Only" strategy uses in-context learning to exploit and amplify the inherent semantic understanding and item recommendation capabilities of LLMs; 2) The "Tunin…

    Submitted 27 May, 2025; originally announced May 2025.

  40. arXiv:2505.20718  [pdf, ps, other]

    cs.CV cs.AI

    VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models

    Authors: Kui Wu, Shuhang Xu, Hao Chen, Churan Wang, Zhoujun Li, Yizhou Wang, Fangwei Zhong

    Abstract: We introduce a novel self-improving framework that enhances Embodied Visual Tracking (EVT) with Vision-Language Models (VLMs) to address the limitations of current active visual tracking systems in recovering from tracking failure. Our approach combines off-the-shelf active tracking methods with VLMs' reasoning capabilities, deploying a fast visual policy for normal tracking and activating VLM…

    Submitted 28 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  41. arXiv:2505.20710  [pdf, ps, other]

    cs.CV

    Hierarchical Instruction-aware Embodied Visual Tracking

    Authors: Kui Wu, Hao Chen, Churan Wang, Fakhri Karray, Zhoujun Li, Yizhou Wang, Fangwei Zhong

    Abstract: User-Centric Embodied Visual Tracking (UC-EVT) presents a novel challenge for reinforcement learning-based models due to the substantial gap between high-level user instructions and low-level agent actions. While recent advancements in language models (e.g., LLMs, VLMs, VLAs) have improved instruction comprehension, these models face critical limitations in either inference speed (LLMs, VLMs) or g…

    Submitted 27 May, 2025; originally announced May 2025.

  42. arXiv:2505.20494  [pdf, ps, other]

    physics.geo-ph

    Video-based Direct Time Series Measurement of Along-Strike Slip on the Coseismic Surface Rupture During the 2025 Mw7.7 Myanmar Earthquake

    Authors: Jianhao Gao, Fuhua Zheng, Chaofeng Wang, Haoran Meng

    Abstract: This study presents a time-resolved analysis of coseismic lateral surface rupture along the Sagaing Fault during the Mw 7.7 Mandalay, Myanmar earthquake on March 28, 2025. Leveraging publicly available Closed-Circuit Television (CCTV) footage alongside on-site measurements, we show the first in-situ, high-sampling-rate direct measurement of the coseismic slip evolution of a fault during an earthqua…

    Submitted 26 May, 2025; originally announced May 2025.

  43. arXiv:2505.20361  [pdf, ps, other]

    physics.flu-dyn cs.LG

    Solving Euler equations with Multiple Discontinuities via Separation-Transfer Physics-Informed Neural Networks

    Authors: Chuanxing Wang, Hui Luo, Kai Wang, Guohuai Zhu, Mingxing Luo

    Abstract: Despite the remarkable progress of physics-informed neural networks (PINNs) in scientific computing, they continue to face challenges when solving hydrodynamic problems with multiple discontinuities. In this work, we propose Separation-Transfer Physics Informed Neural Networks (ST-PINNs) to address such problems. By sequentially resolving discontinuities from strong to weak and leveraging transfer…

    Submitted 26 May, 2025; originally announced May 2025.

  44. arXiv:2505.20270  [pdf, ps, other]

    cs.CV

    ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation

    Authors: Jinsheng Quan, Chunshi Wang, Yawei Luo

    Abstract: This paper aims to model the dynamics of 3D Gaussians from visual observations to support temporal extrapolation. Existing dynamic 3D reconstruction methods often struggle to effectively learn underlying dynamics or rely heavily on manually defined physical priors, which limits their extrapolation capabilities. To address this issue, we propose a novel dynamic 3D Gaussian Splatting prior-free moti…

    Submitted 26 May, 2025; originally announced May 2025.

  45. arXiv:2505.20022  [pdf, other]

    math.ST

    Kernel Ridge Regression with Predicted Feature Inputs and Applications to Factor-Based Nonparametric Regression

    Authors: Xin Bing, Xin He, Chao Wang

    Abstract: Kernel methods, particularly kernel ridge regression (KRR), are time-proven, powerful nonparametric regression techniques known for their rich capacity, analytical simplicity, and computational tractability. The analysis of their predictive performance has received continuous attention for more than two decades. However, in many modern regression problems where the feature inputs used in KRR canno…

    Submitted 26 May, 2025; originally announced May 2025.

  46. arXiv:2505.19919  [pdf, ps, other]

    cs.CV

    Weather-Magician: Reconstruction and Rendering Framework for 4D Weather Synthesis In Real Time

    Authors: Chen Sang, Yeqiang Qian, Jiale Zhang, Chunxiang Wang, Ming Yang

    Abstract: For tasks such as urban digital twins, VR/AR/game scene design, or creating synthetic films, the traditional industrial approach often involves manually modeling scenes and using various rendering engines to complete the rendering process. This approach typically incurs high labor costs and hardware demands, and can result in poor quality when replicating complex real-world scenes. A more effici…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Project homepage: https://weathermagician.github.io

  47. arXiv:2505.19907  [pdf, ps, other]

    hep-ex nucl-ex

    First measurement of $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ cross-sections via $Σ^+$-nucleus scattering at an electron-positron collider

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (680 additional authors not shown)

    Abstract: Using $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage ring, the reactions $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ are studied, where the $Σ^{+}$ baryon is produced in the process $J/ψ\rightarrowΣ^{+}\barΣ^-$ and the neutron is a component of the $^9\rm{Be}$, $^{12}\rm{C}$ and $^{197}\rm{Au}$ nuclei in the beam pipe. Clear signals o…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 2 figures

  48. arXiv:2505.19797  [pdf, ps, other]

    cs.CL

    The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants

    Authors: Yiqun Zhang, Hao Li, Chenxu Wang, Linyao Chen, Qiaosheng Zhang, Peng Ye, Shi Feng, Daling Wang, Zhen Wang, Xinrun Wang, Jia Xu, Lei Bai, Wanli Ouyang, Shuyue Hu

    Abstract: Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedd…

    Submitted 18 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures, 6 tables, supplementary material (appendix) included separately

  49. arXiv:2505.19640  [pdf, other]

    cs.CL

    Interleaved Reasoning for Large Language Models via Reinforcement Learning

    Authors: Roy Xie, David Qiu, Deepak Gopinath, Dong Lin, Yanchao Sun, Chong Wang, Saloni Potdar, Bhuwan Dhingra

    Abstract: Long chain-of-thought (CoT) significantly enhances large language models' (LLM) reasoning capabilities. However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inhe…

    Submitted 26 May, 2025; originally announced May 2025.

  50. arXiv:2505.19595  [pdf, ps, other]

    eess.AS cs.SD

    Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment

    Authors: Jeongsoo Choi, Zhikang Niu, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen

    Abstract: The goal of this paper is to optimize the training process of diffusion-based text-to-speech models. While recent studies have achieved remarkable advancements, their training demands substantial time and computational costs, largely due to the implicit guidance of diffusion models in learning complex intermediate representations. To address this, we propose A-DMA, an effective strategy for Accele…

    Submitted 30 May, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Interspeech 2025