-
Joint User Association and Beamforming Design for ISAC Networks with Large Language Models
Authors:
Haoyun Li,
Ming Xiao,
Kezhi Wang,
Robert Schober,
Dong In Kim,
Yong Liang Guan
Abstract:
Integrated sensing and communication (ISAC) has been envisioned to play a more important role in future wireless networks. However, the design of ISAC networks is challenging, especially when there are multiple communication and sensing (C&S) nodes and multiple sensing targets. We investigate a multi-base station (BS) ISAC network in which multiple BSs equipped with multiple antennas simultaneously provide C&S services for multiple ground communication users (CUs) and targets. To enhance the overall performance of C&S, we formulate a joint user association (UA) and multi-BS transmit beamforming optimization problem with the objective of maximizing the total sum rate of all CUs while ensuring both the minimum target detection and parameter estimation requirements. To efficiently solve the highly non-convex mixed integer nonlinear programming (MINLP) optimization problem, we propose an alternating optimization (AO)-based algorithm that decomposes the problem into two sub-problems, i.e., UA optimization and multi-BS transmit beamforming optimization. Inspired by large language models (LLMs) for prediction and inference, we propose a unified framework integrating LLMs with convex-based optimization methods. First, we propose a comprehensive design of prompt engineering, including few-shot, chain of thought, and self-reflection techniques to guide LLMs in solving the binary integer programming UA optimization problem. Second, we utilize convex-based optimization methods to handle the non-convex beamforming optimization problem based on fractional programming (FP), majorization minimization (MM), and the alternating direction method of multipliers (ADMM) with an optimized UA from LLMs. Numerical results demonstrate that our proposed LLM-enabled AO-based algorithm achieves fast convergence and near upper-bound performance with the GPT-o1 model, outperforming various benchmark schemes.
Submitted 5 June, 2025;
originally announced June 2025.
-
MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning
Authors:
Zikui Cai,
Andrew Wang,
Anirudh Satheesh,
Ankit Nakhawa,
Hyunwoo Jae,
Keenan Powell,
Minghui Liu,
Neel Jay,
Sungbin Oh,
Xiyao Wang,
Yongyuan Liang,
Tom Goldstein,
Furong Huang
Abstract:
Despite rapid advances in vision-language models (VLMs), current benchmarks for multimodal reasoning fall short in three key dimensions. First, they overwhelmingly rely on static images, failing to capture the temporal complexity of real-world environments. Second, they narrowly focus on mathematical problem-solving, neglecting the broader spectrum of reasoning skills -- including abstract, physical, planning, spatial, and temporal capabilities -- required for robust multimodal intelligence. Third, many benchmarks quickly saturate, offering limited headroom for diagnosing failure modes or measuring continued progress. We introduce MORSE-500 (Multimodal Reasoning Stress-test Environment), a video benchmark composed of 500 fully scripted clips with embedded questions spanning six complementary reasoning categories. Each instance is programmatically generated using deterministic Python scripts (via Manim, Matplotlib, MoviePy), generative video models, and curated real footage. This script-driven design allows fine-grained control over visual complexity, distractor density, and temporal dynamics -- enabling difficulty to be scaled systematically as models improve. Unlike static benchmarks that become obsolete once saturated, MORSE-500 is built to evolve: its controllable generation pipeline supports the creation of arbitrarily challenging new instances, making it ideally suited for stress-testing next-generation models. Initial experiments with state-of-the-art systems -- including various Gemini 2.5 Pro and OpenAI o3 which represent the strongest available at the time, alongside strong open-source models -- reveal substantial performance gaps across all categories, with particularly large deficits in abstract and planning tasks. We release the full dataset, generation scripts, and evaluation harness to support transparent, reproducible, and forward-looking multimodal reasoning research.
Submitted 5 June, 2025;
originally announced June 2025.
-
Challenging Spontaneous Quantum Collapse with XENONnT
Authors:
E. Aprile,
J. Aalbers,
K. Abe,
S. Ahmed Maouloud,
L. Althueser,
B. Andrieu,
E. Angelino,
D. Antón Martin,
S. R. Armbruster,
F. Arneodo,
L. Baudis,
M. Bazyk,
L. Bellagamba,
R. Biondi,
A. Bismark,
K. Boese,
A. Brown,
G. Bruno,
R. Budnik,
C. Cai,
C. Capelli,
J. M. R. Cardoso,
A. P. Cimental Chávez,
A. P. Colijn,
J. Conrad
, et al. (152 additional authors not shown)
Abstract:
We report on the search for X-ray radiation as predicted from dynamical quantum collapse with low-energy electronic recoil data in the energy range of 1-140 keV from the first science run of the XENONnT dark matter detector. Spontaneous radiation is an unavoidable effect of dynamical collapse models, which were introduced as a possible solution to the long-standing measurement problem in quantum mechanics. The analysis utilizes a model that for the first time accounts for cancellation effects in the emitted spectrum, which arise in the X-ray range due to the opposing electron-proton charges in xenon atoms. New world-leading limits on the free parameters of the Markovian continuous spontaneous localization and Diósi-Penrose models are set, improving previous best constraints by two orders of magnitude and a factor of five, respectively. The original values proposed for the strength and the correlation length of the continuous spontaneous localization model are excluded experimentally for the first time.
Submitted 5 June, 2025;
originally announced June 2025.
-
Zeroth-Order Optimization Finds Flat Minima
Authors:
Liang Zhang,
Bingcong Li,
Kiran Koshy Thekumparampil,
Sewoong Oh,
Michael Muehlebach,
Niao He
Abstract:
Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit regularization that provides a fine-grained characterization of which particular solutions are finally reached. We show that zeroth-order optimization with the standard two-point estimator favors solutions with a small trace of the Hessian, a quantity widely used in previous work to distinguish between sharp and flat minima. We further provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions, where flat minima are defined as the minimizers that achieve the smallest trace of the Hessian among all optimal solutions. Experiments on binary classification tasks with convex losses and language model fine-tuning support our theoretical findings.
Submitted 5 June, 2025;
originally announced June 2025.
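For readers unfamiliar with the two-point estimator mentioned in the abstract above, the following is a minimal sketch of one common form, not the authors' implementation. It assumes a Gaussian random direction and a central difference; the names `two_point_grad`, `zo_descent`, and `quadratic`, the smoothing parameter `mu`, and the step size `lr` are illustrative choices that do not come from the paper.

```python
import random


def two_point_grad(f, x, mu=1e-3, rng=random):
    """Estimate the gradient of f at x from two function evaluations.

    Assumed form (not taken from the paper):
    g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, with u ~ N(0, I).
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def zo_descent(f, x0, lr=0.05, steps=2000, mu=1e-3, seed=0):
    """Plain descent that uses only function values, never true gradients."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_grad(f, x, mu, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Toy convex objective used only for illustration."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    x_final = zo_descent(quadratic, [1.0, -2.0])
    print(quadratic(x_final))
```

On this toy quadratic the iterates move toward the unique minimizer at the origin. The sketch only illustrates the estimator and does not reproduce the flat-minima analysis, which concerns objectives with many minimizers.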
-
Spin textures in curved paths on a curved surface
Authors:
Guo-Hua Liang,
Ai-Guo Mei,
Zhi-Hui Yang,
Ze-Lin Wei
Abstract:
This study investigates the quantum dynamics of a spin-1/2 particle confined to a curved path from the dynamics of a two-dimensional curved thin-layer system incorporating spin connection contributions. We demonstrate that the geodesic curvature, normal curvature, and geodesic torsion of the curve govern the emergent non-Abelian gauge potential and effective scalar potential in the system's Hamiltonian. The resulting spin precession dynamics induced by the gauge potential are analyzed, revealing that the rotation angle of spin orientation along a surface boundary and the pseudo-magnetic flux are topologically governed by the surface geometry. Spin texture evolution along helices illustrates distinct behaviors under geodesic versus non-geodesic propagation. Further, spin evolution along a closed trajectory, Viviani's curve, exemplifies the surface-dependent spin orientation and path-ordering sensitivity of the non-Abelian gauge potential. Our theory establishes a framework for spin-state manipulation via engineered nanostructured channels, enabling novel topological quantum control strategies.
Submitted 5 June, 2025;
originally announced June 2025.
-
Automatically Detecting Amusing Games in Wordle
Authors:
Ronaldo Luo,
Gary Liang,
Cindy Liu,
Adam Kabbara,
Minahil Bakhtawar,
Kina Kim,
Michael Guerzhoy
Abstract:
We explore automatically predicting which Wordle games Reddit users find amusing.
We scrape approximately 80k reactions by Reddit users to Wordle games, classify the reactions as expressing amusement or not using OpenAI's GPT-3.5 with few-shot prompting, and verify that GPT-3.5's labels roughly correspond to human labels.
We then extract features from Wordle games that can predict user amusement. We demonstrate that the features indeed provide a (weak) signal that predicts user amusement as predicted by GPT-3.5.
Our results indicate that user amusement at Wordle games can be predicted computationally to some extent. We explore which features of the game contribute to user amusement.
We find that user amusement is predictable, indicating a measurable aspect of creativity infused into Wordle games through humor.
Submitted 4 June, 2025;
originally announced June 2025.
-
Robust Anti-Backdoor Instruction Tuning in LVLMs
Authors:
Yuan Xun,
Siyuan Liang,
Xiaojun Jia,
Xinwei Liu,
Xiaochun Cao
Abstract:
Large visual language models (LVLMs) have demonstrated excellent instruction-following capabilities, yet remain vulnerable to stealthy backdoor attacks when finetuned using contaminated data. Existing backdoor defense techniques are usually developed for single-modal visual or language models under fully parameter-adjustable settings or rely on supervisory knowledge during training. However, in real-world scenarios, defenders cannot modify frozen visual encoders or core LLM parameters, nor possess prior knowledge of unknown trigger patterns or target responses. Motivated by the empirical finding that LVLMs readily overfit to fixed, unknown triggers, which can embed malicious associations during adapter-level tuning, we aim to design a defense that operates without access to core weights or attack priors. To this end, we introduce a lightweight, certified-agnostic defense framework, Robust Instruction Tuning, that finetunes only adapter modules and text embedding layers under instruction tuning. Our method integrates two complementary regularizations: (1) Input Diversity Regularization, which perturbs trigger components across training samples to disrupt consistent spurious cues; and (2) Anomalous Activation Regularization, which dynamically sparses adapter weights exhibiting abnormally sharp activations linked to backdoor patterns. These mechanisms jointly guide the model toward learning semantically grounded representations rather than memorizing superficial trigger-response mappings.
Extensive experiments against seven attacks on Flickr30k and MSCOCO demonstrate that our method
reduces their attack success rate to nearly zero, with an increase in training cost of less than 15%.
Submitted 3 June, 2025;
originally announced June 2025.
-
Heterogeneous Secure Transmissions in IRS-Assisted NOMA Communications: CO-GNN Approach
Authors:
Linlin Liang,
Zongkai Tian,
Haiyan Huang,
Xiaoyan Li,
Zhisheng Yin,
Dehua Zhang,
Nina Zhang,
Wenchao Zhai
Abstract:
Intelligent Reflecting Surfaces (IRS) enhance spectral efficiency by adjusting reflection phase shifts, while Non-Orthogonal Multiple Access (NOMA) increases system capacity. Consequently, IRS-assisted NOMA communications have garnered significant research interest. However, the passive nature of the IRS, lacking authentication and security protocols, makes these systems vulnerable to external eavesdropping due to the openness of electromagnetic signal propagation and reflection. NOMA's inherent multi-user signal superposition also introduces internal eavesdropping risks during user pairing. This paper investigates secure transmissions in IRS-assisted NOMA systems with heterogeneous resource configuration in wireless networks to mitigate both external and internal eavesdropping. To maximize the sum secrecy rate of legitimate users, we propose a combinatorial optimization graph neural network (CO-GNN) approach to jointly optimize beamforming at the base station, power allocation of NOMA users, and phase shifts of IRS for dynamic heterogeneous resource allocation, thereby enabling the design of dual-link or multi-link secure transmissions in the presence of eavesdroppers on the same or heterogeneous links. The CO-GNN algorithm simplifies the complex mathematical problem-solving process, eliminates the need for channel estimation, and enhances scalability. Simulation results demonstrate that the proposed algorithm significantly enhances the secure transmission performance of the system.
Submitted 3 June, 2025;
originally announced June 2025.
-
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs
Authors:
Haoyuan Li,
Yanpeng Zhou,
Yufei Gao,
Tao Tang,
Jianhua Han,
Yujie Yuan,
Dave Zhenyu Chen,
Jiawang Bian,
Hang Xu,
Xiaodan Liang
Abstract:
Remarkable progress in 2D Vision-Language Models (VLMs) has spurred interest in extending them to 3D settings for tasks like 3D Question Answering, Dense Captioning, and Visual Grounding. Unlike 2D VLMs that typically process images through an image encoder, 3D scenes, with their intricate spatial structures, allow for diverse model architectures. Based on their encoder design, this paper categorizes recent 3D VLMs into 3D object-centric, 2D image-based, and 3D scene-centric approaches. Despite the architectural similarity of 3D scene-centric VLMs to their 2D counterparts, they have exhibited comparatively lower performance compared with the latest 3D object-centric and 2D image-based approaches. To understand this gap, we conduct an in-depth analysis, revealing that 3D scene-centric VLMs show limited reliance on the 3D scene encoder, and the pre-train stage appears less effective than in 2D VLMs. Furthermore, we observe that data scaling benefits are less pronounced on larger datasets. Our investigation suggests that while these models possess cross-modal alignment capabilities, they tend to over-rely on linguistic cues and overfit to frequent answer distributions, thereby diminishing the effective utilization of the 3D encoder. To address these limitations and encourage genuine 3D scene understanding, we introduce a novel 3D Relevance Discrimination QA dataset designed to disrupt shortcut learning and improve 3D understanding. Our findings highlight the need for advanced evaluation and improved strategies for better 3D understanding in 3D VLMs.
Submitted 6 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
TreeRPO: Tree Relative Policy Optimization
Authors:
Zhicheng Yang,
Zhijiang Guo,
Yinya Huang,
Xiaodan Liang,
Yiwei Wang,
Jing Tang
Abstract:
Large Language Models (LLMs) have shown remarkable reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR) methods. However, a key limitation of existing approaches is that rewards defined at the full trajectory level provide insufficient guidance for optimizing the intermediate steps of a reasoning process. To address this, we introduce TreeRPO, a novel method that estimates the mathematical expectations of rewards at various reasoning steps using tree sampling. Unlike prior methods that rely on a separate step reward model, TreeRPO directly estimates these rewards through this sampling process. Building on the group-relative reward training mechanism of GRPO, TreeRPO innovatively computes rewards based on step-level groups generated during tree sampling. This advancement allows TreeRPO to produce fine-grained and dense reward signals, significantly enhancing the learning process and overall performance of LLMs. Experimental results demonstrate that our TreeRPO algorithm substantially improves the average Pass@1 accuracy of Qwen-2.5-Math on test benchmarks, increasing it from 19.0% to 35.5%. Furthermore, TreeRPO significantly outperforms GRPO by 2.9% in performance while simultaneously reducing the average response length by 18.1%, showcasing its effectiveness and efficiency. Our code will be available at https://github.com/yangzhch6/TreeRPO.
Submitted 5 June, 2025;
originally announced June 2025.
-
Whole-Body Constrained Learning for Legged Locomotion via Hierarchical Optimization
Authors:
Haoyu Wang,
Ruyi Zhou,
Liang Ding,
Tie Liu,
Zhelin Zhang,
Peng Xu,
Haibo Gao,
Zongquan Deng
Abstract:
Reinforcement learning (RL) has demonstrated impressive performance in legged locomotion over various challenging environments. However, due to the sim-to-real gap and lack of explainability, unconstrained RL policies deployed in the real world still suffer from inevitable safety issues, such as joint collisions, excessive torque, or foot slippage in low-friction environments. These problems limit its usage in missions with strict safety requirements, such as planetary exploration, nuclear facility inspection, and deep-sea operations. In this paper, we design a hierarchical optimization-based whole-body follower, which integrates both hard and soft constraints into RL framework to make the robot move with better safety guarantees. Leveraging the advantages of model-based control, our approach allows for the definition of various types of hard and soft constraints during training or deployment, which allows for policy fine-tuning and mitigates the challenges of sim-to-real transfer. Meanwhile, it preserves the robustness of RL when dealing with locomotion in complex unstructured environments. The trained policy with introduced constraints was deployed in a hexapod robot and tested in various outdoor environments, including snow-covered slopes and stairs, demonstrating the great traversability and safety of our approach.
Submitted 5 June, 2025;
originally announced June 2025.
-
Study of $f_1(1420)$ and $η(1405)$ in the decay $J/ψ\to γπ^{0}π^{0}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (650 additional authors not shown)
Abstract:
A partial-wave analysis is performed on the decay $J/ψ\toγπ^{0}π^{0}π^{0}$ within the $π^{0}π^{0}π^{0}$ invariant-mass region below 1.6 GeV$/c^{2}$, using $(10.09~\pm~0.04)\times10^{9} ~J/ψ$ events collected with the BESIII detector. Significant isospin-violating decays of $η(1405)$ and $f_1(1420)$ into $f_0(980)π^{0}$ are observed. For the first time, three axial-vectors, $f_1(1285)$, $f_1(1420)$ and $f_1(1510)$, are observed to decay into $π^{0}π^{0}π^{0}$. The product branching fractions of these resonances are reported.
Submitted 7 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Rethinking Contrastive Learning in Session-based Recommendation
Authors:
Xiaokun Zhang,
Bo Xu,
Fenglong Ma,
Zhizheng Wang,
Liang Yang,
Hongfei Lin
Abstract:
Session-based recommendation aims to predict intents of anonymous users based on limited behaviors. With the ability in alleviating data sparsity, contrastive learning is prevailing in the task. However, we spot that existing contrastive learning based methods still suffer from three obstacles: (1) they overlook item-level sparsity and primarily focus on session-level sparsity; (2) they typically augment sessions using item IDs like crop, mask and reorder, failing to ensure the semantic consistency of augmented views; (3) they treat all positive-negative signals equally, without considering their varying utility. To this end, we propose a novel multi-modal adaptive contrastive learning framework called MACL for session-based recommendation. In MACL, a multi-modal augmentation is devised to generate semantically consistent views at both item and session levels by leveraging item multi-modal features. Besides, we present an adaptive contrastive loss that distinguishes varying contributions of positive-negative signals to improve self-supervised learning. Extensive experiments on three real-world datasets demonstrate the superiority of MACL over state-of-the-art methods.
Submitted 5 June, 2025;
originally announced June 2025.
-
FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis
Authors:
Wenyan Xu,
Dawei Xiang,
Yue Liu,
Xiyu Wang,
Yanxiang Ma,
Liang Zhang,
Chang Xu,
Jiaheng Zhang
Abstract:
Pure time series forecasting tasks typically focus exclusively on numerical features; however, real-world financial decision-making demands the comparison and analysis of heterogeneous sources of information. Recent advances in deep learning and large-scale language models (LLMs) have made significant strides in capturing sentiment and other qualitative signals, thereby enhancing the accuracy of financial time series predictions. Despite these advances, most existing datasets consist solely of price series and news text, are confined to a single market, and remain limited in scale. In this paper, we introduce FinMultiTime, the first large-scale, multimodal financial time series dataset. FinMultiTime temporally aligns four distinct modalities (financial news, structured financial tables, K-line technical charts, and stock price time series) across both the S&P 500 and HS 300 universes. Covering 5,105 stocks from 2009 to 2025 in the United States and China, the dataset totals 112.6 GB and provides minute-level, daily, and quarterly resolutions, thus capturing short-, medium-, and long-term market signals with high fidelity. Our experiments demonstrate that (1) scale and data quality markedly boost prediction accuracy; (2) multimodal fusion yields moderate gains in Transformer models; and (3) a fully reproducible pipeline enables seamless dataset updates.
Submitted 5 June, 2025;
originally announced June 2025.
-
QiMeng: Fully Automated Hardware and Software Design for Processor Chip
Authors:
Rui Zhang,
Yuanbo Wen,
Shuyao Cheng,
Di Huang,
Shaohui Peng,
Jiaming Guo,
Pengwei Jin,
Jiacheng Zhao,
Tianrui Ma,
Yaoyu Zhu,
Yifan Hao,
Yongwei Zhao,
Shengwen Liang,
Ying Wang,
Xing Hu,
Zidong Du,
Huimin Cui,
Ling Li,
Qi Guo,
Yunji Chen
Abstract:
Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three major challenges: the physical constraints of fabrication technologies, the escalating demands for design resources, and the increasing diversity of ecosystems. Automated processor chip design has emerged as a transformative solution to address these challenges. While recent breakthroughs in Artificial Intelligence (AI), particularly Large Language Models (LLMs) techniques, have opened new possibilities for fully automated processor chip design, substantial challenges remain in establishing domain-specific LLMs for processor chip design.
In this paper, we propose QiMeng, a novel system for fully automated hardware and software design of processor chips. QiMeng comprises three hierarchical layers. In the bottom-layer, we construct a domain-specific Large Processor Chip Model (LPCM) that introduces novel designs in architecture, training, and inference, to address key challenges such as knowledge representation gap, data scarcity, correctness assurance, and enormous solution space. In the middle-layer, leveraging the LPCM's knowledge representation and inference capabilities, we develop the Hardware Design Agent and the Software Design Agent to automate the design of hardware and software for processor chips. Currently, several components of QiMeng have been completed and successfully applied in various top-layer applications, demonstrating significant advantages and providing a feasible solution for efficient, fully automated hardware/software design of processor chips. Future research will focus on integrating all components and performing iterative top-down and bottom-up design processes to establish a comprehensive QiMeng system.
Submitted 5 June, 2025;
originally announced June 2025.
-
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
Authors:
Yongjie Xiao,
Hongru Liang,
Peixin Qin,
Yao Zhang,
Wenqiang Lei
Abstract:
Despite the great potential of large language models (LLMs) in machine comprehension, it is still disturbing to fully count on them in real-world scenarios. This is probably because there is no rational explanation for whether the comprehension process of LLMs is aligned with that of experts. In this paper, we propose SCOP to carefully examine how LLMs perform during the comprehension process from a cognitive view. Specifically, it is equipped with a systematic definition of five requisite skills during the comprehension process, a strict framework to construct testing data for these skills, and a detailed analysis of advanced open-source and closed-source LLMs using the testing data. With SCOP, we find that it is still challenging for LLMs to perform an expert-level comprehension process. Even so, we notice that LLMs share some similarities with experts, e.g., performing better at comprehending local information than global information. Further analysis reveals that LLMs can be somewhat unreliable -- they might reach correct answers through flawed comprehension processes. Based on SCOP, we suggest that one direction for improving LLMs is to focus more on the comprehension process, ensuring all comprehension skills are thoroughly developed during training.
Submitted 5 June, 2025;
originally announced June 2025.
-
Predicting ICU In-Hospital Mortality Using Adaptive Transformer Layer Fusion
Authors:
Han Wang,
Ruoyun He,
Guoguang Lao,
Ting Liu,
Hejiao Luo,
Changqi Qin,
Hongying Luo,
Junmin Huang,
Zihan Wei,
Lu Chen,
Yongzhi Xu,
Ziqian Bi,
Junhao Song,
Tianyang Wang,
Chia Xin Liang,
Xinyuan Song,
Huafeng Liu,
Junfeng Hao,
Chunjie Tian
Abstract:
Early identification of high-risk ICU patients is crucial for directing limited medical resources. We introduce ALFIA (Adaptive Layer Fusion with Intelligent Attention), a modular, attention-based architecture that jointly trains LoRA (Low-Rank Adaptation) adapters and an adaptive layer-weighting mechanism to fuse multi-layer semantic features from a BERT backbone. Trained on our rigorous cw-24 (CriticalWindow-24) benchmark, ALFIA surpasses state-of-the-art tabular classifiers in AUPRC while preserving a balanced precision-recall profile. The embeddings produced by ALFIA's fusion module, capturing both fine-grained clinical cues and high-level concepts, enable seamless pairing with GBDTs (CatBoost/LightGBM) as ALFIA-boost, and deep neural networks as ALFIA-nn, yielding additional performance gains. Our experiments confirm ALFIA's superior early-warning performance; by operating directly on routine clinical text, it furnishes clinicians with a convenient yet robust tool for risk stratification and timely intervention in critical-care settings.
Submitted 6 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Multivariate Probabilistic Assessment of Speech Quality
Authors:
Fredrik Cumlin,
Xinyu Liang,
Victor Ungureanu,
Chandan K. A. Reddy,
Christian Schüldt,
Saikat Chatterjee
Abstract:
The mean opinion score (MOS) is a standard metric for assessing speech quality, but its singular focus fails to identify specific distortions when low scores are observed. The NISQA dataset addresses this limitation by providing ratings across four additional dimensions: noisiness, coloration, discontinuity, and loudness, alongside MOS. In this paper, we extend the explored univariate MOS estimation to a multivariate framework by modeling these dimensions jointly using a multivariate Gaussian distribution. Our approach utilizes Cholesky decomposition to predict covariances without imposing restrictive assumptions and extends probabilistic affine transformations to a multivariate context. Experimental results show that our model performs on par with state-of-the-art methods in point estimation, while uniquely providing uncertainty and correlation estimates across speech quality dimensions. This enables better diagnosis of poor speech quality and informs targeted improvements.
Submitted 5 June, 2025;
originally announced June 2025.
-
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
Authors:
Zhen Hao Wong,
Jingwen Deng,
Runming He,
Zirong Chen,
Qijie You,
Hejun Dong,
Hao Liang,
Chengyu Shen,
Bin Cui,
Wentao Zhang
Abstract:
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks, especially for mid-difficulty problems that require multi-step reasoning. Analyses across problem categories and difficulty levels reveal that puzzle training promotes transferable reasoning routines, strengthening algebraic manipulation, geometric inference, and combinatorial logic, while offering limited gains on rote or highly specialized tasks. These findings show that reinforcement learning over logic puzzles reshapes the internal reasoning of LLMs, enabling more robust and compositional generalization without relying on task-specific symbolic tools.
Submitted 5 June, 2025;
originally announced June 2025.
-
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Authors:
Yujun Zhou,
Jiayi Ye,
Zipeng Ling,
Yufei Han,
Yue Huang,
Haomin Zhuang,
Zhenwen Liang,
Kehan Guo,
Taicheng Guo,
Xiangqi Wang,
Xiangliang Zhang
Abstract:
Logical reasoning is a core capability for many applications of large language models (LLMs), yet existing benchmarks often rely solely on final-answer accuracy, failing to capture the quality and structure of the reasoning process. We propose FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall benchmark accuracy, stepwise soundness, and representation-level alignment. In addition, to better understand how reasoning capabilities emerge, we conduct a comprehensive study on the effects of supervision format during fine-tuning. We construct four supervision styles (one natural language and three symbolic variants) and train LLMs under each. Our findings reveal that natural language supervision yields strong generalization even on out-of-distribution and long-context tasks, while symbolic reasoning styles promote more structurally sound and atomic inference chains. Further, our representation-level probing shows that fine-tuning primarily improves reasoning behaviors through step-by-step generation, rather than enhancing shortcut prediction or internalized correctness. Together, our framework and analysis provide a more rigorous and interpretable lens for evaluating and improving logical reasoning in LLMs.
Submitted 5 June, 2025;
originally announced June 2025.
-
SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs
Authors:
Shuhan Xu,
Siyuan Liang,
Hongling Zheng,
Yong Luo,
Aishan Liu,
Dacheng Tao
Abstract:
Vision-Language Models (VLMs) have achieved remarkable performance in image captioning, but recent studies show they are vulnerable to backdoor attacks. Attackers can inject imperceptible perturbations, such as local pixel triggers or global semantic phrases, into the training data, causing the model to generate malicious, attacker-controlled captions for specific inputs. These attacks are hard to detect and defend against due to their stealthiness and cross-modal nature. By analyzing attack samples, we identify two key vulnerabilities: (1) abnormal attention concentration on specific image regions, and (2) semantic drift and incoherence in generated captions. To counter this, we propose Semantic Reward Defense (SRD), a reinforcement learning framework that mitigates backdoor behavior without prior knowledge of triggers. SRD uses a Deep Q-Network to learn policies for applying discrete perturbations (e.g., occlusion, color masking) to sensitive image regions, aiming to disrupt the activation of malicious pathways. We design a semantic fidelity score as the reward signal, which jointly evaluates semantic consistency and linguistic fluency of the output, guiding the agent toward generating robust yet faithful captions. Experiments across mainstream VLMs and datasets show SRD reduces attack success rates to 5.6%, while preserving caption quality on clean inputs with less than 10% performance drop. SRD offers a trigger-agnostic, interpretable defense paradigm against stealthy backdoor threats in multimodal generative models.
Submitted 5 June, 2025;
originally announced June 2025.
-
Variational toolbox-based separability detection of multiqubit states
Authors:
Jin-Min Liang,
Shao-Ming Fei,
Qiongyi He
Abstract:
Parametrized quantum circuits (PQCs) are crucial in variational quantum algorithms. While it is commonly believed that the optimal PQC is solely used to reproduce the target state, we here reveal that the optimal PQC can also provide valuable insights into the state's properties. We propose variational toolboxes to identify the $k$-separability of pure states, with or without preparation noise, by checking the structure within the optimal PQCs. Additionally, we introduce adaptive optimization strategies to detect the $k$-separability of mixed states. Compared to fixed PQCs, our approach controls fewer parameters for low-rank states. Finally, we validate our methods through numerical demonstrations for various states.
Submitted 5 June, 2025;
originally announced June 2025.
-
EMO-Debias: Benchmarking Gender Debiasing Techniques in Multi-Label Speech Emotion Recognition
Authors:
Yi-Cheng Lin,
Huang-Cheng Chou,
Yu-Hsuan Li Liang,
Hung-yi Lee
Abstract:
Speech emotion recognition (SER) systems often exhibit gender bias. However, the effectiveness and robustness of existing debiasing methods in such multi-label scenarios remain underexplored. To address this gap, we present EMO-Debias, a large-scale comparison of 13 debiasing methods applied to multi-label SER. Our study encompasses techniques from pre-processing, regularization, adversarial learning, biased learners, and distributionally robust optimization. Experiments conducted on acted and naturalistic emotion datasets, using WavLM and XLSR representations, evaluate each method under conditions of gender imbalance. Our analysis quantifies the trade-offs between fairness and accuracy, identifying which approaches consistently reduce gender performance gaps without compromising overall model performance. The findings provide actionable insights for selecting effective debiasing strategies and highlight the impact of dataset distributions.
Submitted 5 June, 2025;
originally announced June 2025.
-
Achieving Linear Speedup and Near-Optimal Complexity for Decentralized Optimization over Row-stochastic Networks
Authors:
Liyuan Liang,
Xinyi Chen,
Gan Luo,
Kun Yuan
Abstract:
A key challenge in decentralized optimization is determining the optimal convergence rate and designing algorithms to achieve it. While this problem has been extensively addressed for doubly-stochastic and column-stochastic mixing matrices, the row-stochastic scenario remains unexplored. This paper bridges this gap by introducing effective metrics to capture the influence of row-stochastic mixing matrices and establishing the first convergence lower bound for decentralized learning over row-stochastic networks. However, existing algorithms fail to attain this lower bound due to two key issues: deviation in the descent direction caused by the adapted gradient tracking (GT) and instability introduced by the Pull-Diag protocol. To address descent deviation, we propose a novel analysis framework demonstrating that Pull-Diag-GT achieves linear speedup, the first such result for row-stochastic decentralized optimization. Moreover, by incorporating a multi-step gossip (MG) protocol, we resolve the instability issue and attain the lower bound, achieving near-optimal complexity for decentralized optimization over row-stochastic networks.
Submitted 4 June, 2025;
originally announced June 2025.
-
RIDEN pilot survey: broad-band selection of candidate quasars with extended Lyman-$α$ nebulae using CLAUDS-HSC-SSP-DUNES$^2$ joint data
Authors:
Rhythm Shimakawa,
Satoshi Kikuta,
Haruka Kusakabe,
Marcin Sawicki,
Yongming Liang,
Rieko Momose,
Stephen Gwyn,
Guillaume Desprez
Abstract:
The Vera C. Rubin Observatory will conduct the Legacy Survey of Space and Time (LSST), delivering deep, multi-band ($ugrizy$) imaging data across 18,000 square degrees over the next decade. Before this ultra-wide-field survey, we constructed a broad-band Ly$α$ imaging toward 483 SDSS/BOSS quasars at $z=$ 1.9-3.0, using deep, wide-field ultraviolet to near-infrared ($u$-to-$K$) data from the Hyper Suprime-Cam Subaru Strategic Survey (HSC-SSP), the CFHT Large Area U-band Deep Survey (CLAUDS), the Deep UKIRT Near-Infrared Steward Survey (DUNES$^2$), and additional public data covering 13 square degrees. Our broad-band selection allowed us to select 24 candidate quasar nebulae that exhibit $u$ or $g$ band excess over 50-170 kpc, some of which exhibit asymmetrical extended features similar to those seen in previously discovered giant nebulae. We then investigated whether the Ly$α$ morphology of quasar nebulae differs between two redshift intervals, $z=$ 1.9-2.3 and $z=$ 2.3-3.0, and examined environmental dependence based on a control sample. Comparison results show no significant difference in asymmetry within Ly$α$ nebulae between the two redshift intervals. Furthermore, we found no systematic differences in overdensities around the complete quasar samples, quasars with large Ly$α$ nebulae, and control samples, while the most extended nebula appears to be located in the high-density region. Further verification analyses are required since the current dataset lacks spectroscopic confirmation for both quasar nebulae and their surrounding neighbours. Nevertheless, the results demonstrate the great potential of the Rubin LSST to discover giant Ly$α$ nebulae on an unprecedented scale.
Submitted 4 June, 2025;
originally announced June 2025.
-
BESA: Boosting Encoder Stealing Attack with Perturbation Recovery
Authors:
Xuhao Ren,
Haotian Liang,
Yajie Wang,
Chuan Zhang,
Zehui Xiong,
Liehuang Zhu
Abstract:
To boost the encoder stealing attack under the perturbation-based defense that hinders the attack performance, we propose a boosting encoder stealing attack with perturbation recovery named BESA. It aims to overcome perturbation-based defenses. The core of BESA consists of two modules: perturbation detection and perturbation recovery, which can be combined with canonical encoder stealing attacks. The perturbation detection module utilizes the feature vectors obtained from the target encoder to infer the defense mechanism employed by the service provider. Once the defense mechanism is detected, the perturbation recovery module leverages the well-designed generative model to restore a clean feature vector from the perturbed one. Through extensive evaluations based on various datasets, we demonstrate that BESA significantly enhances the surrogate encoder accuracy of existing encoder stealing attacks by up to 24.63\% when facing state-of-the-art defenses and combinations of multiple defenses.
Submitted 4 June, 2025;
originally announced June 2025.
-
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation
Authors:
Kun Zhao,
Bohao Yang,
Chen Tang,
Siyuan Dai,
Haoteng Tang,
Chenghua Lin,
Liang Zhan
Abstract:
Large Language Models (LLMs) excel at many tasks but struggle with ambiguous scenarios where multiple valid responses exist, often yielding unreliable results. Conversely, Small Language Models (SLMs) demonstrate robustness in such scenarios but are susceptible to misleading or adversarial inputs. We observed that LLMs handle negative examples effectively, while SLMs excel with positive examples. To leverage their complementary strengths, we introduce SLIDE (Small and Large Integrated for Dialogue Evaluation), a method integrating SLMs and LLMs via adaptive weighting. Building on SLIDE, we further propose a Dual-Refinement Evaluation (DRE) method to enhance SLM-LLM integration: (1) SLM-generated insights guide the LLM to produce initial evaluations; (2) SLM-derived adjustments refine the LLM's scores for improved accuracy. Experiments demonstrate that DRE outperforms existing methods, showing stronger alignment with human judgment across diverse benchmarks. This work illustrates how combining small and large models can yield more reliable evaluation tools, particularly for open-ended tasks such as dialogue evaluation.
Submitted 4 June, 2025;
originally announced June 2025.
-
Emergent gravity and gravitational lensing in quantum materials
Authors:
Yugo Onishi,
Nisarga Paul,
Liang Fu
Abstract:
We show that an effective gravitational field naturally emerges in quantum materials with long-wavelength spin (or pseudospin) textures. When the itinerant electrons' spin strongly couples to the background spin texture, it effectively behaves as a spinless particle in a curved space, with the curvature arising from quantum corrections to the electron's spin orientation. The emergent gravity gives rise to the electron lensing effect, an analog of the gravitational lensing. Our work shows that novel ``gravitational'' phenomena generically appear in quantum systems due to nonadiabaticity, opening new research directions in quantum physics.
Submitted 4 June, 2025;
originally announced June 2025.
-
Autonomous Collaborative Scheduling of Time-dependent UAVs, Workers and Vehicles for Crowdsensing in Disaster Response
Authors:
Lei Han,
Yitong Guo,
Pengfei Yang,
Zhiyong Yu,
Liang Wang,
Quan Wang,
Zhiwen Yu
Abstract:
Natural disasters have caused significant losses to human society, and the timely and efficient acquisition of post-disaster environmental information is crucial for the effective implementation of rescue operations. Due to the complexity of post-disaster environments, existing sensing technologies face challenges such as weak environmental adaptability, insufficient specialized sensing capabilities, and limited practicality of sensing solutions. This paper explores the heterogeneous multi-agent online autonomous collaborative scheduling algorithm HoAs-PALN, aimed at achieving efficient collection of post-disaster environmental information. HoAs-PALN is realized through adaptive dimensionality reduction in the matching process and local Nash equilibrium game, facilitating autonomous collaboration among time-dependent UAVs, workers and vehicles to enhance sensing scheduling. (1) In terms of adaptive dimensionality reduction during the matching process, HoAs-PALN significantly reduces scheduling decision time by transforming a five-dimensional matching process into two categories of three-dimensional matching processes; (2) Regarding the local Nash equilibrium game, HoAs-PALN combines the softmax function to optimize behavior selection probabilities and introduces a local Nash equilibrium determination mechanism to ensure scheduling decision performance. Finally, we conducted detailed experiments based on extensive real-world and simulated data. Compared with the baselines (GREEDY, K-WTA, MADL and MARL), HoAs-PALN improves task completion rates by 64.12%, 46.48%, 16.55%, and 14.03% on average, respectively, while each online scheduling decision takes less than 10 seconds, demonstrating its effectiveness in dynamic post-disaster environments.
Submitted 3 June, 2025;
originally announced June 2025.
-
Observation of Coherent Perfect Acoustic Absorption at an Exceptional Point
Authors:
Yi-Fei Xia,
Zi-Xiang Xu,
Yu-Ting Yan,
An Chen,
Jing Yang,
Bin Liang,
Jian-Chun Cheng,
Johan Christensen
Abstract:
Non-Hermitian systems have recently shown new possibilities to manipulate wave scattering by exploiting loss, yet coherent perfect absorption at an exceptional point (CPA EP) remains elusive in acoustics. Here we demonstrate it based on a two-channel waveguide with compact lossy resonators. We realize imbalanced losses crucial for CPA EP by using active components to independently modulate the non-Hermiticity. The CPA EP experimentally manifests as full absorption at a unique real frequency and shows high sensitivity to the incident phase variations. Our findings open an avenue to explore novel non-Hermitian physics for classical waves and develop innovative acoustic singularity-based devices.
Submitted 19 May, 2025;
originally announced June 2025.
-
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
Authors:
Junting Chen,
Haotian Liang,
Lingxiao Du,
Weiyun Wang,
Mengkang Hu,
Yao Mu,
Wenhai Wang,
Jifeng Dai,
Ping Luo,
Wenqi Shao,
Lin Shao
Abstract:
The rapid progress of navigation, manipulation, and vision models has made mobile manipulators capable in many specialized tasks. However, the open-world mobile manipulation (OWMM) task remains a challenge due to the need for generalization to open-ended instructions and environments, as well as the systemic complexity of integrating high-level decision making with low-level robot control based on both global scene understanding and current agent state. To address this complexity, we propose a novel multi-modal agent architecture that maintains multi-view scene frames and agent states for decision-making and controls the robot by function calling. A second challenge is hallucination caused by domain shift. To enhance the agent performance, we further introduce an agentic data synthesis pipeline for the OWMM task to adapt the VLM to our task domain with instruction fine-tuning. We highlight our fine-tuned OWMM-VLM as the first dedicated foundation model for mobile manipulators with global scene understanding, robot state tracking, and multi-modal action generation in a unified model. Through experiments, we demonstrate that our model achieves SOTA performance compared to other foundation models including GPT-4o and strong zero-shot generalization in the real world. The project page is at https://github.com/HHYHRHY/OWMM-Agent
Submitted 4 June, 2025;
originally announced June 2025.
-
Interplay between ultrafast electronic and librational dynamics in liquid nitrobenzene probed with two-color four-wave mixing
Authors:
Niranjan Shivaram,
Richard Thurston,
Ali Belkacem,
Thorsten Weber,
Liang Z. Tan,
Daniel S. Slaughter
Abstract:
We present an experimental and theoretical study of the interplay between ultrafast electron dynamics and librational dynamics in liquid nitrobenzene. A femtosecond ultraviolet pulse and two femtosecond near infrared pulses interact with nitrobenzene molecules, generating a four-wave mixing nonlinear signal that is measured in the Optical Kerr Effect geometry. The near infrared nonlinear signal is measured to be non-zero only at negative time delays, corresponding to the near infrared pulses arriving earlier than the ultraviolet pulse. We perform time-dependent Quantum Master Equation calculations, which include a classical libration model, to simulate the experiment. The simulations support the conclusion that the near infrared pulses launch librational motion, while simultaneously creating electronic coherences that result in a libration-modulated electronic nonlinear response. Furthermore, we conclude that the measured nonlinear optical signal corresponds to a non-parametric process that leaves the molecules in an excited electronic state. This work provides new insight into ultrafast nonlinear optical interactions in liquids and is an important step towards probing ultrafast electronic coherences in large molecules in the liquid phase.
Submitted 4 June, 2025;
originally announced June 2025.
-
Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning
Authors:
Muling Wu,
Qi Qian,
Wenhao Liu,
Xiaohua Wang,
Zisu Huang,
Di Liang,
LI Miao,
Shihan Dou,
Changze Lv,
Zhenghua Wang,
Zhibo Xu,
Lina Chen,
Tianlong Li,
Xiaoqing Zheng,
Xuanjing Huang
Abstract:
Large Language Models (LLMs) have achieved remarkable performance across various reasoning tasks, yet post-training is constrained by inefficient sample utilization and inflexible difficulty samples processing. To address these limitations, we propose Customized Curriculum Learning (CCL), a novel framework with two key innovations. First, we introduce model-adaptive difficulty definition that customizes curriculum datasets based on each model's individual capabilities rather than using predefined difficulty metrics. Second, we develop "Guided Prompting," which dynamically reduces sample difficulty through strategic hints, enabling effective utilization of challenging samples that would otherwise degrade performance. Comprehensive experiments on supervised fine-tuning and reinforcement learning demonstrate that CCL significantly outperforms uniform training approaches across five mathematical reasoning benchmarks, confirming its effectiveness across both paradigms in enhancing sample utilization and model performance.
Submitted 4 June, 2025;
originally announced June 2025.
-
GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems
Authors:
Tiehua Mei,
Hengrui Chen,
Peng Yu,
Jiaqing Liang,
Deqing Yang
Abstract:
Although large language models (LLMs) have shown great potential in recommender systems, the prohibitive computational costs of fine-tuning LLMs on entire datasets hinder their successful deployment in real-world scenarios. To develop affordable and effective LLM-based recommender systems, we focus on the task of coreset selection, which identifies a small subset of fine-tuning data to optimize the test loss, thereby facilitating efficient LLM fine-tuning. Although some intuitive solutions to subset selection exist, including distribution-based and importance-based approaches, they often lead to suboptimal performance due to misalignment with downstream fine-tuning objectives or weak generalization ability caused by individual-level sample selection. To overcome these challenges, we propose GORACS, a novel Group-level Optimal tRAnsport-guided Coreset Selection framework for LLM-based recommender systems. GORACS is designed based on two key principles for coreset selection: 1) selecting the subsets that minimize the test loss to align with fine-tuning objectives, and 2) enhancing model generalization through group-level data selection. Corresponding to these two principles, GORACS has two key components: 1) a Proxy Optimization Objective (POO) leveraging optimal transport and gradient information to bound the intractable test loss, thus reducing computational costs by avoiding repeated LLM retraining, and 2) a two-stage Initialization-Then-Refinement Algorithm (ITRA) for efficient group-level selection. Our extensive experiments across diverse recommendation datasets and tasks validate that GORACS significantly reduces fine-tuning costs of LLMs while achieving superior performance over the state-of-the-art baselines and full data training. The source code of GORACS is available at https://github.com/Mithas-114/GORACS.
Submitted 4 June, 2025;
originally announced June 2025.
-
The equivalent condition for GRL codes to be MDS, AMDS or self-dual
Authors:
Zhonghao Liang,
Qunying Liao
Abstract:
It is well known that MDS, AMDS, and self-dual codes have good algebraic properties and are applied in communication systems, data storage, quantum codes, and so on. In this paper, we focus on a class of generalized Roth-Lempel linear codes, give an equivalent condition for them or their duals to be non-RS MDS, AMDS, or non-RS self-dual, and provide some corresponding examples.
Submitted 4 June, 2025;
originally announced June 2025.
-
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
Authors:
Liang Chen,
Xueting Han,
Li Shen,
Jing Bai,
Kam-Fai Wong
Abstract:
Harmful fine-tuning (HFT), performed directly on open-source LLMs or through Fine-tuning-as-a-Service, breaks safety alignment and poses significant threats. Existing methods aim to mitigate HFT risks by learning robust representation on alignment data or making harmful data unlearnable, but they treat each data sample equally, leaving data vulnerability patterns understudied. In this work, we reveal that certain subsets of alignment data are consistently more prone to forgetting during HFT across different fine-tuning tasks. Inspired by these findings, we propose Vulnerability-Aware Alignment (VAA), which estimates data vulnerability, partitions data into "vulnerable" and "invulnerable" groups, and encourages balanced learning using a group distributionally robust optimization (Group DRO) framework. Specifically, VAA learns an adversarial sampler that samples examples from the currently underperforming group and then applies group-dependent adversarial perturbations to the data during training, aiming to encourage a balanced learning process across groups. Experiments across four fine-tuning tasks demonstrate that VAA significantly reduces harmful scores while preserving downstream task performance, outperforming state-of-the-art baselines.
Submitted 4 June, 2025;
originally announced June 2025.
-
Fast Non-Line-of-Sight Transient Data Simulation and an Open Benchmark Dataset
Authors:
Yingjie Shi,
Jinye Miao,
Taotao Qin,
Fuyao Cai,
Yi Wei,
Lingfeng Liu,
Tongyao Li,
Chenyang Wu,
Huan Liang,
Yuyang Yin,
Lianfa Bai,
Enlai Guo,
Jing Han
Abstract:
Non-Line-of-Sight (NLOS) imaging reconstructs the shape and depth of hidden objects from picosecond-resolved transient signals, offering potential applications in autonomous driving, security, and medical diagnostics. However, current NLOS experiments rely on expensive hardware and complex system alignment, limiting their scalability. This manuscript presents a simplified simulation method that generates NLOS transient data by modeling light-intensity transport rather than performing conventional path tracing, significantly enhancing computational efficiency. All scene elements, including the relay surface, hidden target, stand-off distance, detector time resolution, and acquisition window are fully parameterized, allowing for rapid configuration of test scenarios. Reconstructions based on the simulated data accurately recover hidden geometries, validating the effectiveness of the approach. The proposed tool reduces the entry barrier for NLOS research and supports the optimization of system design.
Submitted 4 June, 2025;
originally announced June 2025.
-
Uncertainty principles for free metaplectic transformation and associated metaplectic operators
Authors:
Ping Liang,
Pei Dang,
Weixiong Mai
Abstract:
In this paper, we systematically investigate the Heisenberg-Pauli-Weyl uncertainty principle for free metaplectic transformations, as well as for metaplectic operators. Specifically, we obtain two different types of uncertainty principle for free metaplectic transformations in terms of the so-called phase derivative, one of which can be generalized to the $L^p$-case with $1\le p\le 2$. The obtained results are valid not only for free metaplectic transformations but also for general metaplectic operators. In particular, we point out that our results are closely related to those given in \cite{Dias-deGosson-Prata}, and that this relationship appears to be new and is not stated explicitly in the existing literature.
Submitted 4 June, 2025;
originally announced June 2025.
-
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Authors:
Shuai Liu,
Mingyue Cui,
Boyang Li,
Quanmin Liang,
Tinghe Hong,
Kai Huang,
Yunxiao Shan,
Kai Huang
Abstract:
Fully sparse 3D detectors have recently gained significant attention due to their efficiency in long-range detection. However, sparse 3D detectors extract features only from non-empty voxels, which impairs long-range interactions and leads to missing center features. The former weakens the feature extraction capability, while the latter hinders network optimization. To address these challenges, we introduce the Fully Sparse Hybrid Network (FSHNet). FSHNet incorporates a proposed SlotFormer block to enhance the long-range feature extraction capability of existing sparse encoders. The SlotFormer divides sparse voxels using a slot partition approach, which, compared to traditional window partition, provides a larger receptive field. Additionally, we propose a dynamic sparse label assignment strategy to deeply optimize the network by providing more high-quality positive samples. To further enhance performance, we introduce a sparse upsampling module to refine downsampled voxels, preserving fine-grained details crucial for detecting small objects. Extensive experiments on the Waymo, nuScenes, and Argoverse2 benchmarks demonstrate the effectiveness of FSHNet. The code is available at https://github.com/Say2L/FSHNet.
Submitted 4 June, 2025;
originally announced June 2025.
-
Advancements in Artificial Intelligence Applications for Cardiovascular Disease Research
Authors:
Yuanlin Mo,
Haishan Huang,
Bocheng Liang,
Weibo Ma
Abstract:
Recent advancements in artificial intelligence (AI) have revolutionized cardiovascular medicine, particularly through integration with computed tomography (CT), magnetic resonance imaging (MRI), electrocardiography (ECG) and ultrasound (US). Deep learning architectures, including convolutional neural networks and generative adversarial networks, enable automated analysis of medical imaging and physiological signals, surpassing human capabilities in diagnostic accuracy and workflow efficiency. However, critical challenges persist, including the inability to validate input data accuracy, which may propagate diagnostic errors. This review highlights AI's transformative potential in precision diagnostics while underscoring the need for robust validation protocols to ensure clinical reliability. Future directions emphasize hybrid models integrating multimodal data and adaptive algorithms to refine personalized cardiovascular care.
Submitted 4 June, 2025;
originally announced June 2025.
-
Images are Worth Variable Length of Representations
Authors:
Lingjun Mao,
Rodolfo Corona,
Xin Liang,
Wenhao Yan,
Zineng Tang
Abstract:
Most existing vision encoders map images into a fixed-length sequence of tokens, overlooking the fact that different images contain varying amounts of information. For example, a visually complex image (e.g., a cluttered room) inherently carries more information and thus deserves more tokens than a simple image (e.g., a blank wall). To address this inefficiency, we propose DOVE, a dynamic vision encoder that produces a variable number of visual tokens (i.e., continuous representation vectors) to reconstruct each image. Our results show that DOVE significantly reduces the average number of tokens while maintaining high reconstruction quality. In several linear probing and downstream multimodal tasks, it outperforms existing autoencoder-based tokenization methods when using far fewer tokens, capturing more expressive semantic features compared to fixed-length encoding. We further extend DOVE with query-conditioned tokenization. By guiding the model to focus on query-relevant regions, it achieves more efficient and targeted semantic extraction. Our code and checkpoints are available at https://dove-encoder.github.io/dove-encoder.
Submitted 5 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
Beamforming and Resource Allocation for Delay Optimization in RIS-Assisted OFDM Systems
Authors:
Yu Ma,
Xiao Li,
Chongtao Guo,
Le Liang,
Shi Jin
Abstract:
This paper investigates a joint phase design and resource allocation problem in downlink reconfigurable intelligent surface (RIS)-assisted orthogonal frequency division multiplexing (OFDM) systems to optimize average delay, where data packets for each user arrive at the base station stochastically. The sequential optimization problem is inherently a Markov decision process (MDP), placing it within the scope of reinforcement learning. To effectively handle the mixed action space and reduce the state space dimensionality, a hybrid deep reinforcement learning (DRL) approach is proposed. Specifically, proximal policy optimization (PPO)-$Θ$ is employed to optimize the RIS phase shift design, while PPO-N is responsible for subcarrier allocation decisions. To further mitigate the curse of dimensionality associated with subcarrier allocation, a multi-agent strategy is introduced to optimize the subcarrier allocation indicators more efficiently. Moreover, to achieve more adaptive resource allocation and accurately capture network dynamics, key factors closely related to average delay, including the number of backlogged packets in buffers and the current packet arrivals, are incorporated into the state space. Furthermore, a transfer learning framework is introduced to enhance training efficiency and accelerate convergence. Simulation results demonstrate that the proposed algorithm significantly reduces average delay, enhances resource allocation efficiency, and achieves superior system robustness and fairness compared to baseline methods.
Submitted 12 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
MiMo-VL Technical Report
Authors:
Xiaomi LLM-Core Team,
Zihao Yue,
Zhenru Lin,
Yifan Song,
Weikun Wang,
Shuhuai Ren,
Shuhao Gu,
Shicheng Li,
Peidian Li,
Liang Zhao,
Lei Li,
Kainan Bao,
Hao Tian,
Hailin Zhang,
Gang Wang,
Dawei Zhu,
Cici,
Chenhong He,
Bowen Ye,
Bowen Shen,
Zihan Zhang,
Zihan Jiang,
Zhixian Zheng,
Zhichao Song
, et al. (50 additional authors not shown)
Abstract:
We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify the importance of incorporating high-quality reasoning data with long Chain-of-Thought into pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-VL.
Submitted 4 June, 2025;
originally announced June 2025.
-
Seed-Coder: Let the Code Model Curate Data for Itself
Authors:
ByteDance Seed,
Yuyu Zhang,
Jing Su,
Yifan Sun,
Chenguang Xi,
Xia Xiao,
Shen Zheng,
Anxiang Zhang,
Kaibo Liu,
Daoguang Zan,
Tao Sun,
Jinhua Zhu,
Shulin Xin,
Dong Huang,
Yetao Bai,
Lixin Dong,
Chao Li,
Jianchong Chen,
Hanzhi Zhou,
Yifan Huang,
Guanghan Ning,
Xierui Song,
Jiaze Chen,
Siyao Liu,
Kai Shen
, et al. (2 additional authors not shown)
Abstract:
Code data in large language model (LLM) pretraining is recognized as crucial not only for code-related tasks but also for enhancing the general intelligence of LLMs. Current open-source LLMs often rely heavily on human effort to produce their code pretraining data, such as employing hand-crafted filtering rules tailored to individual programming languages, or using human-annotated data to train quality filters. However, these approaches are inherently limited in scalability, prone to subjective biases, and costly to extend and maintain across diverse programming languages. To address these challenges, we introduce Seed-Coder, a series of open-source LLMs comprising base, instruct and reasoning models of 8B size, minimizing human involvement in data construction. Our code pretraining data is produced by a model-centric data pipeline, which predominantly leverages LLMs for scoring and filtering code data. The instruct model is further trained via supervised fine-tuning and preference optimization, and the reasoning model leverages Long-Chain-of-Thought (LongCoT) reinforcement learning to improve multi-step code reasoning. Seed-Coder achieves state-of-the-art results among open-source models of similar size and even surpasses some much larger models, demonstrating superior performance in code generation, code completion, code editing, code reasoning, and software engineering tasks.
Submitted 4 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
Authors:
Baode Wang,
Biao Wu,
Weizhen Li,
Meng Fang,
Yanjie Liang,
Zuming Huang,
Haozhe Wang,
Jun Huang,
Ling Chen,
Wei Chu,
Yuan Qi
Abstract:
Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse layouts. We introduce layoutRL, an end-to-end reinforcement learning framework that trains models to be explicitly layout-aware by optimizing a composite reward of normalized edit distance, paragraph count accuracy, and reading order preservation. Leveraging our newly released dataset, Infinity-Doc-55K, which combines 55K high-fidelity synthetic scanned document parsing data with expert-filtered real-world documents, we instantiate layoutRL in a vision-language-model-based parser called Infinity-Parser. Evaluated on English and Chinese benchmarks for OCR, table and formula extraction, and reading order detection, Infinity-Parser achieves new state-of-the-art performance in both accuracy and structural fidelity, outpacing specialist pipelines and general-purpose vision-language models. We will publicly release our code and dataset to accelerate progress in robust document understanding.
Submitted 1 June, 2025;
originally announced June 2025.
-
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Authors:
Kai Lion,
Liang Zhang,
Bingcong Li,
Niao He
Abstract:
We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on three different benchmarks testing general language understanding, commonsense reasoning, and mathematical problem solving with base model sizes ranging from 350M to 27B.
Submitted 3 June, 2025;
originally announced June 2025.
-
Retrieval-Augmented Generation as Noisy In-Context Learning: A Unified Theory and Risk Bounds
Authors:
Yang Guo,
Yutian Tao,
Yifei Ming,
Robert D. Nowak,
Yingyu Liang
Abstract:
Retrieval-augmented generation (RAG) has seen many empirical successes in recent years by aiding the LLM with external knowledge. However, its theoretical aspects have remained mostly unexplored. In this paper, we propose the first finite-sample generalization bound for RAG in in-context linear regression and derive an exact bias-variance tradeoff. Our framework views the retrieved texts as query-dependent noisy in-context examples and recovers classical in-context learning (ICL) and standard RAG as limiting cases. Our analysis suggests that an intrinsic ceiling on generalization error exists for RAG, as opposed to ICL. Furthermore, our framework is able to model retrieval both from the training data and from external corpora by introducing uniform and non-uniform RAG noise. In line with our theory, we show the sample efficiency of ICL and RAG empirically with experiments on common QA benchmarks, such as Natural Questions and TriviaQA.
Submitted 9 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Three-pion Bose-Einstein correlations measured in proton-proton collisions
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An
, et al. (1125 additional authors not shown)
Abstract:
A study on the Bose-Einstein correlations for triplets of same-sign pions is presented. The analysis is performed using proton-proton collisions at a centre-of-mass energy of $\sqrt{s}$ = 7 TeV, recorded by the LHCb experiment, corresponding to an integrated luminosity of 1.0 fb$^{-1}$. For the first time, the results are interpreted in the core-halo model. The parameters of the model are determined in regions of charged-particle multiplicity. This measurement provides insight into the nature of hadronisation in terms of coherence, showing a coherent emission of pions.
Submitted 9 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Causal Explainability of Machine Learning in Heart Failure Prediction from Electronic Health Records
Authors:
Yina Hou,
Shourav B. Rabbani,
Liang Hong,
Norou Diawara,
Manar D. Samad
Abstract:
The importance of clinical variables in the prognosis of the disease is explained using statistical correlation or machine learning (ML). However, the predictive importance of these variables may not represent their causal relationships with diseases. This paper uses clinical variables from a heart failure (HF) patient cohort to investigate the causal explainability of important variables obtained in statistical and ML contexts. Due to inherent regression modeling, popular causal discovery methods strictly assume that the cause and effect variables are numerical and continuous. This paper proposes a new computational framework to enable causal structure discovery (CSD) and score the causal strength of mixed-type (categorical, numerical, binary) clinical variables for binary disease outcomes. In HF classification, we investigate the association between the importance rank order of three feature types: correlated features, features important for ML predictions, and causal features. Our results demonstrate that CSD modeling for nonlinear causal relationships is more meaningful than its linear counterparts. Feature importance obtained from nonlinear classifiers (e.g., gradient-boosting trees) strongly correlates with the causal strength of variables without differentiating cause and effect variables. Correlated variables can be causal for HF, but they are rarely identified as effect variables. These results can be used to add the causal explanation of variables important for ML-based prediction modeling.
Submitted 3 June, 2025;
originally announced June 2025.
-
Measurement of the branching fractions of the Cabibbo-favored decays $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ and $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ and search for $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (660 additional authors not shown)
Abstract:
Based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of about 4.5 fb$^{-1}$ collected at center-of-mass energies between 4599.53 MeV and 4698.82 MeV with the BESIII detector, the absolute branching fraction of the Cabibbo-favored decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is measured to be $(3.12\pm0.46\pm0.15)\times10^{-3}$. Combined with a previous measurement from the BESIII Collaboration, the branching fraction of the decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is calculated to be $(3.07\pm0.26\pm0.13)\times10^{-3}$. The decay $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ is observed for the first time with a statistical significance of $6.6σ$, and its branching fraction is determined to be $(3.70\pm0.60\pm0.21)\times10^{-3}$. In addition, a search for the decay $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$ is performed and its branching fraction is determined to be $(0.80^{+0.28}_{-0.24}\pm0.16)\times10^{-3}$, corresponding to an upper limit of $1.28\times10^{-3}$ at $90\%$ confidence level. These measurements provide new information that can be used to distinguish between theoretical models.
Submitted 3 June, 2025;
originally announced June 2025.