-
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
Authors:
Ai Jian,
Weijie Qiu,
Xiaokun Wang,
Peiyu Wang,
Yunzhuo Hao,
Jiangbo Pei,
Yichen Wei,
Yi Peng,
Xuchen Song
Abstract:
Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal understanding, yet their capabilities for scientific reasoning remain inadequately assessed. Current multimodal benchmarks predominantly evaluate generic image comprehension or text-driven reasoning and lack authentic scientific contexts that require integrating domain-specific knowledge with analysis of visual evidence. To fill this gap, we present CSVQA, a diagnostic multimodal benchmark specifically designed to evaluate scientific reasoning through domain-grounded visual question answering. Our benchmark features 1,378 carefully constructed question-answer pairs spanning diverse STEM disciplines, each demanding domain knowledge, integration of visual evidence, and higher-order reasoning. Compared with prior multimodal benchmarks, CSVQA places greater emphasis on real-world scientific content and complex reasoning. We additionally propose a rigorous evaluation protocol that uses curated explanations to systematically assess whether model predictions are substantiated by valid intermediate reasoning steps. Our comprehensive evaluation of 15 VLMs on this benchmark reveals notable performance disparities: even the top-ranked proprietary model attains only 49.6% accuracy. This empirical evidence underscores the pressing need to advance scientific reasoning capabilities in VLMs. CSVQA is released at https://huggingface.co/datasets/Skywork/CSVQA.
Submitted 29 May, 2025;
originally announced May 2025.
-
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging
Authors:
Ping Wang,
Lishun Wang,
Gang Qu,
Xiaodong Wang,
Yulun Zhang,
Xin Yuan
Abstract:
Deep-unrolling and plug-and-play (PnP) approaches have become the de facto standard solvers for the single-pixel imaging (SPI) inverse problem. PnP approaches, a class of iterative algorithms in which regularization is implicitly performed by an off-the-shelf deep denoiser, are flexible across compression ratios (CRs) but are limited in reconstruction accuracy and speed. Conversely, unrolling approaches, a class of multi-stage neural networks in which a truncated iterative optimization process is transformed into an end-to-end trainable network, typically achieve better accuracy with faster inference but require fine-tuning or even retraining when the CR changes. In this paper, we address the challenge of combining the strengths of both classes of solvers. To this end, we design an efficient deep image restorer (DIR) for unrolling HQS (half quadratic splitting) and ADMM (alternating direction method of multipliers). More importantly, we propose a general proximal trajectory (PT) loss function to train HQS/ADMM-unrolling networks such that the learned DIR approximates the proximal operator of an ideal explicit restoration regularizer. Extensive experiments demonstrate that the resulting proximal unrolling networks not only handle varying CRs flexibly with a single model, like PnP algorithms, but also outperform previous CR-specific unrolling networks in both reconstruction accuracy and speed. Source code and models are available at https://github.com/pwangcs/ProxUnroll.
Submitted 29 May, 2025;
originally announced May 2025.
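The abstract names HQS as one of the unrolled solvers. A minimal sketch of an unrolled HQS loop for the SPI measurement model y = Φx follows, assuming a soft-thresholding step as a hand-crafted stand-in for the learned deep image restorer (DIR); the function names, the ℓ1 prior, and the values of `mu` and `tau` are illustrative assumptions, not the ProxUnroll code. Only the measurement matrix changes between compression ratios, mirroring the single-model, multi-CR setting the abstract describes.

```python
import numpy as np

def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||v||_1 (illustrative stand-in for a learned restorer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def hqs_unrolled(y: np.ndarray, Phi: np.ndarray, num_stages: int = 10,
                 mu: float = 1.0, tau: float = 0.01) -> np.ndarray:
    """Run a fixed number of HQS stages for y = Phi @ x.

    The same prior step is reused for every stage and every measurement matrix,
    so only Phi (and hence the compression ratio) changes between calls.
    """
    n = Phi.shape[1]
    z = Phi.T @ y                        # initialize with the back-projection
    A = Phi.T @ Phi + mu * np.eye(n)     # fixed system matrix of the x-step
    for _ in range(num_stages):
        # x-step (data consistency): argmin_x ||y - Phi x||^2 + mu * ||x - z||^2
        x = np.linalg.solve(A, Phi.T @ y + mu * z)
        # z-step (prior): proximal operator of the regularizer evaluated at x
        z = soft_threshold(x, tau)
    return z

rng = np.random.default_rng(0)
n = 64
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = 1.0
for cr in (0.25, 0.5, 0.75):             # one solver, several compression ratios
    m = int(cr * n)
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x_hat = hqs_unrolled(Phi @ x_true, Phi, num_stages=50)
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(f"CR={cr:.2f}  relative error={rel_err:.3f}")
```

In a learned unrolling network, `soft_threshold` would be replaced by the trained restorer and `mu` would typically become a per-stage learnable parameter.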
-
Early Assessment of Artificial Lower Extremity Sensory Response Times and Proprioceptive Acuity via Sensory Cortex Electrical Stimulation
Authors:
Won Joon Sohn,
Jeffrey Lim,
Po T. Wang,
Susan J. Shaw,
Michelle Armacost,
Hui Gong,
Brian Lee,
Darrin Lee,
Payam Heydari,
Richard A. Andersen,
Charles Y. Liu,
Zoran Nenadic,
An H. Do
Abstract:
Bi-directional brain-computer interfaces (BD-BCIs) may restore brain-controlled walking and artificial leg sensation after spinal cord injury. Current BD-BCIs provide only simplistic "tingling" feedback, which lacks the proprioceptive information needed to perceive critical gait events (leg swing, double support). This information must also be perceived quickly enough to allow timely motor responses. Here, we investigated using direct cortical electrical stimulation (DCES) of the primary sensory cortex (S1) to deliver leg proprioceptive information, and we measured response times to artificial leg sensations. Subjects with subdural electrocorticogram electrodes over S1 leg areas participated in two tasks: (1) proprioceptive acuity, in which subjects discriminated between DCES-induced percepts emulating various leg swing speeds; and (2) sensory response, in which subjects' reaction times to DCES-induced leg sensations were measured, with DCES-hand, visual, and auditory control conditions. Three subjects were recruited. Only one completed the proprioceptive assessment, achieving 80%, 70%, 60%, and 53% accuracy in discriminating between fast/slow, fast/medium, medium/slow, and same speeds, respectively (p = 1.9$\times$10$^{-5}$). Response times for leg/hand percepts were 1007$\pm$413/599$\pm$171 ms, visual leg/hand responses were 528$\pm$137/384$\pm$84 ms, and auditory leg/hand responses were 393$\pm$106/352$\pm$93 ms. These results suggest that proprioceptive information can be delivered artificially, but its perception may be significantly delayed. Future work should address improving acuity, reducing response times, and expanding sensory modalities.
Submitted 28 May, 2025;
originally announced May 2025.
-
Search for a dark baryon in the $Ξ^-\rightarrowπ^-+{\rm invisible}$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (697 additional authors not shown)
Abstract:
A search for a dark baryon is performed for the first time in the two-body decay $Ξ^-\rightarrowπ^-+{\rm invisible}$ using $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected at a center-of-mass energy of $\sqrt{s}=3.097\,\mbox{GeV}$ with the BESIII detector at the BEPCII collider. No significant signal is observed, and the 90% (95%) confidence level upper limits on the branching fraction $B(Ξ^-\rightarrowπ^-+{\rm invisible})$ are determined to be $4.2\times10^{-5}$ ($5.2\times10^{-5}$), $6.9\times10^{-5}$ ($8.4\times10^{-5}$), $6.5\times10^{-4}$ ($7.6\times10^{-4}$), $1.1\times10^{-4}$ ($1.3\times10^{-4}$) and $4.5\times10^{-5}$ ($5.5\times10^{-5}$), under the dark baryon mass hypotheses of 1.07$\,\mbox{GeV}/c^2$, 1.10$\,\mbox{GeV}/c^2$, $m_Λ$ (1.116$\,\mbox{GeV}/c^2$), 1.13$\,\mbox{GeV}/c^2$, and 1.16$\,\mbox{GeV}/c^2$, respectively. The constraints obtained on the Wilson coefficients $C_{u s, s}^L$ and $C_{u s, s}^R$ are more stringent than the previous limits derived from LHC searches for colored mediators.
Submitted 28 May, 2025;
originally announced May 2025.
-
SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry
Authors:
Peijie Wang,
Chao Yang,
Zhong-Zhi Li,
Fei Yin,
Dekang Ran,
Mi Tian,
Zhilong Ji,
Jinfeng Bai,
Cheng-Lin Liu
Abstract:
Geometry is a fundamental branch of mathematics and plays a crucial role in evaluating the reasoning capabilities of multimodal large language models (MLLMs). However, existing multimodal mathematics benchmarks mainly focus on plane geometry and largely ignore solid geometry, which requires spatial reasoning and is more challenging than plane geometry. To address this critical gap, we introduce SolidGeo, the first large-scale benchmark specifically designed to evaluate MLLMs on mathematical reasoning tasks in solid geometry. SolidGeo consists of 3,113 real-world K-12 and competition-level problems, each paired with visual context and annotated with a difficulty level and fine-grained solid geometry categories. Our benchmark covers a wide range of 3D reasoning subjects, such as projection, unfolding, spatial measurement, and spatial vectors, offering a rigorous testbed for assessing solid geometry reasoning. Through extensive experiments, we observe that MLLMs face substantial challenges on solid geometry math tasks, with a considerable performance gap relative to humans on SolidGeo. Moreover, we analyze the performance, inference efficiency, and error patterns of various models, offering insights into the solid geometric mathematical reasoning capabilities of MLLMs. We hope SolidGeo serves as a catalyst for advancing MLLMs toward deeper geometric reasoning and spatial intelligence.
Submitted 9 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles
Authors:
Peng Wang,
Xiang Liu,
Peidong Liu
Abstract:
Stylizing 3D scenes instantly while maintaining multi-view consistency and faithfully resembling a style image remains a significant challenge. Current state-of-the-art 3D stylization methods typically involve computationally intensive test-time optimization to transfer artistic features into a pretrained 3D representation, often requiring dense posed input images. In contrast, leveraging recent advances in feed-forward reconstruction models, we demonstrate a novel approach that achieves direct 3D stylization in less than a second using unposed sparse-view scene images and an arbitrary style image. To address the inherent decoupling between reconstruction and stylization, we introduce a branched architecture that separates structure modeling and appearance shading, effectively preventing stylistic transfer from distorting the underlying 3D scene structure. Furthermore, we adapt an identity loss to facilitate pre-training our stylization model through the novel view synthesis task. This strategy also allows our model to retain its original reconstruction capabilities while being fine-tuned for stylization. Comprehensive evaluations, using both in-domain and out-of-domain datasets, demonstrate that our approach produces high-quality stylized 3D content that achieves a superior blend of style and scene appearance, while also outperforming existing methods in terms of multi-view consistency and efficiency.
Submitted 27 May, 2025;
originally announced May 2025.
-
Understanding Generalization in Diffusion Models via Probability Flow Distance
Authors:
Huijie Zhang,
Zijian Huang,
Siyi Chen,
Jinfan Zhou,
Zekai Zhang,
Peng Wang,
Qing Qu
Abstract:
Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality samples that generalize beyond the training data. However, evaluating this generalization remains challenging: theoretical metrics are often impractical for high-dimensional data, while no practical metrics rigorously measure generalization. In this work, we bridge this gap by introducing probability flow distance ($\texttt{PFD}$), a theoretically grounded and computationally efficient metric to measure distributional generalization. Specifically, $\texttt{PFD}$ quantifies the distance between distributions by comparing their noise-to-data mappings induced by the probability flow ODE. Moreover, by using $\texttt{PFD}$ under a teacher-student evaluation protocol, we empirically uncover several key generalization behaviors in diffusion models, including: (1) scaling behavior from memorization to generalization, (2) early learning and double descent training dynamics, and (3) bias-variance decomposition. Beyond these insights, our work lays a foundation for future empirical and theoretical studies on generalization in diffusion models.
Submitted 26 May, 2025;
originally announced May 2025.
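The abstract defines $\texttt{PFD}$ only as a comparison of noise-to-data mappings induced by the probability flow ODE, without giving the formula. A toy sketch of that idea follows, under loudly stated assumptions: 1-D Gaussian data, a variance-exploding forward process (for which the probability flow ODE map is affine and known in closed form), a N(0, σ_max²) prior, and an RMS distance over shared noise samples. The names `pf_ode_map_gaussian` and `pfd_gaussian` and the RMS choice are hypothetical, not the paper's definition.

```python
import numpy as np

def pf_ode_map_gaussian(x_T, mean, std, sigma_max):
    """Noise-to-data map of the probability flow ODE for 1-D data N(mean, std^2)
    under a variance-exploding forward process x_t = x_0 + sigma(t) * eps.

    Along this ODE the standardized value (x - mean) / sqrt(std^2 + sigma^2) is
    conserved, so integrating from sigma_max down to 0 gives an affine map.
    """
    return mean + std * (x_T - mean) / np.sqrt(std**2 + sigma_max**2)

def pfd_gaussian(p, q, sigma_max=80.0, num_samples=100_000, seed=0):
    """RMS distance between two noise-to-data maps evaluated on SHARED noise samples
    (an illustrative definition, not necessarily the paper's)."""
    rng = np.random.default_rng(seed)
    x_T = sigma_max * rng.standard_normal(num_samples)   # samples from the N(0, sigma_max^2) prior
    diff = pf_ode_map_gaussian(x_T, *p, sigma_max) - pf_ode_map_gaussian(x_T, *q, sigma_max)
    return float(np.sqrt(np.mean(diff ** 2)))

data = (0.0, 1.0)     # (mean, std) of a toy "data" distribution
model = (0.5, 1.2)    # (mean, std) of a toy "learned" distribution
print(pfd_gaussian(data, data))    # 0.0: identical distributions give identical maps
print(pfd_gaussian(data, model))
```

Using the same noise samples for both maps is what makes this a comparison of mappings rather than of marginal samples. In practice the maps would come from numerically integrating each diffusion model's probability flow ODE.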
-
Spatiotemporal Causal Decoupling Model for Air Quality Forecasting
Authors:
Jiaming Ma,
Guanjun Wang,
Sheng Huang,
Kuo Yang,
Binwu Wang,
Pengkun Wang,
Yang Wang
Abstract:
Due to the profound impact of air pollution on human health, livelihoods, and economic development, air quality forecasting is of paramount significance. We first use a causal graph to examine the limitations of existing research in comprehensively modeling the causal relationships between the air quality index (AQI) and meteorological features. To improve prediction accuracy, we introduce a novel air quality forecasting model, AirCade, which incorporates a causal decoupling approach. AirCade leverages a spatiotemporal module in conjunction with knowledge embedding techniques to capture the internal dynamics of AQI. A causal decoupling module then disentangles synchronous causality from past AQI and meteorological features, and the acquired knowledge is propagated to future time steps to enhance performance. Additionally, we introduce a causal intervention mechanism to explicitly represent the uncertainty of future meteorological features, thereby improving the model's robustness. Our evaluation of AirCade on an open-source air quality dataset demonstrates a relative improvement of more than 20% compared with state-of-the-art models.
Submitted 26 May, 2025;
originally announced May 2025.
-
Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning
Authors:
Wenrui Li,
Penghong Wang,
Xingtao Wang,
Wangmeng Zuo,
Xiaopeng Fan,
Yonghong Tian
Abstract:
Audio-visual zero-shot learning (ZSL) has been extensively researched for its ability to classify video data from classes unseen during training. Nevertheless, current methods often struggle with background scene biases and insufficient motion detail. This paper proposes a novel dual-stream Multi-Timescale Motion-Decoupled Spiking Transformer (MDST++), which decouples contextual semantic information from sparse dynamic motion information. A recurrent joint learning unit is proposed to extract contextual semantic information and capture joint knowledge across modalities to understand the environment of actions. By converting RGB images to events, our method captures motion information more accurately and mitigates background scene biases. Moreover, we introduce a discrepancy analysis block to model audio motion information. To enhance the robustness of spiking neural networks (SNNs) in extracting temporal and motion cues, we dynamically adjust the threshold of Leaky Integrate-and-Fire neurons based on global motion and contextual semantic information. Our experiments validate the effectiveness of MDST++, demonstrating its consistent superiority over state-of-the-art methods on mainstream benchmarks. Additionally, incorporating motion and multi-timescale information significantly improves HM and ZSL accuracy, by 26.2% and 39.9%, respectively.
Submitted 26 May, 2025;
originally announced May 2025.
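The abstract states that MDST++ adjusts the firing threshold of Leaky Integrate-and-Fire (LIF) neurons using global motion and contextual semantic information, but it does not give the rule. A minimal sketch of a discrete-time LIF neuron with a per-step threshold follows; the modulation function `motion_modulated_threshold`, its `alpha` parameter, and the direction of the adjustment (a higher threshold for low-motion input) are hypothetical choices for illustration only.

```python
import numpy as np

def lif_spikes(inputs, thresholds, decay=0.9):
    """Discrete-time LIF neuron with a per-step threshold and hard reset to 0."""
    v = 0.0
    spikes = np.zeros(len(inputs), dtype=int)
    for t, (i_t, th_t) in enumerate(zip(inputs, thresholds)):
        v = decay * v + i_t            # leaky integration of the input current
        if v >= th_t:                  # fire when the membrane potential crosses the threshold
            spikes[t] = 1
            v = 0.0                    # hard reset after a spike
    return spikes

def motion_modulated_threshold(base, motion, alpha=0.5):
    """HYPOTHETICAL rule: raise the threshold where a global motion score in [0, 1] is low."""
    motion = np.asarray(motion, dtype=float)
    return base * (1.0 + alpha * (1.0 - motion))

current = np.full(20, 0.5)
static_scene = motion_modulated_threshold(1.0, np.zeros(20))   # threshold 1.5 at every step
moving_scene = motion_modulated_threshold(1.0, np.ones(20))    # threshold 1.0 at every step
print(lif_spikes(current, moving_scene).sum(), lif_spikes(current, static_scene).sum())  # 6 5
```

With the same input current, the neuron fires less often when its threshold is raised, which is the lever a dynamic threshold provides for suppressing spikes from static background content.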
-
First measurement of $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ cross-sections via $Σ^+$-nucleus scattering at an electron-positron collider
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected with the BESIII detector at the BEPCII storage ring, the reactions $Σ^{+}n\rightarrowΛp$ and $Σ^{+}n\rightarrowΣ^{0}p$ are studied, where the $Σ^{+}$ baryon is produced in the process $J/ψ\rightarrowΣ^{+}\barΣ^-$ and the neutron is a component of the $^9\rm{Be}$, $^{12}\rm{C}$ and $^{197}\rm{Au}$ nuclei in the beam pipe. Clear signals of these two reactions are observed for the first time. Their cross-sections are measured to be $σ(Σ^{+}+{^9\rm{Be}}\rightarrowΛ+p+{^8\rm{Be}})=(45.2\pm12.1_{\rm{stat}}\pm7.2_{\rm{sys}})$ mb and $σ(Σ^{+}+{^9\rm{Be}}\rightarrowΣ^{0}+p+{^8\rm{Be}})=(29.8\pm9.7_{\rm{stat}}\pm6.9_{\rm{sys}})$ mb for a $Σ^{+}$ average momentum of $0.992$ GeV/$c$, within a range of $\pm0.015$ GeV/$c$. This is the first study of $Σ^{+}$-nucleon scattering at an electron-positron collider.
Submitted 26 May, 2025;
originally announced May 2025.
-
Distilling Closed-Source LLM's Knowledge for Locally Stable and Economic Biomedical Entity Linking
Authors:
Yihao Ai,
Zhiyuan Ning,
Weiwei Dai,
Pengfei Wang,
Yi Du,
Wenjuan Cui,
Kunpeng Liu,
Yuanchun Zhou
Abstract:
Biomedical entity linking aims to map nonstandard entities to standard entities in a knowledge base. Traditional supervised methods perform well but require extensive annotated data to transfer, which limits their use in low-resource scenarios. Large language models (LLMs), especially closed-source LLMs, can address this limitation but raise stability issues and high economic costs: access to these models is controlled by commercial companies, and processing large amounts of data with them is expensive. To address this, we propose "RPDR", a framework that combines closed-source and open-source LLMs to re-rank candidates retrieved by a retriever fine-tuned with a small amount of data. By prompting a closed-source LLM to generate training data from unannotated data and fine-tuning an open-source LLM for re-ranking, we effectively distill the knowledge into an open-source LLM that can be deployed locally, thus avoiding the stability issues and high economic costs. We evaluate RPDR on two datasets, one real-world dataset and one publicly available dataset, covering two languages: Chinese and English. When training data are scarce, RPDR improves Acc@1 by 0.019 on the Aier dataset and by 0.036 on the Ask A Patient dataset. The results demonstrate the superiority and generalizability of the proposed framework.
Submitted 26 May, 2025;
originally announced May 2025.
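RPDR's gains are reported as Acc@1, the fraction of mentions whose top-ranked candidate is the gold standard entity. A minimal sketch of that metric applied before and after re-ranking follows. The candidate lists, gold labels, and the `toy_scores` lookup are hypothetical; in RPDR the scores would come from the fine-tuned open-source LLM, whose interface is not described here.

```python
from typing import Callable, List, Sequence

def rerank(candidates: Sequence[str], score: Callable[[str], float]) -> List[str]:
    """Re-order retrieved candidates by a re-ranker score (highest first)."""
    return sorted(candidates, key=score, reverse=True)

def acc_at_1(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked candidate equals the gold entity."""
    hits = sum(1 for cands, g in zip(ranked, gold) if cands and cands[0] == g)
    return hits / len(gold)

# Hypothetical retriever output for two mentions, with their gold standard entities
retrieved = [["hypertension", "hypotension"], ["type 2 diabetes", "type 1 diabetes"]]
gold = ["hypertension", "type 1 diabetes"]
toy_scores = {"hypertension": 0.9, "hypotension": 0.2,
              "type 2 diabetes": 0.3, "type 1 diabetes": 0.8}

print(acc_at_1(retrieved, gold))                                            # 0.5
print(acc_at_1([rerank(c, toy_scores.__getitem__) for c in retrieved], gold))  # 1.0
```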
-
Remote Sensing Image Classification with Decoupled Knowledge Distillation
Authors:
Yaping He,
Jianfeng Cai,
Qicong Hu,
Peiqing Wang
Abstract:
To address the large number of parameters in existing remote sensing image classification models, which hinders deployment on resource-constrained devices, this paper proposes a lightweight classification method based on knowledge distillation. Specifically, G-GhostNet is adopted as the backbone network, leveraging feature reuse to reduce redundant parameters and significantly improve inference efficiency. In addition, a decoupled knowledge distillation strategy, which separates target and non-target classes, is employed to enhance classification accuracy. Experimental results on the RSOD and AID datasets demonstrate that, compared with the high-parameter VGG-16 model, the proposed method achieves nearly equivalent Top-1 accuracy while using 6.24 times fewer parameters. This approach strikes an excellent balance between model size and classification performance, offering an efficient solution for deployment on resource-limited devices.
Submitted 9 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
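The abstract describes decoupled knowledge distillation only as separating target and non-target classes. A minimal per-sample sketch follows, assuming the widely used formulation in which the classical KD loss splits into a target-class term (TCKD, a KL divergence between binary target/non-target distributions) and a non-target term (NCKD, a KL divergence over the renormalized non-target classes), which are then weighted independently. The weights `alpha`, `beta` and temperature `T` are illustrative values, not the paper's settings.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def kl(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.sum(p * (np.log(p) - np.log(q))))

def dkd_terms(teacher_logits, student_logits, target, T=4.0):
    """Return (TCKD, NCKD, teacher's target-class probability) for one sample."""
    pt = softmax(teacher_logits / T)
    ps = softmax(student_logits / T)
    # Target-class term: KL between binary [p_target, 1 - p_target] distributions
    tckd = kl(np.array([pt[target], 1.0 - pt[target]]),
              np.array([ps[target], 1.0 - ps[target]]))
    # Non-target term: KL between distributions renormalized over non-target classes
    mask = np.arange(teacher_logits.shape[0]) != target
    nckd = kl(softmax(teacher_logits[mask] / T), softmax(student_logits[mask] / T))
    return tckd, nckd, float(pt[target])

def dkd_loss(teacher_logits, student_logits, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: weight the two terms independently (alpha, beta are illustrative)."""
    tckd, nckd, _ = dkd_terms(teacher_logits, student_logits, target, T)
    return (alpha * tckd + beta * nckd) * T ** 2

teacher = np.array([3.0, 1.0, 0.2, -1.0])
student = np.array([2.0, 0.5, 0.8, -0.5])
tckd, nckd, p_t = dkd_terms(teacher, student, target=0)
classic = kl(softmax(teacher / 4.0), softmax(student / 4.0))
print(classic, tckd + (1.0 - p_t) * nckd)   # equal up to floating-point error
```

The printed identity, KD = TCKD + (1 − p_target) · NCKD, shows why decoupling matters: in classical KD the non-target term is down-weighted whenever the teacher is confident, whereas the decoupled loss lets `beta` set its weight directly.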
-
An Ultra-Low Power and Fast Ising Machine using Voltage-Controlled Magnetoresistive Random Access Memory
Authors:
Yihao Zhang,
Sai Li,
Albert Lee,
Zheng Zhu,
Lang Zeng,
Peng Wang,
Lei Gao,
Di Wu,
Weisheng Zhao
Abstract:
Physics-inspired computing paradigms, such as Ising machines, are emerging as promising hardware alternatives to traditional von Neumann architectures for tackling computationally intensive combinatorial optimization problems (COPs). While quantum, optical, and electronic devices have attracted significant attention for realizing Ising machines, translating them into practical systems for industry-relevant applications remains challenging, as each approach faces specific limitations in power consumption and speed. To address this challenge, we report the first chip-level spintronic Ising machine using voltage-controlled magnetoresistive random access memory. The core of our design uses magnetic tunnel junctions (MTJs) driven by the voltage-controlled magnetic anisotropy effect to realize probabilistic updates of Ising spins through a new mechanism. This enables a latency below 1 ns and an energy consumption under 40 fJ per spin update, a 1000-fold improvement over previous current-driven MTJ-based implementations. We map two real-world COPs in electronic design automation, global routing and layer assignment, onto the Ising model and demonstrate high-quality results with an energy efficiency of 25000 solutions per second per watt. This outperforms state-of-the-art quantum and graphics processing units by six and seven orders of magnitude, respectively. These results establish voltage-controlled spintronics as a compelling route toward next-generation physics-inspired machine intelligence, offering a paradigm for ultra-low-power, high-speed, and scalable computation.
Submitted 25 May, 2025;
originally announced May 2025.
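The chip realizes probabilistic Ising spin updates in MTJ hardware. As a software point of reference only, the sketch below emulates the common "p-bit" update rule, m_i = sgn(tanh(β·I_i) − r) with r drawn uniformly from [−1, 1] and local field I_i = Σ_j J_ij m_j + h_i, inside a simple annealing loop. Whether the device's new mechanism follows exactly this statistics is an assumption; the toy problem, schedule, and function names are hypothetical.

```python
import numpy as np

def ising_energy(m: np.ndarray, J: np.ndarray, h: np.ndarray) -> float:
    """E(m) = -1/2 m^T J m - h^T m for a symmetric J with zero diagonal."""
    return float(-0.5 * m @ J @ m - h @ m)

def pbit_anneal(J, h, betas, seed=0):
    rng = np.random.default_rng(seed)
    m = rng.choice([-1.0, 1.0], size=len(h))     # random initial spins
    for beta in betas:                           # one sweep per inverse temperature
        for i in range(len(h)):                  # sequential (Gibbs) spin updates
            local_field = J[i] @ m + h[i]
            # p-bit rule: m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1),
            # i.e. P(m_i = +1) = (1 + tanh(beta * I_i)) / 2
            r = rng.uniform(-1.0, 1.0)
            m[i] = 1.0 if np.tanh(beta * local_field) > r else -1.0
    return m

# Toy problem: 4 fully connected ferromagnetic spins (ground state: all aligned, E = -6)
J = np.ones((4, 4)) - np.eye(4)
h = np.zeros(4)
m = pbit_anneal(J, h, betas=np.linspace(0.1, 5.0, 100))
print(m, ising_energy(m, J, h))
```

A COP such as global routing would be encoded by choosing `J` and `h` so that low-energy spin configurations correspond to good solutions; each inner-loop iteration is one "spin update" in the sense of the reported per-update latency and energy.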
-
Towards Harmonized Uncertainty Estimation for Large Language Models
Authors:
Rui Li,
Jing Long,
Muge Qi,
Heming Xia,
Lei Sha,
Peiyi Wang,
Zhifang Sui
Abstract:
To facilitate robust and trustworthy deployment of large language models (LLMs), it is essential to quantify the reliability of their generations through uncertainty estimation. While recent efforts have made significant advances by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis reveals that these methods fail to achieve a harmonized estimate across three properties, namely indication, balance, and calibration, which limits their capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation), a straightforward yet effective method that employs a lightweight model, trained on data aligned with the target LLM's performance, to adjust uncertainty scores. Comprehensive experiments across diverse models and tasks demonstrate its effectiveness, with consistent improvements of up to 60% over existing methods.
Submitted 25 May, 2025;
originally announced May 2025.
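The abstract does not describe CUE's corrector architecture or features, so it is not sketched here. What can be illustrated is one standard way to quantify the calibration property it names: expected calibration error (ECE), the sample-weighted gap between mean confidence and accuracy over confidence bins. Whether the paper measures calibration with ECE, and with how many bins, is an assumption.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1]."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    # Bin index per sample; a confidence of exactly 1.0 goes into the last bin
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(corr[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap   # weight the gap by the fraction of samples in the bin
    return float(ece)

# Hypothetical confidence scores for five generations and whether each was correct
print(expected_calibration_error([0.95, 0.9, 0.85, 0.6, 0.55], [1, 0, 1, 1, 0]))
```

A corrector that adjusts scores would be judged, on this axis, by whether it lowers ECE on held-out data without hurting how well the scores separate correct from incorrect generations.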
-
A Derivative-Free Position Optimization Approach for Movable Antenna Multi-User Communication Systems
Authors:
Xianlong Zeng,
Jun Fang,
Peilan Wang,
Weidong Mei,
Ying-Chang Liang
Abstract:
Movable antennas (MAs) have emerged as a disruptive technology in wireless communications for enhancing spatial degrees of freedom through continuous antenna repositioning within predefined regions, thereby creating favorable channel propagation conditions. In this paper, we study position optimization for MA-enabled multi-user MISO systems, in which a base station (BS) equipped with multiple MAs communicates with multiple users, each equipped with a single fixed-position antenna (FPA). To circumvent the difficulty of acquiring the channel state information (CSI) from the transmitter to the receiver over the entire movable region, we propose a derivative-free approach to MA position optimization. The basic idea is to treat position optimization as a closed-box optimization problem and to estimate the gradient of the unknown objective function using zeroth-order (ZO) gradient approximation techniques. Specifically, the proposed method does not need to explicitly estimate the global CSI. Instead, it adaptively refines its next movement based on previous measurements so that it eventually converges to an optimum or stationary solution. Simulation results show that the proposed derivative-free approach achieves higher sample and computational efficiency than the CSI-estimation-based position optimization approach, particularly in challenging scenarios where the number of multipath components (MPCs) is large or the number of pilot signals is limited.
Submitted 25 May, 2025;
originally announced May 2025.
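The zeroth-order idea in the abstract is to estimate a gradient from objective values alone. A minimal sketch of the standard two-point estimator, g ≈ (1/K) Σ_k [f(x + μu_k) − f(x − μu_k)] / (2μ) · u_k with Gaussian directions u_k, inside projected ascent over a box-shaped movable region follows. The quadratic `measured_metric` is a hypothetical stand-in for a measured performance metric at a trial antenna position; the step size, `mu`, number of directions, and box bounds are illustrative, and the paper's exact estimator and movement rule may differ.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=20, rng=None):
    """Two-point zeroth-order estimate of grad f(x) from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)                       # random probing direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u  # directional finite difference
    return g / num_dirs

def zo_ascent(f, x0, lower, upper, step=0.05, iters=300, seed=0):
    """Projected zeroth-order gradient ascent inside a box-shaped movable region."""
    rng = np.random.default_rng(seed)
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(iters):
        x = np.clip(x + step * zo_gradient(f, x, rng=rng), lower, upper)
    return x

# Stand-in objective: in a real system each call would be a measurement, not a formula
target = np.array([0.3, -0.2])

def measured_metric(p):
    return -float(np.sum((p - target) ** 2))

x_best = zo_ascent(measured_metric, x0=[0.9, 0.9], lower=-1.0, upper=1.0)
print(x_best)
```

Each gradient estimate costs 2 × `num_dirs` objective evaluations, which is the sample-efficiency quantity such a method trades against explicit CSI estimation. A hardware implementation would also keep the probe points inside the movable region.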
-
Inference Compute-Optimal Video Vision Language Models
Authors:
Peiqi Wang,
ShengYun Peng,
Xuewen Zhang,
Hanchao Yu,
Yibo Yang,
Lifu Huang,
Fujun Liu,
Qifan Wang
Abstract:
This work investigates the optimal allocation of inference compute across three key scaling factors in video vision language models: language model size, frame count, and the number of visual tokens per frame. While prior work typically focuses on optimizing model efficiency or improving performance without considering resource constraints, we instead identify optimal model configurations under fixed inference compute budgets. We conduct large-scale training sweeps and careful parametric modeling of task performance to identify the inference compute-optimal frontier. Our experiments reveal how task performance depends on the scaling factors and finetuning data size, as well as how changes in data size shift the compute-optimal frontier. These findings translate into practical tips for selecting these scaling factors.
Submitted 24 May, 2025;
originally announced May 2025.
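Finding an inference compute-optimal configuration amounts to maximizing predicted performance over (model size, frame count, tokens per frame) subject to a compute budget. A minimal grid-search sketch follows, assuming the common rough approximation of about 2 FLOPs per parameter per processed token for a forward pass, and a HYPOTHETICAL additive power-law performance model whose coefficients are made up; the paper's fitted parametric form is not given in the abstract.

```python
import itertools

def inference_flops(n_params, frames, tokens_per_frame, text_tokens=64):
    # Rough forward-pass cost: ~2 FLOPs per parameter per processed token
    return 2.0 * n_params * (frames * tokens_per_frame + text_tokens)

def predicted_score(n_params, frames, tokens_per_frame):
    # HYPOTHETICAL additive power-law fit; the coefficients are illustrative only
    return (1.0
            - 0.5 * (n_params / 1e9) ** -0.3
            - 0.3 * frames ** -0.5
            - 0.2 * tokens_per_frame ** -0.4)

def compute_optimal(budget, sizes, frame_counts, token_counts):
    """Best (score, n_params, frames, tokens_per_frame) within the budget, or None."""
    best = None
    for n, f, v in itertools.product(sizes, frame_counts, token_counts):
        if inference_flops(n, f, v) > budget:
            continue                         # exceeds the fixed inference budget
        score = predicted_score(n, f, v)
        if best is None or score > best[0]:
            best = (score, n, f, v)
    return best

sizes = [1e9, 3e9, 7e9]
frame_counts = [8, 16, 32, 64]
token_counts = [16, 64, 196]
for budget in (1e13, 1e14):
    print(budget, compute_optimal(budget, sizes, frame_counts, token_counts))
```

Repeating the search over a range of budgets traces out a compute-optimal frontier; refitting `predicted_score` for a different finetuning data size would show how that frontier shifts.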
-
MRGAgents: A Multi-Agent Framework for Improved Medical Report Generation with Med-LVLMs
Authors:
Pengyu Wang,
Shuchang Ye,
Usman Naseem,
Jinman Kim
Abstract:
Medical Large Vision-Language Models (Med-LVLMs) have been widely adopted for medical report generation. Although Med-LVLMs achieve state-of-the-art performance, they exhibit a bias toward predicting all findings as normal, which leads to reports that overlook critical abnormalities. Furthermore, these models often fail to provide comprehensive descriptions of the radiologically relevant regions needed for accurate diagnosis. To address these challenges, we propose Medical Report Generation Agents (MRGAgents), a novel multi-agent framework that fine-tunes specialized agents for different disease categories. By curating subsets of the IU X-ray and MIMIC-CXR datasets to train disease-specific agents, MRGAgents generates reports that more effectively balance normal and abnormal findings while ensuring a comprehensive description of clinically relevant regions. Our experiments demonstrate that MRGAgents outperforms the state of the art, improving both report comprehensiveness and diagnostic utility.
Submitted 24 May, 2025;
originally announced May 2025.
-
Single-agent or Multi-agent Systems? Why Not Both?
Authors:
Mingyan Gao,
Yanzi Li,
Banruo Liu,
Yifan Yu,
Phillip Wang,
Ching-Yu Lin,
Fan Lai
Abstract:
Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported the superior accuracy of MAS across diverse domains, enabled by long-horizon context tracking and error correction through role-specific agents. However, designing and deploying MAS incurs higher complexity and runtime cost than single-agent systems (SAS). Meanwhile, frontier LLMs such as OpenAI-o3 and Gemini-2.5-Pro have rapidly advanced in long-context reasoning, memory retention, and tool usage, mitigating many of the limitations that originally motivated MAS designs. In this paper, we conduct an extensive empirical study comparing MAS and SAS across various popular agentic applications. We find that the benefits of MAS over SAS diminish as LLM capabilities improve, and we propose efficient mechanisms to pinpoint the error-prone agent in MAS. Furthermore, the performance discrepancy between MAS and SAS motivates our design of a hybrid agentic paradigm, request cascading between MAS and SAS, to improve both efficiency and capability. Our design improves accuracy by 1.1-12% while reducing deployment costs by up to 20% across various agentic applications.
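The abstract describes request cascading between SAS and MAS but does not state the routing criterion or the order of escalation. The following is a minimal sketch that assumes the cheaper SAS is tried first and that a hypothetical `accept` check (for example, a verifier or a confidence test) decides whether to escalate to MAS. The toy systems below are illustrative stand-ins, not real agents.

```python
from typing import Callable, List, Optional, Tuple


def cascade(
    request: str,
    single_agent: Callable[[str], Optional[str]],
    multi_agent: Callable[[str], str],
    accept: Callable[[str, Optional[str]], bool],
) -> Tuple[Optional[str], str]:
    """Try the single-agent system first and escalate to the multi-agent system
    only when `accept` rejects its answer. Returns (answer, route taken)."""
    answer = single_agent(request)
    if accept(request, answer):
        return answer, "SAS"
    # Escalation: pay the higher MAS cost only for requests SAS did not handle.
    return multi_agent(request), "MAS"


# Hypothetical stand-in systems that record which one was invoked.
calls: List[str] = []


def toy_sas(request: str) -> Optional[str]:
    calls.append("SAS")
    return request.upper() if len(request) < 5 else None  # "fails" on long inputs


def toy_mas(request: str) -> str:
    calls.append("MAS")
    return request[::-1].upper()


def answered(request: str, answer: Optional[str]) -> bool:
    return answer is not None
```

The cost saving comes from requests that the single-agent path already handles and therefore never reach the multi-agent path.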
Submitted 23 May, 2025;
originally announced May 2025.
-
Measurement of branching fractions of $Λ_{c}^{+}$ decays to $Σ^{+} η$ and $Σ^{+} η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (644 additional authors not shown)
Abstract:
By analyzing $e^+e^-$ collision data taken at center-of-mass energies $\sqrt{s} = 4.600 \sim 4.699$ $\mbox{GeV}$ with the BESIII detector at the BEPCII collider, corresponding to an integrated luminosity of $\rm 4.5~fb^{-1}$, we study the hadronic decays $Λ_{c}^{+} \rightarrow Σ^{+} η$ and $Λ_{c}^{+} \rightarrow Σ^{+} η^{\prime}$ using the single-tag method. The branching fraction ratio of $Λ_{c}^+ \rightarrow Σ^+ η$ relative to $Λ_{c}^+ \rightarrow Σ^+ π^0$ is determined to be $0.305 \pm 0.046_{\rm stat.} \pm 0.007_{\rm sys.}$, and that of $Λ_{c}^+ \rightarrow Σ^+ η'$ relative to $Λ_{c}^+ \rightarrow Σ^+ ω$ is $0.336 \pm 0.094_{\rm stat.} \pm 0.037_{\rm sys.}$. The ratio of $\frac{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η'\right)}{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η\right)} $ is determined to be $1.50\pm 0.48 \pm 0.17 \pm 0.21$, where the uncertainties are statistical, systematic, and from $\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} π^0\right) $ or $\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} ω\right) $, respectively. These results enrich our knowledge of charmed baryon decays.
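The abstract does not state how the final ratio is obtained. One reading that is consistent with the listed third uncertainty source (an inference, not stated in the abstract) combines the two measured relative ratios, written here as $R_{η/π^0}$ and $R_{η'/ω}$ (notation introduced for this sketch), with the reference branching fractions:

$$
\frac{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η'\right)}{\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} η\right)}
= \frac{R_{η'/ω}\,\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} ω\right)}{R_{η/π^0}\,\mathcal{B}\left(Λ_{c}^{+} \rightarrow Σ^{+} π^0\right)},
\qquad R_{η/π^0} = 0.305,\quad R_{η'/ω} = 0.336 .
$$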
Submitted 23 May, 2025;
originally announced May 2025.
-
SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models
Authors:
Xiang Liu,
Zhaoxiang Liu,
Peng Wang,
Kohou Wang,
Huan Hu,
Kai Wang,
Shiguo Lian
Abstract:
When using supervised fine-tuning (SFT) to adapt large language models (LLMs) to specific domains, a significant question arises: should the entire SFT dataset be used for fine-tuning? Common practice is to fine-tune directly on the entire dataset, because little is known about the LLM's past training data. However, if the SFT dataset largely overlaps with the model's existing knowledge, the performance gains are minimal and computational resources are wasted. Identifying the knowledge in the SFT dataset that is unknown to the model, and fine-tuning only on that portion, could substantially improve training efficiency. To address this challenge, we propose a self-learning framework for LLMs inspired by human learning patterns. The framework takes a domain-specific SFT dataset as input. First, the LLM answers the questions in the SFT dataset. The LLM then objectively grades its responses and retains only the incorrectly answered QA pairs. Finally, we fine-tune the LLM on this filtered QA set. Experimental results in agriculture and medicine demonstrate that our method substantially reduces training time while achieving improvements comparable to those of full-dataset fine-tuning. By concentrating on the unknown knowledge within the SFT dataset, our approach makes fine-tuning LLMs more efficient.
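The answer, grade, and select loop described above can be sketched as follows. This is a minimal illustration, not the authors' code. `answer_fn` and `grade_fn` are hypothetical callables that stand in for the LLM's answering and self-grading. `exact_match` is a toy grader used only for illustration, since the paper has the LLM grade its own responses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QAPair:
    question: str
    answer: str


def select_unknown(
    dataset: Iterable[QAPair],
    answer_fn: Callable[[str], str],
    grade_fn: Callable[[str, str, str], bool],
) -> List[QAPair]:
    """Return only the QA pairs that the model currently answers incorrectly."""
    unknown: List[QAPair] = []
    for pair in dataset:
        prediction = answer_fn(pair.question)  # step 1: the model answers
        if not grade_fn(pair.question, prediction, pair.answer):  # step 2: grading
            unknown.append(pair)  # step 3: keep only incorrectly answered pairs
    return unknown


def exact_match(question: str, prediction: str, reference: str) -> bool:
    """Toy grader: case- and whitespace-insensitive string comparison."""
    return prediction.strip().lower() == reference.strip().lower()
```

The returned subset would then be passed to an ordinary SFT trainer (not shown). Because that subset is usually much smaller than the full dataset, training time drops.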
Submitted 23 May, 2025;
originally announced May 2025.
-
GCAL: Adapting Graph Models to Evolving Domain Shifts
Authors:
Ziyue Qiao,
Qianyi Cai,
Hao Dong,
Jiawei Gu,
Pengyang Wang,
Meng Xiao,
Xiao Luo,
Hui Xiong
Abstract:
This paper addresses the challenge of graph domain adaptation on evolving, multiple out-of-distribution (OOD) graphs. Conventional graph domain adaptation methods are confined to single-step adaptation, making them ineffective in handling continuous domain shifts and prone to catastrophic forgetting. This paper introduces the Graph Continual Adaptive Learning (GCAL) method, designed to enhance model sustainability and adaptability across various graph domains. GCAL employs a bilevel optimization strategy. The "adapt" phase uses an information maximization approach to fine-tune the model with new graph domains while re-adapting past memories to mitigate forgetting. Concurrently, the "generate memory" phase, guided by a theoretical lower bound derived from information bottleneck theory, involves a variational memory graph generation module to condense original graphs into memories. Extensive experimental evaluations demonstrate that GCAL substantially outperforms existing methods in terms of adaptability and knowledge retention.
Submitted 22 May, 2025;
originally announced May 2025.
-
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor
Authors:
Hai-Feng Zhang,
Zhao-Yun Chen,
Peng Wang,
Liang-Liang Guo,
Tian-Le Wang,
Xiao-Yan Yang,
Ren-Ze Zhao,
Ze-An Zhao,
Sheng Zhang,
Lei Du,
Hao-Ran Tao,
Zhi-Long Jia,
Wei-Cheng Kong,
Huan-Yu Liu,
Athanasios V. Vasilakos,
Yang Yang,
Yu-Chun Wu,
Ji Guan,
Peng Duan,
Guo-Ping Guo
Abstract:
Quantum machine learning (QML) models, like their classical counterparts, are vulnerable to adversarial attacks, which hinders their secure deployment. Here, we report the first systematic experimental robustness benchmark for 20-qubit quantum neural network (QNN) classifiers executed on a superconducting processor. Our benchmarking framework features an efficient adversarial attack algorithm designed for QNNs, enabling quantitative characterization of adversarial robustness and robustness bounds. Our analysis verifies that adversarial training reduces sensitivity to targeted perturbations by regularizing input gradients, significantly enhancing QNN robustness. Additionally, our analysis reveals that QNNs exhibit superior adversarial robustness compared to classical neural networks, an advantage attributed to inherent quantum noise. Furthermore, the empirical upper bound extracted from our attack experiments shows a minimal deviation ($3 \times 10^{-3}$) from the theoretical lower bound, providing strong experimental confirmation of the attack's effectiveness and the tightness of fidelity-based robustness bounds. This work establishes a critical experimental framework for assessing and improving quantum adversarial robustness, paving the way for secure and reliable QML applications.
Submitted 22 May, 2025;
originally announced May 2025.
-
Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning
Authors:
Siqu Ou,
Hongcheng Liu,
Pingjie Wang,
Yusheng Liao,
Chuan Xuan,
Yanfeng Wang,
Yu Wang
Abstract:
While chains-of-thought (CoT) have advanced complex reasoning in multimodal large language models (MLLMs), existing methods remain confined to text or static visual domains and often falter on dynamic spatial reasoning tasks. To bridge this gap, we present GRASSLAND, a novel maze navigation benchmark designed to evaluate dynamic spatial reasoning. Our experiments show that augmenting textual reasoning chains with dynamic visual drafts, overlaid on input images, significantly outperforms conventional approaches, offering new insights into spatial reasoning in evolving environments. To generalize this capability, we propose D2R (Dynamic Draft-Augmented Reasoning), a training-free framework that seamlessly integrates textual CoT with corresponding visual drafts into MLLMs. Extensive evaluations demonstrate that D2R consistently enhances performance across diverse tasks, establishing a robust baseline for dynamic spatial reasoning without requiring model fine-tuning. The project is available at https://github.com/Cratileo/D2R.
Submitted 22 May, 2025;
originally announced May 2025.
-
Towards Realistic Detection Pipelines of Taiji: New Challenges in Data Analysis and High-Fidelity Simulations of Space-Borne Gravitational Wave Antenna
Authors:
Minghui Du,
Pengcheng Wang,
Ziren Luo,
Wen-Biao Han,
Xin Zhang,
Xian Chen,
Zhoujian Cao,
Xilong Fan,
He Wang,
Xiaodong Peng,
Li-E Qiang,
Ke An,
Yidi Fan,
Jiafeng Zhang,
Liang-Gui Zhu,
Ping Shen,
Qianyun Yun,
Xiao-Bo Zou,
Ye Jiang,
Tianyu Zhao,
Yong Yuan,
Xiaotong Wei,
Yuxiang Xu,
Bo Liang,
Peng Xu
, et al. (1 additional authors not shown)
Abstract:
Taiji, a Chinese space-based gravitational wave detection project, aims to explore the millihertz gravitational wave universe with unprecedented sensitivity. Its targets include astrophysical and cosmological sources such as Galactic binaries, massive black hole binaries, extreme mass-ratio inspirals, and stochastic gravitational wave backgrounds. These observations are expected to provide transformative insights into astrophysics, cosmology, and fundamental physics. However, Taiji's data analysis faces unique challenges distinct from those of ground-based detectors like LIGO-Virgo-KAGRA. These include the overlap of numerous signals, extended data durations, more rigorous accuracy requirements for waveform templates, non-negligible subdominant waveform complexities, incompletely characterized noise spectra, non-stationary noises, and various data anomalies. This paper presents the second round of the Taiji Data Challenge, a collection of simulation datasets designed as a shared platform for resolving these critical data analysis problems. The current platform differs from previous work through its systematic integration of orbital dynamics based on a full drag-free and attitude control simulation, extended noise sources, more sophisticated and overlapping gravitational wave signals, second-generation time-delay interferometry, and the coupling effect of time-varying armlengths. Released alongside it is the open-source toolkit Triangle (available at https://github.com/TriangleDataCenter), which supports customized simulation of signals, noises, and other instrumental effects. By taking a step further towards realistic detection, Taiji Data Challenge II and Triangle together serve as a new testbed that supports the development of Taiji's global analysis and end-to-end pipelines, ultimately bridging the gap between observation and scientific objectives.
Submitted 23 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
A pulsar-helium star compact binary system formed by common envelope evolution
Authors:
Z. L. Yang,
J. L. Han,
D. J. Zhou,
W. C. Jing,
W. C. Chen,
T. Wang,
X. D. Li,
S. Wang,
B. Wang,
H. W. Ge,
Y. L. Guo,
L. H. Li,
Y. Shao,
J. F. Liu,
W. Q. Su,
L. G. Hou,
W. J. Huang,
J. C. Jiang,
P. Jiang,
J. H. Sun,
B. J. Wang,
C. Wang,
H. G. Wang,
J. B. Wang,
N. Wang
, et al. (11 additional authors not shown)
Abstract:
A stellar common envelope occurs in a binary system when the atmosphere of an evolving star expands to encompass an orbiting companion object. Such systems are predicted to evolve rapidly, ejecting the stellar envelope and leaving the companion in a tighter orbit around a stripped star. We used radio timing to identify a pulsar, PSR J1928+1815, with a spin period of 10.55 ms in a compact binary system with an orbital period of 3.60 hours. The companion star has 1.0 to 1.6 solar masses, eclipses the pulsar for about 17% of the orbit, and is undetected at other wavelengths, so it is most likely a stripped helium star. We interpret this system as having recently undergone a common envelope phase, producing a compact binary.
Submitted 21 May, 2025;
originally announced May 2025.
-
LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing
Authors:
Peng Wang,
Biyu Zhou,
Xuehai Tang,
Jizhong Han,
Songlin Hu
Abstract:
Large Language Models often contain factually incorrect or outdated knowledge, which has given rise to model editing methods for precise knowledge updates. However, current mainstream locate-then-edit approaches show a progressive performance decline during sequential editing, because they lack adequate mechanisms for long-term knowledge preservation. To tackle this, we model sequential editing as a constrained stochastic programming problem. To handle the cumulative preservation-error constraint and the gradually revealed editing tasks, we propose LyapLock. It integrates queuing theory and Lyapunov optimization to decompose the long-term constrained programming problem into tractable stepwise subproblems that can be solved efficiently. This is the first model editing framework with rigorous theoretical guarantees, achieving asymptotically optimal editing performance while meeting the constraints of long-term knowledge preservation. Experimental results show that our framework scales sequential editing capacity to over 10,000 edits while stabilizing general capabilities and boosting average editing efficacy by 11.89\% over SOTA baselines. Furthermore, it can be leveraged to enhance the performance of baseline methods. Our code is released at https://github.com/caskcsg/LyapLock.
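The decomposition mentioned above builds on the standard drift-plus-penalty technique from Lyapunov optimization. A long-term constraint $\frac{1}{T}\sum_t g_t \le 0$ is tracked by a virtual queue $Q_{t+1} = \max(Q_t + g_t, 0)$, and each step minimizes $V f_t(x) + Q_t g_t(x)$. The sketch below shows this generic technique only, not LyapLock's actual formulation, which operates on model weight updates. The scalar "edit strength" candidates, the loss, and the violation functions are hypothetical.

```python
from typing import Callable, Sequence


class VirtualQueue:
    """Virtual queue for a long-term average constraint mean_t g_t <= 0.
    Here g_t is a hypothetical per-edit preservation error minus its allowed budget."""

    def __init__(self) -> None:
        self.backlog = 0.0

    def update(self, violation: float) -> None:
        # Q_{t+1} = max(Q_t + g_t, 0): the backlog grows while the budget is exceeded.
        self.backlog = max(self.backlog + violation, 0.0)


def drift_plus_penalty_step(
    candidates: Sequence[float],
    edit_loss: Callable[[float], float],
    violation: Callable[[float], float],
    queue: VirtualQueue,
    v: float,
) -> float:
    """Solve one stepwise subproblem: minimize V * f_t(x) + Q_t * g_t(x) over the
    candidates, then update the virtual queue with the chosen g_t(x)."""
    best = min(candidates, key=lambda x: v * edit_loss(x) + queue.backlog * violation(x))
    queue.update(violation(best))
    return best
```

A larger `v` favors editing efficacy over preservation. As the backlog grows, later steps are pushed toward conservative edits, which keeps the long-run average error within budget.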
Submitted 21 May, 2025;
originally announced May 2025.
-
Observation of $χ_{cJ}\to 3K_S^0K^\pmπ^\mp$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (678 additional authors not shown)
Abstract:
By analyzing $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays $χ_{c0,1,2} \to 3K_S^0K^\pmπ^\mp$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\to 3K_S^0K^\pmπ^\mp )=(7.95\pm0.50\pm0.65)\times10^{-5},$ $\mathcal{B}(χ_{c1}\to 3K_S^0K^\pmπ^\mp)=(2.62\pm0.08\pm0.19)\times10^{-4},$ and $\mathcal{B}(χ_{c2}\to 3K_S^0K^\pmπ^\mp)=(1.72\pm0.07\pm0.15)\times10^{-4},$ where the first uncertainties are statistical and the second systematic.
Submitted 21 May, 2025;
originally announced May 2025.
-
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Authors:
Tencent Hunyuan Team,
Ao Liu,
Botong Zhou,
Can Xu,
Chayse Zhou,
ChenChen Zhang,
Chengcheng Xu,
Chenhao Wang,
Decheng Wu,
Dengpeng Wu,
Dian Jiao,
Dong Du,
Dong Wang,
Feng Zhang,
Fengzong Lian,
Guanghui Xu,
Guanwei Zhang,
Hai Wang,
Haipeng Luo,
Han Hu,
Huilin Xu,
Jiajia Wu,
Jianchen Zhu,
Jianfeng Yan,
Jiaqi Zhu
, et al. (230 additional authors not shown)
Abstract:
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It combines Mamba's long-sequence processing efficiency with the Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism that dynamically switches between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B-activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes the KV cache, and the FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, the model supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: a top-7 overall ranking on LMSYS Chatbot Arena with a score of 1356, outperforming leading models such as Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models and establishing a new paradigm for efficient large-scale pre-trained models.
Submitted 22 May, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
Authors:
Peng Wang,
Ruihan Tao,
Qiguang Chen,
Mengkang Hu,
Libo Qin
Abstract:
Recently, large language model (LLM)-based agents have achieved notable success in interactive environments, attracting considerable academic and industrial attention. Despite these advances, current research predominantly focuses on English scenarios. In reality, there are over 7,000 languages worldwide, and their speakers all need access to comparable agentic services. Nevertheless, the development of language agents remains inadequate for meeting the diverse requirements of multilingual agentic applications. To fill this gap, we introduce X-WebAgentBench, a novel multilingual agent benchmark in an interactive web environment that evaluates the planning and interaction performance of language agents across multiple languages, thereby contributing to the advancement of global agent intelligence. Additionally, we assess the performance of various LLMs and cross-lingual alignment methods, examining how effectively they enhance agents. Our findings reveal that even advanced models like GPT-4o, when combined with cross-lingual techniques, fail to achieve satisfactory results. We hope that X-WebAgentBench can serve as a valuable benchmark for multilingual agent scenarios in real-world applications.
Submitted 21 May, 2025;
originally announced May 2025.
-
Policy Testing in Markov Decision Processes
Authors:
Kaito Ariu,
Po-An Wang,
Alexandre Proutiere,
Kenshi Abe
Abstract:
We study the policy testing problem in discounted Markov decision processes (MDPs) under the fixed-confidence setting. The goal is to determine whether the value of a given policy exceeds a specified threshold while minimizing the number of observations. We begin by deriving an instance-specific lower bound that any algorithm must satisfy. This lower bound is characterized as the solution to an optimization problem with non-convex constraints. We propose a policy testing algorithm inspired by this optimization problem--a common approach in pure exploration problems such as best-arm identification, where asymptotically optimal algorithms often stem from such optimization-based characterizations. As for other pure exploration tasks in MDPs, however, the non-convex constraints in the lower-bound problem present significant challenges, raising doubts about whether statistically optimal and computationally tractable algorithms can be designed. To address this, we reformulate the lower-bound problem by interchanging the roles of the objective and the constraints, yielding an alternative problem with a non-convex objective but convex constraints. Strikingly, this reformulated problem admits an interpretation as a policy optimization task in a newly constructed reversed MDP. Leveraging recent advances in policy gradient methods, we efficiently solve this problem and use it to design a policy testing algorithm that is statistically optimal--matching the instance-specific lower bound on sample complexity--while remaining computationally tractable. We validate our approach with numerical experiments.
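For reference, the quantity being tested is the discounted value of the given policy, $V^\pi = (I - \gamma P_\pi)^{-1} r_\pi$. In the paper's setting, the transitions and rewards are unknown and must be learned from observations. The sketch below only computes the ground-truth answer when the tabular model is known, so it is an oracle and not the authors' testing algorithm. It also assumes that "the value" is taken under an initial-state distribution `rho`; the abstract does not specify this.

```python
import numpy as np


def policy_value(P: np.ndarray, R: np.ndarray, policy: np.ndarray, gamma: float) -> np.ndarray:
    """Exact V^pi for a known tabular discounted MDP.
    P: (S, A, S) transition probabilities, R: (S, A) expected rewards,
    policy: (S, A) action probabilities for each state."""
    num_states = P.shape[0]
    p_pi = np.einsum("sa,sat->st", policy, P)  # state-to-state transitions under pi
    r_pi = np.sum(policy * R, axis=1)  # expected one-step reward under pi
    return np.linalg.solve(np.eye(num_states) - gamma * p_pi, r_pi)


def exceeds_threshold(P, R, policy, gamma: float, rho: np.ndarray, threshold: float) -> bool:
    """Oracle answer to the test: is rho . V^pi above the threshold?"""
    return float(rho @ policy_value(P, R, policy, gamma)) > threshold
```

A sample-based tester has to reach this same yes/no answer with a prescribed confidence while using as few observations as possible. The instance-specific lower bound quantifies how many observations are unavoidable.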
Submitted 21 May, 2025;
originally announced May 2025.
-
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
Authors:
Jie Ma,
Ning Qu,
Zhitao Gao,
Rui Xing,
Jun Liu,
Hongbin Pei,
Jiang Xie,
Linyun Song,
Pinghui Wang,
Jing Tao,
Zhou Su
Abstract:
Knowledge graph-based retrieval-augmented generation seeks to mitigate hallucinations in Large Language Models (LLMs) caused by insufficient or outdated knowledge. However, existing methods often fail to fully exploit the prior knowledge embedded in knowledge graphs (KGs), particularly their structural information and explicit or implicit constraints. The former can enhance the faithfulness of LLMs' reasoning, while the latter can improve the reliability of response generation. Motivated by these observations, we propose a trustworthy reasoning framework, termed Deliberation over Priors (DP), which fully utilizes the priors contained in KGs. Specifically, DP adopts a progressive knowledge distillation strategy that integrates structural priors into LLMs through a combination of supervised fine-tuning and Kahneman-Tversky optimization, thereby improving the faithfulness of relation path generation. Furthermore, our framework employs a reasoning-introspection strategy that guides LLMs to perform refined reasoning verification based on extracted constraint priors, ensuring the reliability of response generation. Extensive experiments on three benchmark datasets demonstrate that DP achieves new state-of-the-art performance, including a Hit@1 improvement of 13% on the ComplexWebQuestions dataset, and generates highly trustworthy responses. We also conduct various analyses to verify its flexibility and practicality. The code is available at https://github.com/reml-group/Deliberation-on-Priors.
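Hit@1, the metric quoted above, is commonly defined as the fraction of questions whose top-ranked prediction is one of the gold answers. The following is a minimal sketch that assumes exact string matching; the paper's answer normalization is not stated in the abstract.

```python
from typing import Sequence, Set


def hit_at_1(ranked_predictions: Sequence[Sequence[str]], gold_answers: Sequence[Set[str]]) -> float:
    """Fraction of questions whose first-ranked prediction is in the gold answer set.
    A question with no predictions counts as a miss."""
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("need exactly one gold-answer set per question")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1 for preds, gold in zip(ranked_predictions, gold_answers) if preds and preds[0] in gold
    )
    return hits / len(ranked_predictions)
```

Only the first prediction matters, so a correct answer ranked second contributes nothing to Hit@1.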
Submitted 21 May, 2025;
originally announced May 2025.
-
Privacy-Preserving Socialized Recommendation based on Multi-View Clustering in a Cloud Environment
Authors:
Cheng Guo,
Jing Jia,
Peng Wang,
Jing Zhang
Abstract:
Recommendation as a service has improved the quality of our lives and plays a significant role in many areas. However, users' preferences may reveal sensitive information, so their privacy must be protected. In this paper, we propose a privacy-preserving socialized recommendation protocol that introduces information collected from online social networks to enhance the quality of recommendations. The proposed scheme can calculate the similarity between users to determine their potential relationships and interests, and it can also protect users' privacy from leaking to an untrusted third party. The security analysis and experimental results show that our proposed scheme performs well and is feasible for real-world applications.
Submitted 21 May, 2025;
originally announced May 2025.
-
Photonic chip-based high-efficiency soliton microcombs via electro-optic-Kerr synergy
Authors:
Rui Niu,
Shuai Wan,
Pi-Yu Wang,
Rui Ma,
Jin Li,
Fang Bo,
Zhen Shen,
Guang-Can Guo,
Fang-Wen Sun,
Junqiu Liu,
Chun-Hua Dong
Abstract:
Temporal soliton mode-locking in coherently pumped microcavities provides a promising platform for miniaturized frequency comb systems. While significant progress has been made, achieving high conversion efficiency in such microcombs remains a critical challenge. Soliton generation through pulse pumping has emerged as an effective strategy to improve conversion efficiency. However, the on-chip integration of pulse generation with dissipative Kerr soliton (DKS) formation within the photonic chip has not yet been realized. In this work, we demonstrate a photonic chip-based soliton microcomb with high conversion efficiency, achieved by integrating on-chip pulse generation and DKS generation. The pulsed laser, fabricated on a lithium niobate-on-insulator (LNOI) platform, delivers a 35.5 GHz repetition rate with broadly tunable center frequencies. By coupling these on-chip pulses to a silicon nitride microresonator, we achieve stable DKS generation with a pump-to-soliton conversion efficiency of 43.9% under steady-state conditions. This integrated architecture establishes a viable pathway toward chip-scale soliton microcombs with unprecedented efficiency, opening up new possibilities for optical communications, precision spectroscopy, and photonic sensing.
Submitted 20 May, 2025;
originally announced May 2025.
-
Test of local realism via entangled $Λ\barΛ$ system
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (597 additional authors not shown)
Abstract:
The non-locality of quantum correlations is a fundamental feature of quantum theory. The Bell inequality serves as a benchmark for distinguishing between the predictions of quantum theory and those of local hidden variable theory (LHVT). Recent photon-entanglement experiments have closed potential loopholes and observed significant violations of variants of the Bell inequality. However, examples of Bell-inequality violation in high-energy physics are scarce. In this study, we use $(10.087\pm0.044)\times10^{9}$ $J/ψ$ events collected with the BES-III detector at the BEPCII collider to perform non-local correlation tests with entangled hyperon pairs. The massive entangled $Λ\barΛ$ pairs are produced through the strong interaction and decay through the weak interaction. Through measurements of the angular distribution of $p\bar{p}$ in $J/ψ\to γη_c$ and the subsequent $η_c\toΛ(pπ^-)\barΛ(\bar{p}π^{+})$ cascade decays, a significant violation of LHVT predictions is observed. LHVT is excluded at a statistical significance exceeding $5.2σ$ in tests of three Bell-like inequalities.
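The abstract does not spell out which three Bell-like inequalities are tested. For orientation only, the sketch below evaluates the textbook CHSH inequality for an idealized spin-singlet pair of spin-1/2 particles. This is the simplest setting in which quantum correlations exceed the local-hidden-variable bound |S| ≤ 2. The sketch does not model hyperon decay kinematics or the authors' observables, and the function names are our own.

```python
import itertools
import math


def singlet_correlation(a: float, b: float) -> float:
    """Quantum correlation E(a, b) = -cos(a - b) for spin measurements along
    coplanar directions at angles a and b (radians) on an ideal spin singlet."""
    return -math.cos(a - b)


def chsh_value(a, a_prime, b, b_prime, corr=singlet_correlation) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return corr(a, b) - corr(a, b_prime) + corr(a_prime, b) + corr(a_prime, b_prime)


def max_local_deterministic_chsh() -> int:
    """Largest |S| reachable when every outcome is a predetermined +/-1
    (a local deterministic model; mixtures of these cannot do better)."""
    best = 0
    for A, Ap, B, Bp in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(A * B - A * Bp + Ap * B + Ap * Bp))
    return best


# Textbook optimal settings: a = 0, a' = pi/2, b = pi/4, b' = 3*pi/4.
S = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(f"|S| quantum = {abs(S):.4f}, local bound = {max_local_deterministic_chsh()}")
```

Running it prints `|S| quantum = 2.8284, local bound = 2`. The quantum value equals 2√2, which exceeds the bound that any assignment of predetermined ±1 outcomes can reach.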
Submitted 20 May, 2025;
originally announced May 2025.
-
First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray `Knee'
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (292 additional authors not shown)
Abstract:
We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure, closely aligned with the knee in the all-particle spectrum, points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays.
Submitted 20 May, 2025;
originally announced May 2025.
-
Disentangled Multi-span Evolutionary Network against Temporal Knowledge Graph Reasoning
Authors:
Hao Dong,
Ziyue Qiao,
Zhiyuan Ning,
Qi Hao,
Yi Du,
Pengyang Wang,
Yuanchun Zhou
Abstract:
Temporal Knowledge Graphs (TKGs), as an extension of static Knowledge Graphs (KGs), incorporate temporal features to express the transience of knowledge by describing when facts occur. TKG extrapolation, which aims to infer possible future facts from known history, has garnered significant attention in recent years. Some existing methods treat a TKG as a sequence of independent subgraphs to model temporal evolution patterns, demonstrating impressive reasoning performance. However, they still have limitations: 1) in modeling subgraph semantic evolution, they usually neglect the internal structural interactions between subgraphs, which are crucial for encoding TKGs; 2) they overlook potential smooth features that do not lead to semantic changes, which should be distinguished from the semantic evolution process. Therefore, we propose a novel Disentangled Multi-span Evolutionary Network (DiMNet) for TKG reasoning. Specifically, we design a multi-span evolution strategy that captures local neighbor features while perceiving historical neighbor semantic information, thus enabling internal interactions between subgraphs during the evolution process. To maximize the capture of semantic change patterns, we design a disentangling component that adaptively separates nodes' active and stable features and dynamically controls the influence of historical semantics on future evolution. Extensive experiments on four real-world TKG datasets show that DiMNet achieves strong TKG reasoning performance, outperforming the state of the art by up to 22.7% in MRR.
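MRR (mean reciprocal rank), the metric in which the improvement above is reported, averages the reciprocal rank of the true entity over all queries. A minimal sketch follows. It omits the filtering of other valid answers that TKG benchmarks usually apply before ranking. The pessimistic tie-breaking rule and the example ranks are our assumptions.

```python
from typing import Sequence


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """1-based rank of the true candidate when sorted by descending score.
    Ties are broken pessimistically (tied candidates count as ranked above)."""
    true_score = scores[true_index]
    return 1 + sum(1 for i, s in enumerate(scores) if i != true_index and s >= true_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """MRR = (1/|Q|) * sum over queries of 1/rank."""
    if not ranks:
        raise ValueError("need at least one query")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three hypothetical queries whose true entities land at ranks 1, 2 and 4.
print(mean_reciprocal_rank([1, 2, 4]))  # (1 + 1/2 + 1/4) / 3
```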
Submitted 29 May, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache
Authors:
Pu Wang,
Pengwen Dai,
Chen Wu,
Yeying Jin,
Dianjie Lu,
Guijuan Zhang,
Youshan Zhang,
Zhuoran Zheng
Abstract:
In this paper, we propose an efficient visual transformer framework for ultra-high-definition (UHD) image dehazing that addresses two key limitations of existing methods: slow training and high memory consumption. Our approach introduces two key innovations: 1) an adaptive normalization ("an") mechanism, inspired by the nGPT architecture, that enables ultra-fast and stable training of a network with a restricted range of parameter values; and 2) an atmospheric scattering-aware KV caching mechanism that dynamically optimizes feature preservation based on the physical haze formation model. The proposed architecture speeds up training convergence by 5× while reducing memory overhead, enabling real-time processing of 50 high-resolution images per second on an RTX 4090 GPU. Experimental results show that our approach maintains state-of-the-art dehazing quality while significantly improving computational efficiency for 4K/8K image restoration tasks. Furthermore, we provide a new interpretability method for image dehazing based on integrated gradient attribution maps. Our code can be found here: https://anonymous.4open.science/r/anDehazeFormer-632E/README.md.
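We assume the "physical haze formation model" above is the standard atmospheric scattering model I = J·t + A·(1 - t), with transmission t = exp(-β·d). Here d is scene depth, J is the clear radiance, and A is the global airlight. The NumPy sketch below synthesizes haze and then inverts the model when t and A are known. The values of A and β and the lower clamp t_min are arbitrary. The sketch does not reproduce the paper's atmospheric-aware KV cache, whose details the abstract does not give.

```python
import numpy as np


def add_haze(clear, depth, airlight=0.9, beta=1.0):
    """Atmospheric scattering model: I = J * t + A * (1 - t), t = exp(-beta * d).

    clear: (H, W, 3) haze-free image J in [0, 1]; depth: (H, W) scene depth d.
    Returns the hazy image I and the transmission map t with shape (H, W, 1).
    """
    t = np.exp(-beta * depth)[..., None]  # add channel axis for broadcasting
    return clear * t + airlight * (1.0 - t), t


def remove_haze(hazy, t, airlight=0.9, t_min=0.1):
    """Invert the model given t and A: J = (I - A) / max(t, t_min) + A.
    Clamping t avoids amplifying noise where transmission is tiny."""
    return (hazy - airlight) / np.maximum(t, t_min) + airlight


rng = np.random.default_rng(0)
clear = rng.random((4, 4, 3))
depth = rng.uniform(0.0, 2.0, size=(4, 4))  # t stays above exp(-2) > t_min
hazy, t = add_haze(clear, depth)
restored = remove_haze(hazy, t)
```

In practice, learned dehazers must estimate t and A (or bypass them) from the hazy image alone. The exact inversion here only holds because both are known.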
Submitted 20 May, 2025;
originally announced May 2025.
-
Demonstrating Coherent Quantum Routers for Bucket-Brigade Quantum Random Access Memory on a Superconducting Processor
Authors:
Sheng Zhang,
Yun-Jie Wang,
Peng Wang,
Ren-Ze Zhao,
Xiao-Yan Yang,
Ze-An Zhao,
Tian-Le Wang,
Hai-Feng Zhang,
Zhi-Fei Li,
Yuan Wu,
Hao-Ran Tao,
Liang-Liang Guo,
Lei Du,
Chi Zhang,
Zhi-Long Jia,
Wei-Cheng Kong,
Zhuo-Zhi Zhang,
Xiang-Xiang Song,
Yu-Chun Wu,
Zhao-Yun Chen,
Peng Duan,
Guo-Ping Guo
Abstract:
Quantum routers (QRouters) are essential components of bucket-brigade quantum random access memory (QRAM), enabling quantum applications such as Grover's search and quantum machine learning. Despite significant theoretical advances, achieving scalable and coherent QRouters experimentally remains challenging. Here, we demonstrate coherent quantum routers using a superconducting quantum processor, laying a practical foundation for scalable QRAM systems. The quantum router at the core of our implementation utilizes the transition composite gate (TCG) scheme, wherein auxiliary energy levels temporarily mediate conditional interactions, substantially reducing circuit depth compared to traditional gate decompositions. Moreover, by encoding routing addresses in the non-adjacent qutrit states $|0\rangle$ and $|2\rangle$, our design inherently enables eraser-detection capability, providing efficient post-selection to mitigate routing errors. Experimentally, we achieve individual QRouter fidelities up to 95.74%, and validate scalability through a two-layer quantum routing network achieving an average fidelity of 82.40%. Our results represent a significant advancement in quantum routing technology, providing enhanced fidelity, built-in error resilience, and practical scalability crucial for the development of future QRAM and large-scale quantum computing architectures.
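To make the routing map concrete, here is a toy NumPy state-vector model of a single router. It uses an ideal permutation rather than the paper's transition composite gate. The address qutrit in $|0\rangle$ sends the input qubit to the left output, and $|2\rangle$ sends it to the right output. Population in $|1\rangle$ (for example from decay of $|2\rangle$ in a transmon-like qutrit) leaves the qubits untouched and is treated as a flagged erasure. Shots in which the address is measured in $|1\rangle$ can therefore be discarded. All names are hypothetical, and pulses, decoherence, and fidelities are not modeled.

```python
import itertools

import numpy as np

# Subsystem dimensions in tensor-product order: address qutrit, input qubit,
# left output qubit, right output qubit.
DIMS = (3, 2, 2, 2)


def basis_index(a: int, i: int, l: int, r: int) -> int:
    return int(np.ravel_multi_index((a, i, l, r), DIMS))


def router_unitary() -> np.ndarray:
    """Ideal router as a permutation matrix: address 0 swaps input<->left,
    address 2 swaps input<->right, address 1 (erasure flag) acts trivially."""
    n = int(np.prod(DIMS))
    U = np.zeros((n, n))
    for a, i, l, r in itertools.product(*(range(d) for d in DIMS)):
        if a == 0:
            i2, l2, r2 = l, i, r
        elif a == 2:
            i2, l2, r2 = r, l, i
        else:
            i2, l2, r2 = i, l, r
        U[basis_index(a, i2, l2, r2), basis_index(a, i, l, r)] = 1.0
    return U


def initial_state(address, psi) -> np.ndarray:
    """Address amplitudes (3,), input qubit state (2,), both outputs in |0>."""
    zero = np.array([1.0, 0.0])
    return np.kron(np.kron(np.kron(address, psi), zero), zero)


def erasure_probability(state: np.ndarray) -> float:
    """Probability of finding the address qutrit in the flagged level |1>."""
    return float(np.sum(np.abs(state.reshape(DIMS)[1]) ** 2))


# Address in superposition 0.6|0> + 0.8|2>: the input is routed coherently to both outputs.
psi = np.array([np.cos(0.3), np.sin(0.3)])
out = router_unitary() @ initial_state(np.array([0.6, 0.0, 0.8]), psi)
```

The output is 0.6·|0⟩|0⟩|ψ⟩|0⟩ + 0.8·|2⟩|0⟩|0⟩|ψ⟩. The router entangles the address with the path the data qubit takes, which is the behavior a bucket-brigade QRAM relies on.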
Submitted 20 May, 2025;
originally announced May 2025.
-
DD-Ranking: Rethinking the Evaluation of Dataset Distillation
Authors:
Zekai Li,
Xinhao Zhong,
Samir Khaki,
Zhiyuan Liang,
Yuhao Zhou,
Mingjia Shi,
Ziqiao Wang,
Xuanlei Zhao,
Wangbo Zhao,
Ziheng Qin,
Mengxuan Wu,
Pengfei Zhou,
Haonan Wang,
David Junhao Zhang,
Jia-Wei Liu,
Shaobo Wang,
Dai Liu,
Linfeng Zhang,
Guang Li,
Kun Wang,
Zheng Zhu,
Zhiheng Ma,
Joey Tianyi Zhou,
Jiancheng Lv,
Yaochu Jin
, et al. (27 additional authors not shown)
Abstract:
In recent years, dataset distillation (DD) has provided a reliable solution for data compression: models trained on the resulting smaller synthetic datasets achieve performance comparable to those trained on the original datasets. To further improve the performance of synthetic datasets, various training pipelines and optimization objectives have been proposed, greatly advancing the field. Recent decoupled dataset distillation methods introduce soft labels and stronger data augmentation during the post-evaluation phase and scale dataset distillation up to larger datasets (e.g., ImageNet-1K). However, this raises a question: is accuracy still a reliable metric for fairly evaluating dataset distillation methods? Our empirical findings suggest that the performance improvements of these methods often stem from additional techniques rather than from the inherent quality of the distilled images, with even randomly sampled images achieving superior results. Such misaligned evaluation settings severely hinder the development of DD. We therefore propose DD-Ranking, a unified evaluation framework, along with new general evaluation metrics that uncover the true performance improvements achieved by different methods. By refocusing on the actual information enhancement of distilled datasets, DD-Ranking provides a more comprehensive and fair evaluation standard for future research.
Submitted 21 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
Partial Wave Analysis of $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$ and Cross Section Measurement of $e^{+}e^{-} \rightarrow π^{\pm}Z_{c}(3900)^{\mp}$ from 4.1271 to 4.3583 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 12.0 $\mathrm{fb^{-1}}$ of $e^{+}e^{-}$ collision data samples collected by the BESIII detector at center-of-mass energies from 4.1271 to 4.3583 GeV, a partial wave analysis is performed for the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. The cross sections for the subprocesses ${e^{+}e^{-}\rightarrowπ^{+}Z_{c}(3900)^{-}+c.c.\rightarrowπ^{+}π^{-}J/ψ}$, $f_{0}(980)(\rightarrowπ^{+}π^{-})J/ψ$, and $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ are measured for the first time. The mass and width of the $Z_{c}(3900)^{\pm}$ are determined to be $3884.6\pm0.7\pm3.3$ MeV/$c^{2}$ and $37.2\pm1.3\pm6.6$ MeV, respectively, where the first uncertainties are statistical and the second systematic. The final state $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ dominates the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. By analyzing the cross sections of $π^{\pm}Z_{c}(3900)^{\mp}$ and $f_{0}(980)J/ψ$, the $Y(4220)$ is observed. Its mass and width are determined to be $4225.8\pm4.2\pm3.1$ MeV/$c^{2}$ and $55.3\pm9.5\pm11.1$ MeV, respectively.
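Each measurement above is quoted with separate statistical and systematic uncertainties. A common way to summarize them as a single total uncertainty is to add them in quadrature. This assumes the two are independent, which the abstract does not state. The arithmetic for the quoted values is:

```latex
\sigma_{\mathrm{tot}} = \sqrt{\sigma_{\mathrm{stat}}^2 + \sigma_{\mathrm{syst}}^2}

M_{Z_c(3900)}:\ \sqrt{0.7^2 + 3.3^2} \approx 3.4\ \mathrm{MeV}/c^2, \qquad
\Gamma_{Z_c(3900)}:\ \sqrt{1.3^2 + 6.6^2} \approx 6.7\ \mathrm{MeV}

M_{Y(4220)}:\ \sqrt{4.2^2 + 3.1^2} \approx 5.2\ \mathrm{MeV}/c^2, \qquad
\Gamma_{Y(4220)}:\ \sqrt{9.5^2 + 11.1^2} \approx 14.6\ \mathrm{MeV}
```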
Submitted 19 May, 2025;
originally announced May 2025.
-
scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data
Authors:
Ping Xu,
Zhiyuan Ning,
Pengjiang Li,
Wenhao Liu,
Pengyang Wang,
Jiaxu Cui,
Yuanchun Zhou,
Pengfei Wang
Abstract:
Single-cell RNA sequencing (scRNA-seq) reveals cell heterogeneity, with cell clustering playing a key role in identifying cell types and marker genes. Recent advances, especially graph neural network (GNN)-based methods, have significantly improved clustering performance. However, the analysis of scRNA-seq data remains challenging due to noise, sparsity, and high dimensionality. Compounding these challenges, GNNs often suffer from over-smoothing, limiting their ability to capture complex biological information. In response, we propose scSiameseClu, a novel Siamese Clustering framework for interpreting single-cell RNA-seq data, comprising three key steps: (1) a Dual Augmentation Module, which applies biologically informed perturbations to the gene expression matrix and cell graph relationships to enhance representation robustness; (2) a Siamese Fusion Module, which combines cross-correlation refinement and adaptive information fusion to capture complex cellular relationships while mitigating over-smoothing; and (3) Optimal Transport Clustering, which uses the Sinkhorn distance to efficiently align cluster assignments with predefined proportions while maintaining balance. Comprehensive evaluations on seven real-world datasets demonstrate that scSiameseClu outperforms state-of-the-art methods in single-cell clustering, cell type annotation, and cell type classification, providing a powerful tool for scRNA-seq data interpretation.
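Step (3) can be illustrated with the standard Sinkhorn-Knopp iteration for entropy-regularized optimal transport. Cells carry uniform mass, and clusters must receive mass in the predefined proportions. The resulting transport plan gives balanced soft assignments. This is a generic NumPy sketch, not the authors' implementation. Treating similarity scores as negative cost, the regularization ε, and the fixed iteration count are all our assumptions.

```python
import numpy as np


def sinkhorn_assign(scores, cluster_props, epsilon=0.1, n_iters=500):
    """Entropy-regularized OT between cells and clusters (Sinkhorn-Knopp).

    scores: (n_cells, k) cell-to-cluster similarities; cost is taken as -scores.
    cluster_props: (k,) target cluster proportions, summing to 1.
    Returns a plan Q (n_cells, k) with row sums 1/n_cells and column sums
    cluster_props; n_cells * Q[i] is a soft assignment for cell i.
    """
    scores = np.asarray(scores, dtype=float)
    n, _ = scores.shape
    r = np.full(n, 1.0 / n)                         # each cell carries equal mass
    c = np.asarray(cluster_props, dtype=float)      # required mass per cluster
    K = np.exp((scores - scores.max()) / epsilon)   # Gibbs kernel, shifted to avoid overflow
    u = np.ones(n)
    v = np.ones(len(c))
    for _ in range(n_iters):
        u = r / (K @ v)      # rescale rows to match r
        v = c / (K.T @ u)    # rescale columns to match c
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
scores = rng.random((30, 3))
Q = sinkhorn_assign(scores, [0.5, 0.3, 0.2], epsilon=0.5)
labels = Q.argmax(axis=1)  # hard cluster labels read off the balanced soft plan
```

Smaller ε gives sharper, closer-to-hard assignments but needs more iterations to converge.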
Submitted 18 May, 2025;
originally announced May 2025.
-
Towards Reliable and Interpretable Traffic Crash Pattern Prediction and Safety Interventions Using Customized Large Language Models
Authors:
Yang Zhao,
Pu Wang,
Yibo Zhao,
Hongru Du,
Hao Frank Yang
Abstract:
Predicting crash events is crucial for understanding crash distributions and their contributing factors, thereby enabling the design of proactive traffic safety policy interventions. However, existing methods struggle to interpret the complex interplay among various sources of traffic crash data, including numeric characteristics, textual reports, crash imagery, environmental conditions, and driver behavior records. As a result, they often fail to capture the rich semantic information and intricate interrelationships embedded in these diverse data sources, limiting their ability to identify critical crash risk factors. In this research, we propose TrafficSafe, a framework that adapts large language models (LLMs) to reframe crash prediction and feature attribution as text-based reasoning. A multi-modal crash dataset of 58,903 real-world reports, together with the associated infrastructure, environmental, driver, and vehicle information, is collected and textualized into the TrafficSafe Event Dataset. By customizing and fine-tuning LLMs on this dataset, the TrafficSafe LLM achieves a 42% average improvement in F1-score over baselines. To interpret these predictions and uncover contributing factors, we introduce TrafficSafe Attribution, a sentence-level feature attribution framework enabling conditional risk analysis. Findings show that alcohol-impaired driving is the leading factor in severe crashes, and that aggressive and impairment-related behaviors contribute nearly twice as much to severe crashes as other driver behaviors. Furthermore, TrafficSafe Attribution highlights pivotal features during model training, guiding strategic crash data collection for iterative performance improvements. TrafficSafe offers a transformative leap in traffic safety research, providing a blueprint for translating advanced AI technologies into responsible, actionable, and life-saving outcomes.
Submitted 21 May, 2025; v1 submitted 18 May, 2025;
originally announced May 2025.
-
Observation of $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (678 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII storage ring, the decays $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$ are observed for the first time through the radiative transition $ψ(3686)\toγχ_{cJ}$. The statistical significances of the $χ_{cJ}$ signals are all larger than $5σ$. The branching fractions of $χ_{c0,1,2}\to p\bar{p} ηη$ are determined to be $({5.75 \pm 0.59 \pm 0.42}) \times 10^{-5}$, $({1.40 \pm 0.33 \pm 0.17}) \times 10^{-5}$, and $({2.64 \pm 0.40 \pm 0.27}) \times 10^{-5}$, respectively, where the first uncertainties are statistical and the second systematic. No evident resonant structures are found in the $p\bar{p}$ and $pη/\bar{p}η$ systems.
Submitted 18 May, 2025;
originally announced May 2025.
-
Observation of an Altered $a_{0}(980)$ Line-shape in $D^{+} \rightarrow π^{+}ηη$ due to the Triangle Loop Rescattering Effect
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (705 additional authors not shown)
Abstract:
Using 20.3~${\rm fb}^{-1}$ of $e^{+}e^{-}$ collision data taken with the BESIII detector at the center-of-mass energy 3.773~GeV, we report the first amplitude analysis of the hadronic decay $D^{+} \rightarrow π^{+}ηη$. The intermediate process $D^{+} \to a_{0}(980)^{+}η, a_{0}(980)^{+} \to π^{+}η$ is observed as the only component, with a branching fraction of $(3.67\pm0.12_{\mathrm{stat.}}\pm 0.06_{\mathrm{syst.}})\times 10^{-3}$. In the decays of charmed mesons to $a_{0}(980)π$ and in the decay $D^{0} \to a_{0}(980)^{-}e^{+}ν_{e}$, the low-mass side of the $a_0(980)$ line-shape is wider than the high-mass side. In contrast, the $a_{0}(980)$ line-shape in $D^{+} \to a_{0}(980)^{+}η$ is significantly altered, with the high-mass side wider than the low-mass side. We establish, with a significance of 5.8$σ$, that this line-shape arises from triangle loop rescattering via $D^+ \to \bar{K}_0^*(1430)^0K^+ \to a_0(980)^+ η$ and $D^+ \to K_0^*(1430)^+\bar{K}^0 \to a_0(980)^+ η$. This is the first experimental confirmation of the triangle loop rescattering effect.
Submitted 17 May, 2025;
originally announced May 2025.
-
PRIME: Physics-Related Intelligent Mixture of Experts for Transistor Characteristics Prediction
Authors:
Zhenxing Dou,
Yijiao Wang,
Tao Zou,
Zhiwei Chen,
Fei Liu,
Peng Wang,
Weisheng Zhao
Abstract:
In recent years, machine learning has been extensively applied to data prediction during process ramp-up, with a particular focus on transistor characteristics for circuit design and manufacturing. However, capturing the nonlinear current response across multiple operating regions remains a challenge for neural networks. To address this challenge, we propose PRIME (Physics-Related Intelligent Mixture of Experts), a novel machine learning framework that captures and integrates complex regional characteristics. In essence, the framework combines physics-based knowledge with data-driven intelligence. By leveraging a dynamic weighting mechanism in its gating network, PRIME adaptively activates the suitable expert model based on distinct input data features. Extensive evaluations on various gate-all-around (GAA) structures demonstrate the effectiveness of PRIME, with considerable improvements (60%-84%) in prediction accuracy over state-of-the-art models.
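The gating idea can be sketched as a softmax-gated mixture of experts. A gating network maps each input to weights over the experts, and the prediction is the weighted sum of the expert outputs. In this NumPy sketch the experts are hypothetical linear regressors, and the inputs stand in for bias and device features. PRIME's physics-based experts and exact gating design are not described in the abstract, so this is an illustration of the general technique only.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyMoE:
    """Softmax-gated mixture of linear experts (illustrative only)."""

    def __init__(self, n_features: int, n_experts: int, rng: np.random.Generator):
        self.W_gate = rng.normal(size=(n_features, n_experts))
        self.b_gate = np.zeros(n_experts)
        self.W_exp = rng.normal(size=(n_experts, n_features))
        self.b_exp = rng.normal(size=n_experts)

    def __call__(self, x: np.ndarray):
        # Gating network: input-dependent weights over experts, each row sums to 1.
        gates = softmax(x @ self.W_gate + self.b_gate)
        # Each expert produces one scalar prediction per input row.
        expert_out = x @ self.W_exp.T + self.b_exp
        # Dynamic weighting: per-input convex combination of expert predictions.
        return (gates * expert_out).sum(axis=1), gates


rng = np.random.default_rng(0)
moe = TinyMoE(n_features=4, n_experts=3, rng=rng)
x = rng.normal(size=(5, 4))  # e.g. 5 hypothetical bias/geometry feature vectors
y, gates = moe(x)
```

When the gate strongly favors one expert (for example, one specialized to a single operating region), the mixture reduces to that expert's prediction for those inputs.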
Submitted 10 May, 2025;
originally announced May 2025.
-
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Authors:
Pengfei Wang,
Guohai Xu,
Weinong Wang,
Junjie Yang,
Jie Lou,
Yunhua Xue
Abstract:
Recent advancements have enhanced the capability of Multimodal Large Language Models (MLLMs) to comprehend multi-image information. However, existing benchmarks primarily evaluate answer correctness, overlooking whether models genuinely comprehend the visual input. To address this, we define implicit visual misunderstanding (IVM), where MLLMs provide correct answers without fully comprehending the visual input. Through our analysis, we decouple the visual and textual modalities within the causal attention module, revealing that the attention distribution increasingly converges on the image associated with the correct answer as the network layers deepen. This insight leads us to introduce a scale-agnostic metric, attention accuracy, and a novel benchmark for quantifying IVMs. Attention accuracy directly evaluates the model's visual understanding via its internal mechanisms and remains robust to positional biases, enabling more reliable assessments. Furthermore, we extend our approach to finer granularities and demonstrate its effectiveness in unimodal scenarios, underscoring its versatility and generalizability.
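The abstract does not give a formula for attention accuracy. One plausible reading, sketched below purely as an assumption, works per multi-image sample. It sums the attention that the answer position assigns to each image's token span. It counts a hit when the most-attended image is the one tied to the correct answer. The function, argument names, and example weights are hypothetical.

```python
import numpy as np


def attention_accuracy(attn_per_sample, image_spans, correct_image) -> float:
    """Fraction of samples whose most-attended image is the correct one.

    attn_per_sample: list of 1-D arrays, the attention weights that the answer
        position assigns to each input token (e.g. head-averaged, one layer).
    image_spans: per sample, a list of (start, end) token ranges, one per image.
    correct_image: per sample, index of the image tied to the correct answer.
    """
    if not attn_per_sample:
        raise ValueError("need at least one sample")
    hits = 0
    for attn, spans, target in zip(attn_per_sample, image_spans, correct_image):
        # Total attention mass falling on each image's tokens.
        mass = [float(np.sum(attn[start:end])) for start, end in spans]
        hits += int(np.argmax(mass) == target)
    return hits / len(attn_per_sample)


# Hypothetical sample: 3 images of 2 tokens each, followed by 1 text token.
attn = np.array([0.10, 0.10, 0.30, 0.30, 0.05, 0.05, 0.10])
spans = [(0, 2), (2, 4), (4, 6)]
print(attention_accuracy([attn, attn], [spans, spans], [1, 0]))
```

Here image 1 receives the most attention, so the first sample (target 1) counts as a hit and the second (target 0) as a miss, giving 0.5.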
Submitted 23 May, 2025; v1 submitted 15 May, 2025;
originally announced May 2025.
-
UAV-Enabled Passive 6DMA for ISAC: Joint Location, Orientation, and Reflection Optimization
Authors:
Peilan Wang,
Yu Xue,
Weidong Mei,
Jun Fang,
Rui Zhang
Abstract:
Improving the fundamental performance trade-off in integrated sensing and communication (ISAC) systems is regarded as one of their most significant challenges. To address it, we propose in this letter a novel ISAC system that leverages an unmanned aerial vehicle (UAV)-mounted intelligent reflecting surface (IRS) and the UAV's maneuverability in six-dimensional (6D) space, i.e., three-dimensional (3D) location and 3D rotation; we therefore refer to it as a passive 6D movable antenna (6DMA). We aim to maximize the signal-to-noise ratio (SNR) for sensing a single target while ensuring a minimum SNR at a communication user equipment (UE), by jointly optimizing the transmit beamforming at the ISAC base station (BS) together with the 3D location, 3D orientation, and reflection coefficients of the IRS. To solve this challenging non-convex optimization problem, we propose a two-stage approach. In the first stage, we optimize the IRS's 3D location, 3D orientation, and reflection coefficients to enhance both the channel correlations and the power gains for sensing and communication. Given these optimized parameters, the optimal transmit beamforming at the ISAC BS is derived in closed form. Simulation results demonstrate that the proposed passive 6DMA-enabled ISAC system significantly improves the sensing and communication trade-off by simultaneously enhancing channel correlations and power gains, and outperforms other baseline schemes.
Submitted 15 May, 2025;
originally announced May 2025.
-
Qwen3 Technical Report
Authors:
An Yang,
Anfeng Li,
Baosong Yang,
Beichen Zhang,
Binyuan Hui,
Bo Zheng,
Bowen Yu,
Chang Gao,
Chengen Huang,
Chenxu Lv,
Chujie Zheng,
Dayiheng Liu,
Fan Zhou,
Fei Huang,
Feng Hu,
Hao Ge,
Haoran Wei,
Huan Lin,
Jialong Tang,
Jian Yang,
Jianhong Tu,
Jianwei Zhang,
Jianxin Yang,
Jiaxi Yang,
Jing Zhou
, et al. (35 additional authors not shown)
Abstract:
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models with both dense and Mixture-of-Experts (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of a thinking mode (for complex, multi-step reasoning) and a non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models, such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B), and enables dynamic mode switching based on user queries or chat templates. Qwen3 also introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference and thereby balance latency and performance according to task complexity. Moreover, by leveraging knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models while keeping their performance highly competitive. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks, and is competitive with larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly available under the Apache 2.0 license.
Submitted 14 May, 2025;
originally announced May 2025.
-
Constraints On New Theories Using Rivet : CONTUR version 3 release note
Authors:
Andy Buckley,
Jon Butterworth,
Joseph Egan,
Christian Gutschow,
Sihyun Jeon,
Martin Habedank,
Tomasz Procter,
Peng Wang,
Yoran Yeh,
Luzhan Yue
Abstract:
The CONTUR toolkit exploits RIVET and its library of more than a thousand energy-frontier differential cross-section measurements from the Large Hadron Collider to allow rapid limit-setting and consistency checks for new physics models. In this note we summarise the main changes in the new CONTUR 3 major release series. These include additional statistical treatments, efficiency improvements, new plotting utilities and many new measurements and Standard Model predictions.
Submitted 14 May, 2025;
originally announced May 2025.
-
The Kramers-Fokker-Planck equation with a decaying potential in $\mathbb R^n$, $n \ge 4$
Authors:
Xinghong Pan,
Xue Ping Wang,
Lu Zhu
Abstract:
We use methods from microlocal analysis and quantum scattering to study spectral properties near the threshold zero of the Kramers-Fokker-Planck operator with a decaying potential in $\mathbb R^n$, $n \ge 4$, and deduce the large-time behavior of solutions to the kinetic Kramers-Fokker-Planck equation. For short-range potentials, we establish an optimal time-decay estimate in weighted $L^2$-spaces when $n \ge 5$ is odd. For potentials decaying like $O(|x|^{-\rho})$ for some $\rho > n-1$, we obtain, for all dimensions $n \ge 4$, a large-time expansion of the solution with the leading term given by the Maxwell-Boltzmann distribution multiplied by the factor $(4\pi t)^{-\frac n 2}$, corresponding to the decay for the heat equation. These results complete those obtained in [16, 22] for dimensions $n=1$ and $3$. The same questions for $n=2$ are still open.
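For orientation only, the factor $(4\pi t)^{-n/2}$ quoted above is the standard on-diagonal size of the free heat kernel on $\mathbb R^n$. The block below recalls this textbook fact for the plain heat equation; it is not the paper's kinetic expansion, whose precise form is not given in the abstract.

```latex
\partial_t u = \Delta u, \qquad
u(t,x) = \int_{\mathbb R^n} K_t(x-y)\, u_0(y)\, dy, \qquad
K_t(x) = (4\pi t)^{-\frac n 2}\, e^{-\frac{|x|^2}{4t}} .

% Since e^{-|x-y|^2/(4t)} \le 1 and tends to 1 as t \to \infty,
% dominated convergence gives, for u_0 \in L^1(\mathbb R^n) and fixed x,
(4\pi t)^{\frac n 2}\, u(t,x) \;\longrightarrow\; \int_{\mathbb R^n} u_0(y)\, dy
\qquad (t \to \infty).
```

So, at each fixed point, heat-equation solutions with integrable data decay like $(4\pi t)^{-n/2}$, which is the rate the abstract attaches to the leading Maxwell-Boltzmann term.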
Submitted 14 May, 2025;
originally announced May 2025.