-
MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
Authors:
Chunjiang Wang,
Kun Zhang,
Yandong Liu,
Zhiyang He,
Xiaodong Tao,
S. Kevin Zhou
Abstract:
The concept bottleneck model (CBM), as a technique improving interpretability via linking predictions to human-understandable concepts, makes high-risk and life-critical medical image classification credible. Typically, existing CBM methods associate the final layer of visual encoders with concepts to explain the model's predictions. However, we empirically discover the phenomenon of concept preference variation, that is, concepts are preferentially associated with features at different layers rather than only with those at the final layer; yet a blind last-layer-based association neglects such a preference variation and thus weakens the accurate correspondences between features and concepts, impairing model interpretability. To address this issue, we propose a novel Multi-layer Visual Preference-enhanced Concept Bottleneck Model (MVP-CBM), which comprises two key novel modules: (1) intra-layer concept preference modeling, which captures the preferred association of different concepts with features at various visual layers, and (2) multi-layer concept sparse activation fusion, which sparsely aggregates concept activations from multiple layers to enhance performance. Thus, by explicitly modeling concept preferences, MVP-CBM can comprehensively leverage multi-layer visual information to provide a more nuanced and accurate explanation of model decisions. Extensive experiments on several public medical classification benchmarks demonstrate that MVP-CBM achieves state-of-the-art accuracy and interpretability, verifying its superiority. Code is available at https://github.com/wcj6/MVP-CBM.
Submitted 14 June, 2025;
originally announced June 2025.
-
Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek
Authors:
Peiran Qiu,
Siyi Zhou,
Emilio Ferrara
Abstract:
This study examines information suppression mechanisms in DeepSeek, an open-source large language model (LLM) developed in China. We propose an auditing framework and use it to analyze the model's responses to 646 politically sensitive prompts by comparing its final output with intermediate chain-of-thought (CoT) reasoning. Our audit unveils evidence of semantic-level information suppression in DeepSeek: sensitive content often appears within the model's internal reasoning but is omitted or rephrased in the final output. Specifically, DeepSeek suppresses references to transparency, government accountability, and civic mobilization, while occasionally amplifying language aligned with state propaganda. This study underscores the need for systematic auditing of alignment, content moderation, information suppression, and censorship practices implemented into widely-adopted AI models, to ensure transparency, accountability, and equitable access to unbiased information obtained by means of these systems.
Submitted 14 June, 2025;
originally announced June 2025.
-
Relative Error Fair Clustering in the Weak-Strong Oracle Model
Authors:
Vladimir Braverman,
Prathamesh Dharangutte,
Shaofeng H. -C. Jiang,
Hoai-An Nguyen,
Chen Wang,
Yubo Zhang,
Samson Zhou
Abstract:
We study fair clustering problems in a setting where distance information is obtained from two sources: a strong oracle providing exact distances, but at a high cost, and a weak oracle providing potentially inaccurate distance estimates at a low cost. The goal is to produce a near-optimal fair clustering on $n$ input points with a minimum number of strong oracle queries. This models the increasingly common trade-off between accurate but expensive similarity measures (e.g., large-scale embeddings) and cheaper but inaccurate alternatives. The study of fair clustering in the model is motivated by the important quest of achieving fairness with the presence of inaccurate information. We achieve the first $(1+\varepsilon)$-coresets for fair $k$-median clustering using $\text{poly}\left(\frac{k}{\varepsilon}\cdot\log n\right)$ queries to the strong oracle. Furthermore, our results imply coresets for the standard setting (without fairness constraints), and we could in fact obtain $(1+\varepsilon)$-coresets for $(k,z)$-clustering for general $z=O(1)$ with a similar number of strong oracle queries. In contrast, previous results achieved a constant-factor $(>10)$ approximation for the standard $k$-clustering problems, and no previous work considered the fair $k$-median clustering problem.
Submitted 13 June, 2025;
originally announced June 2025.
-
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Authors:
Houyi Li,
Ka Man Lo,
Ziqi Wang,
Zili Wang,
Wenzhen Zheng,
Shuigeng Zhou,
Xiangyu Zhang,
Daxin Jiang
Abstract:
Mixture-of-Experts (MoE) language models dramatically expand model capacity and achieve remarkable performance without increasing per-token compute. However, can MoEs surpass dense architectures under strictly equal resource constraints - that is, when the total parameter count, training compute, and data budget are identical? This question remains under-explored despite its significant practical value and potential. In this paper, we propose a novel perspective and methodological framework to study this question thoroughly. First, we comprehensively investigate the architecture of MoEs and achieve an optimal model design that maximizes the performance. Based on this, we subsequently find that an MoE model with activation rate in an optimal region is able to outperform its dense counterpart under the same total parameter, training compute and data resource. More importantly, this optimal region remains consistent across different model sizes. Although additional amount of data turns out to be a trade-off for the enhanced performance, we show that this can be resolved via reusing data. We validate our findings through extensive experiments, training nearly 200 language models at 2B scale and over 50 at 7B scale, cumulatively processing 50 trillion tokens. All models will be released publicly.
Submitted 13 June, 2025;
originally announced June 2025.
-
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Authors:
Zihan Zheng,
Zerui Cheng,
Zeyu Shen,
Shang Zhou,
Kaiyuan Liu,
Hansen He,
Dongruixuan Li,
Stanley Wei,
Hangyi Hao,
Jianzhu Yao,
Peiyao Sheng,
Zixuan Wang,
Wenhao Chai,
Aleksandra Korolova,
Peter Henderson,
Sanjeev Arora,
Pramod Viswanath,
Jingbo Shang,
Saining Xie
Abstract:
Recent reports claim that large language models (LLMs) now outperform elite humans in competitive programming. Drawing on knowledge from a group of medalists in international algorithmic contests, we revisit this claim, examining how LLMs differ from human experts and where limitations still remain. We introduce LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination. A team of Olympiad medalists annotates every problem for algorithmic categories and conducts a line-by-line analysis of failed model-generated submissions. Using this new data and benchmark, we find that frontier models still have significant limitations: without external tools, the best model achieves only 53% pass@1 on medium-difficulty problems and 0% on hard problems, domains where expert humans still excel. We also find that LLMs succeed at implementation-heavy problems but struggle with nuanced algorithmic reasoning and complex case analysis, often generating confidently incorrect justifications. High performance appears largely driven by implementation precision and tool augmentation, not superior reasoning. LiveCodeBench Pro thus highlights the significant gap to human grandmaster levels, while offering fine-grained diagnostics to steer future improvements in code-centric LLM reasoning.
Submitted 13 June, 2025;
originally announced June 2025.
-
Farseer: A Refined Scaling Law in Large Language Models
Authors:
Houyi Li,
Wenzhen Zheng,
Qiufeng Wang,
Zhenyu Ding,
Haoying Wang,
Zili Wang,
Shijie Xuyang,
Ning Ding,
Shuigeng Zhou,
Xiangyu Zhang,
Daxin Jiang
Abstract:
Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N,D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433\%. This allows for the reliable evaluation of competing training strategies across all $(N,D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. We are comprehensively open-sourcing all models, data, results, and logs at https://github.com/Farseer-Scaling-Law/Farseer to foster further research.
Submitted 14 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
Search for sub-GeV invisible particles in inclusive decays of $J/ψ$ to $φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (704 additional authors not shown)
Abstract:
A search for an invisible particle, $X$, with a mass between 0 and 0.96 $\textrm{GeV}/\textit{c}^{2}$, is performed in the process $J/ψ\rightarrowφ+ X$ using $(8774.0\pm39.4)\times10^{6}$ $J/ψ$ events collected with the BESIII detector from 2017 to 2019. The $φ$ meson is fully reconstructed and an efficient veto of photons, neutral and charged hadrons up to twice the $K_L^0$ mass is applied to the rest of the events, and the recoil mass against the $φ$ is obtained precisely from the kinematic constraint in the event. No significant signal is observed in the investigated region and the upper limit on the inclusive branching fraction of $J/ψ\rightarrowφ+ X$ is determined to be $7.5\times10^{-8}$ at 90% confidence level. Upper limits at a 90% confidence level are also given for this branching fraction as a function of the invisible particle mass, varying from $9\times10^{-9}$ to $4\times10^{-8}$ over the investigated mass range. Additionally, a 90% confidence level upper limit on the branching fraction of $η\rightarrow \rm{invisible}$ is determined to be $2.6\times10^{-5}$, which improves on the previous best result by more than a factor of four. The analysis technique in this work offers a clean window to search for sub-GeV invisible particles; it can be adapted for other $J/ψ$ decays and direct $e^+e^-$ annihilation experiments in future studies, and can improve the sensitivity by orders of magnitude.
Submitted 11 June, 2025;
originally announced June 2025.
-
Correlative angstrom-scale microscopy and spectroscopy of graphite-water interfaces
Authors:
Lalith Krishna Samanth Bonagiri,
Diana M. Arvelo,
Fujia Zhao,
Jaehyeon Kim,
Qian Ai,
Shan Zhou,
Kaustubh S. Panse,
Ricardo Garcia,
Yingjie Zhang
Abstract:
Water at solid surfaces is key for many processes ranging from biological signal transduction to membrane separation and renewable energy conversion. However, under realistic conditions, which often include environmental and surface charge variations, the interfacial water structure remains elusive. Here we overcome this limit by combining three-dimensional atomic force microscopy and interface-sensitive Raman spectroscopy to characterize the graphite-water interfacial structure in situ. Through correlative analysis of the spatial liquid density maps and vibrational peaks within ~2 nm of the graphite surface, we find the existence of two interfacial configurations at open circuit potential, a transient state where pristine water exhibits strong hydrogen bond (HB) breaking effects, and a steady state with hydrocarbons dominating the interface and weak HB breaking in the surrounding water. At sufficiently negative potentials, both states transition into a stable structure featuring pristine water with a broader distribution of HB configurations. Our three-state model resolves many long-standing controversies on interfacial water structure.
Submitted 11 June, 2025;
originally announced June 2025.
-
Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review
Authors:
Wenxin Liu,
Jiakun Fang,
Shichang Cui,
Iskandar Abdullaev,
Suyang Zhou,
Xiaomeng Ai,
Jinyu Wen
Abstract:
The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress in optimization and control technologies for hydrogen-blended integrated gas-electricity systems. First, key technologies and international demonstration projects are introduced to provide an overview of current developments. Next, advances in gas-electricity system integration, including modeling, scheduling, planning and market design, are reviewed in turn. Then, the potential for cross-system fault propagation is highlighted, and practical methods for safety analysis and control are proposed. Finally, several possible research directions are introduced, aiming to ensure efficient renewable integration and reliable operation.
Submitted 11 June, 2025;
originally announced June 2025.
-
A High-Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Authors:
Yukang Feng,
Jianwen Sun,
Chuanhao Li,
Zizhen Li,
Jiaxin Ai,
Fanrui Zhang,
Yifan Chang,
Sizhuo Zhou,
Shenglin Zhang,
Yu Dai,
Kaipeng Zhang
Abstract:
Recent advancements in Large Multimodal Models (LMMs) have significantly improved multimodal understanding and generation. However, these models still struggle to generate tightly interleaved image-text outputs, primarily due to the limited scale, quality and instructional richness of current training datasets. To address this, we introduce InterSyn, a large-scale multimodal dataset constructed using our Self-Evaluation with Iterative Refinement (SEIR) method. InterSyn features multi-turn, instruction-driven dialogues with tightly interleaved image-text responses, providing rich object diversity and rigorous automated quality refinement, making it well-suited for training next-generation instruction-following LMMs. Furthermore, to address the lack of reliable evaluation tools capable of assessing interleaved multimodal outputs, we introduce SynJudge, an automatic evaluation model designed to quantitatively assess multimodal outputs along four dimensions: text content, image content, image quality, and image-text synergy. Experimental studies show that the SEIR method leads to substantially higher dataset quality compared to an otherwise identical process without refinement. Moreover, LMMs trained on InterSyn achieve uniform performance gains across all evaluation metrics, confirming InterSyn's utility for advancing multimodal systems.
Submitted 11 June, 2025;
originally announced June 2025.
-
Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation
Authors:
Ye Niu,
Sanping Zhou,
Yizhe Li,
Ye Den,
Le Wang
Abstract:
In many complex scenarios, robotic manipulation relies on generative models to estimate the distribution of multiple successful actions. As the diffusion model has better training robustness than other generative models, it performs well in imitation learning through successful robot demonstrations. However, the diffusion-based policy methods typically require significant time to iteratively denoise robot actions, which hinders real-time responses in robotic manipulation. Moreover, existing diffusion policies model a time-varying action denoising process, whose temporal complexity increases the difficulty of model training and leads to suboptimal action accuracy. To generate robot actions efficiently and accurately, we present the Time-Unified Diffusion Policy (TUDP), which utilizes action recognition capabilities to build a time-unified denoising process. On the one hand, we build a time-unified velocity field in action space with additional action discrimination information. By unifying all timesteps of action denoising, our velocity field reduces the difficulty of policy learning and speeds up action generation. On the other hand, we propose an action-wise training method, which introduces an action discrimination branch to supply additional action discrimination information. Through action-wise training, the TUDP implicitly learns the ability to discern successful actions to better denoising accuracy. Our method achieves state-of-the-art performance on RLBench with the highest success rate of 82.6% on a multi-view setup and 83.8% on a single-view setup. In particular, when using fewer denoising iterations, TUDP achieves a more significant improvement in success rate. Additionally, TUDP can produce accurate actions for a wide range of real-world tasks.
Submitted 11 June, 2025;
originally announced June 2025.
-
Search for the charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (705 additional authors not shown)
Abstract:
Based on $(10087\pm44)\times 10^6$ $J/ψ$ events recorded with the BESIII detector, we search for the rare charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$ No signal is observed, and upper limits on the branching fractions at the $90\%$ confidence level are set as $\mathcal{B}(J/ψ\to D_{s}^{-}ρ^{+}+c.c.)<8.0\times10^{-7}$ and $\mathcal{B}(J/ψ\to D_{s}^{-}π^{+}+c.c.)<4.1\times10^{-7}$. Our results provide the most stringent experimental constraints on these decays.
Submitted 11 June, 2025;
originally announced June 2025.
-
Optimization of target film materials and protective coatings for sealed neutron generator
Authors:
Yingying Cao,
Sijia Zhou,
Pingwei Sun,
Jiayu Li,
Shangrui Jiang,
Shiwei Jing
Abstract:
Magnesium target film has better thermal stability and neutron yield than titanium target, making it a potential neutron generator target film material. The radiation resistance of elemental magnesium targets is relatively weak, and their radiation resistance can be improved by alloying magnesium target films. The irradiation damage of pure magnesium targets and magnesium alloy target films was studied using SRIM. The results indicate that the irradiation damage of magnesium alloy target films (magnesium-niobium, magnesium-zirconium alloys) is lower than that of pure magnesium targets. In addition, under the same alloy ratio, the radiation resistance of magnesium-niobium alloy target film is better than that of magnesium-zirconium alloy. To further investigate the performance of magnesium alloy target films, the incident ion energy, protective coatings (nickel oxide, aluminum oxide, palladium oxide), magnesium alloy target films, and alloy doping ratios (0.2, 0.4, 0.6, 0.8, 1.0) were varied. After calculating the effects of the above conditions on the neutron generator yield and sputtering yield, and considering irradiation damage, it was determined that a magnesium-zirconium alloy with a doping rate of 0.2 and a nickel oxide protective coating with a thickness of 7.5 nm are potential target film materials for the neutron generator.
Submitted 5 June, 2025;
originally announced June 2025.
-
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Authors:
Ailin Huang,
Bingxin Li,
Bruce Wang,
Boyong Wu,
Chao Yan,
Chengli Feng,
Heng Wang,
Hongyu Zhou,
Hongyuan Wang,
Jingbei Li,
Jianjian Sun,
Joanna Wang,
Mingrui Chen,
Peng Liu,
Ruihang Miao,
Shilei Jiang,
Tian Fei,
Wang You,
Xi Chen,
Xuerui Yang,
Yechang Huang,
Yuxiang Zhang,
Zheng Ge,
Zheng Gong,
Zhewei Huang
, et al. (51 additional authors not shown)
Abstract:
Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter backbone LLM, and a neural vocoder for high-fidelity speech synthesis. Our post-training approach employs interleaved token-output of text and audio to enhance semantic coherence and combines Direct Preference Optimization (DPO) with model merging to improve performance. Evaluations on the StepEval-Audio-360 benchmark demonstrate that Step-Audio-AQAA excels especially in speech control, outperforming the state-of-the-art LALMs in key areas. This work contributes a promising solution for end-to-end LALMs and highlights the critical role of the token-based vocoder in enhancing overall performance for AQAA tasks.
Submitted 13 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Measurement of the $η$ transition form factor through $η' \rightarrow π^+π^-η$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Based on a sample of $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected at BESIII, the transition form factor of the $η$ meson is extracted by analyzing $J/ψ\toγη',~η'\toπ^+π^-η,~η\toγl^+l^-$ ($l$=$e$, $μ$) events. The measured slope of the transition form factor is $Λ^{-2}=1.645\pm0.093_{\rm stat.}\pm {0.024_{\rm sys.}}$ (GeV/$c^2$)$^{-2}$ for the di-electron channel and $Λ^{-2}=1.645\pm0.343_{\rm stat.}\pm0.017_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ for the di-muon channel. The branching fractions for $η\rightarrowγe^+e^-$ and $η\rightarrowγμ^+μ^-$ are measured to be $\mathcal{B}(η\toγe^+e^-)=(6.79\pm0.04_{\rm stat.}\pm0.36_{\rm sys.})\times 10^{-3}$ and $\mathcal{B}(η\toγμ^+μ^-)=(2.97\pm0.11_{\rm stat.}\pm0.07_{\rm sys.})\times 10^{-4}$. By combining with the results based on the $J/ψ\toγη,~η\toγe^+e^-$ events from the previous BESIII measurement, we determine $Λ^{-2}=1.707\pm0.076_{\rm stat.}\pm0.029_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ and $\mathcal{B}(η\toγe^+e^-)=(6.93\pm0.28_{\rm tot.})\times 10^{-3}$. In addition, we search for the dark photon ($A'$) using the combined events. No significant signal is observed, and the upper limits on $\mathcal{B}(η\toγA',~A'\to e^+e^-)$ are set at 90\% confidence level for different $A'$ mass hypotheses.
Submitted 10 June, 2025;
originally announced June 2025.
-
Dense Matter in Neutron Stars with eXTP
Authors:
Ang Li,
Anna L. Watts,
Guobao Zhang,
Sebastien Guillot,
Yanjun Xu,
Andrea Santangelo,
Silvia Zane,
Hua Feng,
Shuang-Nan Zhang,
Mingyu Ge,
Liqiang Qi,
Tuomo Salmi,
Bas Dorsman,
Zhiqiang Miao,
Zhonghao Tu,
Yuri Cavecchi,
Xia Zhou,
Xiaoping Zheng,
Weihua Wang,
Quan Cheng,
Xuezhi Liu,
Yining Wei,
Wei Wang,
Yujing Xu,
Shanshan Weng
, et al. (58 additional authors not shown)
Abstract:
In this White Paper, we present the potential of the enhanced X-ray Timing and Polarimetry (eXTP) mission to constrain the equation of state of dense matter in neutron stars, exploring regimes not directly accessible to terrestrial experiments. By observing a diverse population of neutron stars - including isolated objects, X-ray bursters, and accreting systems - eXTP's unique combination of timing, spectroscopy, and polarimetry enables high-precision measurements of compactness, spin, surface temperature, polarimetric signals, and timing irregularity. These multifaceted observations, combined with advances in theoretical modeling, pave the way toward a comprehensive description of the properties and phases of dense matter from the crust to the core of neutron stars. Under development by an international Consortium led by the Institute of High Energy Physics of the Chinese Academy of Sciences, the eXTP mission is planned to be launched in early 2030.
Submitted 9 June, 2025;
originally announced June 2025.
-
A novel measurement of the strong-phase difference between $D^0\to K^-π^+$ and $\bar{D}^0\to K^-π^+$ decays using $C$-even and $C$-odd quantum-correlated $D\bar{D}$ pairs
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (707 additional authors not shown)
Abstract:
A novel technique for measuring strong-phase differences between the decay amplitudes of $D^0$ and $\bar{D}^0$ mesons is introduced that exploits quantum-correlated $D\bar{D}$ pairs produced by $e^+e^-$ collisions at energies above the $ψ(3770)$ production threshold, where $D\bar{D}$ pairs are produced in both even and odd eigenstates of the charge-conjugation symmetry. Employing this technique, the first determination of a $D^0$-$\bar{D}^0$ relative strong phase is reported with such data samples. The strong-phase difference between $D^0\to K^-π^+$ and $\bar{D}^0\to K^-π^+$ decays, $δ^{D}_{Kπ}$, is measured to be $δ^{D}_{Kπ}=\left(192.8^{+11.0 + 1.9}_{-12.4 -2.4}\right)^\circ$, using a dataset corresponding to an integrated luminosity of 7.13 $\text{fb}^{-1}$ collected at center-of-mass energies between $4.13$ and $4.23 \text{ GeV}$ by the BESIII experiment.
Submitted 10 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
First observation of quantum correlations in $e^+e^-\to XD\bar{D}$ and $C$-even constrained $D\bar{D}$ pairs
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (707 additional authors not shown)
Abstract:
The study of meson pairs produced with quantum correlations gives direct access to parameters that are challenging to measure in other systems. In this Letter, the existence of quantum correlations due to charge-conjugation symmetry $C$ is demonstrated in $D\bar{D}$ pairs produced through the processes $e^+e^-\to D\bar{D}$, $e^+e^- \to D^{*}\bar{D}$, and $e^+e^- \to D^{*} \bar{D}^*$, where the lack of charge superscripts refers to an admixture of neutral-charm-meson particle and antiparticle states, using $7.13 \text{ fb}^{-1}$ of $e^+e^-$ collision data collected by the BESIII experiment at center-of-mass energies between $4.13$ and $4.23 \text{ GeV}$. Processes with either $C$-even or $C$-odd constraints are identified and separated. A procedure is presented that harnesses the entangled production process to enable measurements of $D^0$-meson hadronic parameters. This study provides the first confirmation of quantum correlations in $e^+e^-\to X D\bar{D}$ processes and the first observation of a $C$-even constrained $D\bar{D}$ system. The procedure is applied to measure $δ^{D}_{Kπ}$, the strong phase between the $D^0\to K^-π^+$ and $\bar{D}^0\to K^-π^+$ decay amplitudes, which results in the determination of $δ^{D}_{Kπ}=\left(192.8^{+11.0 + 1.9}_{-12.4 -2.4}\right)^\circ$. The potential for measurements of other hadronic decay parameters and charm mixing with these and future datasets is also discussed.
Submitted 10 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Measurement of the CP asymmetry in $D^+ \to π^+ π^0$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
N. K. Baghel,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett
, et al. (380 additional authors not shown)
Abstract:
We measure the CP asymmetry in $D^+ \to π^+ π^0$ decays reconstructed in $e^+ e^-$ collisions at the Belle II experiment using a data set corresponding to an integrated luminosity of 428 fb$^{-1}$. A control sample of $D^+ \to π^+ K_{S}$ decays is used to correct for detection and production asymmetries. The result, $A_{CP}(D^+ \to π^+π^0) =(-1.8 \pm 0.9 \pm 0.1)\%$, where the first uncertainty is statistical and the second systematic, is the most precise determination to date. It agrees with the prediction of CP symmetry from the standard model, and with results of previous measurements.
Submitted 9 June, 2025;
originally announced June 2025.
-
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Authors:
Jinxi Li,
Ziyang Song,
Siyuan Zhou,
Bo Yang
Abstract:
In this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the physics of complex dynamic 3D scenes without needing any object priors. The key to our approach is to introduce a physics code followed by a carefully designed divergence-free module for estimating a per-Gaussian velocity field, without relying on the inefficient PINN losses. Extensive experiments on three public datasets and a newly collected challenging real-world dataset demonstrate the superior performance of our method for future frame extrapolation and motion segmentation. Most notably, our investigation into the learned physics codes reveals that they truly learn meaningful 3D physical motion patterns in the absence of any human labels in training.
Submitted 9 June, 2025;
originally announced June 2025.
-
FedCGD: Collective Gradient Divergence Optimized Scheduling for Wireless Federated Learning
Authors:
Tan Chen,
Jintao Yan,
Yuxuan Sun,
Sheng Zhou,
Zhisheng Niu
Abstract:
Federated learning (FL) is a promising paradigm for multiple devices to cooperatively train a model. When applied in wireless networks, two issues consistently affect the performance of FL, i.e., data heterogeneity of devices and limited bandwidth. Many papers have investigated device scheduling strategies considering the two issues. However, most of them treat data heterogeneity as a property of individual devices. In this paper, we prove that the convergence speed of FL is affected by the sum of device-level and sample-level collective gradient divergence (CGD). The device-level CGD refers to the gradient divergence of the scheduled device group, instead of the sum of the individual device divergences. The sample-level CGD is statistically upper bounded by sampling variance, which is inversely proportional to the total number of samples scheduled for local update. To derive a tractable form of the device-level CGD, we further consider a classification problem and transform it into the weighted earth moving distance (WEMD) between the group distribution and the global distribution. We then propose the FedCGD algorithm, which minimizes the sum of multi-level CGDs in polynomial time by balancing WEMD and sampling variance. Simulations show that the proposed strategy increases classification accuracy on the CIFAR-10 dataset by up to 4.2\% while scheduling 41.8\% fewer devices, and flexibly switches between reducing WEMD and reducing sampling variance.
Submitted 9 June, 2025;
originally announced June 2025.
-
Mobility-Aware Asynchronous Federated Learning with Dynamic Sparsification
Authors:
Jintao Yan,
Tan Chen,
Yuxuan Sun,
Zhaojun Nan,
Sheng Zhou,
Zhisheng Niu
Abstract:
Asynchronous Federated Learning (AFL) enables distributed model training across multiple mobile devices, allowing each device to independently update its local model without waiting for others. However, device mobility introduces intermittent connectivity, which necessitates gradient sparsification and leads to model staleness, jointly affecting AFL convergence. This paper develops a theoretical model to characterize the interplay among sparsification, model staleness and mobility-induced contact patterns, and their joint impact on AFL convergence. Based on the analysis, we propose a mobility-aware dynamic sparsification (MADS) algorithm that optimizes the sparsification degree based on contact time and model staleness. Closed-form solutions are derived, showing that under low-speed conditions, MADS increases the sparsification degree to enhance convergence, while under high-speed conditions, it reduces the sparsification degree to guarantee reliable uploads within limited contact time. Experimental results validate the theoretical findings. Compared with the state-of-the-art benchmarks, the MADS algorithm increases the image classification accuracy on the CIFAR-10 dataset by 8.76% and reduces the average displacement error in the Argoverse trajectory prediction dataset by 9.46%.
Submitted 8 June, 2025;
originally announced June 2025.
-
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems
Authors:
Yuhan Cao,
Zian Chen,
Kun Quan,
Ziliang Zhang,
Yu Wang,
Xiaoning Dong,
Yeqi Feng,
Guanzhong He,
Jingcheng Huang,
Jianhao Li,
Yixuan Tan,
Jiafu Tang,
Yilin Tang,
Junlei Wu,
Qianyu Xiao,
Can Zheng,
Shouchen Zhou,
Yuxiang Zhu,
Yiming Huang,
Tian Xie,
Tianxing He
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation and can tackle complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test case generation remains largely unexplored. We investigate this problem from the perspective of competition-level programming (CP) programs and propose TCGBench, a Benchmark for (LLM generation of) Test Case Generators. This benchmark comprises two tasks, aimed at studying the capabilities of LLMs in (1) generating valid test case generators for a given CP problem, and further (2) generating targeted test case generators that expose bugs in human-written code. Experimental results indicate that while state-of-the-art LLMs can generate valid test case generators in most cases, most LLMs struggle to generate targeted test cases that effectively reveal flaws in human code. In particular, even advanced reasoning models (e.g., o3-mini) fall significantly short of human performance in the task of generating targeted generators. Furthermore, we construct a high-quality, manually curated dataset of instructions for generating targeted generators. Analysis demonstrates that the performance of LLMs can be enhanced with the aid of this dataset, through both prompting and fine-tuning.
Submitted 10 June, 2025; v1 submitted 7 June, 2025;
originally announced June 2025.
-
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Authors:
Hongyan Zhi,
Peihao Chen,
Siyuan Zhou,
Yubo Dong,
Quanxi Wu,
Lei Han,
Mingkui Tan
Abstract:
Manipulation has long been a challenging task for robots, while humans can effortlessly perform complex interactions with objects, such as hanging a cup on the mug rack. A key reason is the lack of a large and uniform dataset for teaching robots manipulation skills. Current robot datasets often record robot actions in different action spaces within a simple scene. This hinders the robot from learning a unified and robust action representation for different robots within diverse scenes. Observing how humans understand a manipulation task, we find that understanding how the objects should move in the 3D space is a critical clue for guiding actions. This clue is embodiment-agnostic and suitable for both humans and different robots. Motivated by this, we aim to learn a 3D flow world model from both human and robot manipulation data. This model predicts the future movement of the interacting objects in 3D space, guiding action planning for manipulation. Specifically, we synthesize a large-scale 3D optical flow dataset, named ManiFlow-110k, through a moving object auto-detect pipeline. A video diffusion-based world model then learns manipulation physics from these data, generating 3D optical flow trajectories conditioned on language instructions. With the generated 3D object optical flow, we propose a flow-guided rendering mechanism, which renders the predicted final state and leverages GPT-4o to assess whether the predicted flow aligns with the task description. This equips the robot with a closed-loop planning ability. Finally, we consider the predicted 3D optical flow as constraints for an optimization policy to determine a chunk of robot actions for manipulation. Extensive experiments demonstrate strong generalization across diverse robotic manipulation tasks and reliable cross-embodiment adaptation without hardware-specific training.
Submitted 6 June, 2025;
originally announced June 2025.
-
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning
Authors:
Yuheng Lei,
Sitong Mao,
Shunbo Zhou,
Hongyuan Zhang,
Xuelong Li,
Ping Luo
Abstract:
A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work within the dominant pretrain-then-finetune paradigm has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. However, in the context of lifelong learning, these methods rely on the impractical assumption of a test-time task identifier and restrict knowledge sharing among isolated adapters. To address these limitations, we propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. DMPEL progressively learns a low-rank expert library and employs a lightweight router to dynamically combine experts into an end-to-end policy, facilitating flexible behavior during lifelong adaptation. Moreover, by leveraging the modular structure of the fine-tuned parameters, we introduce coefficient replay to guide the router in accurately retrieving frozen experts for previously encountered tasks, thereby mitigating catastrophic forgetting. This method is significantly more storage- and computation-efficient than applying demonstration replay to the entire policy. Extensive experiments on the lifelong manipulation benchmark LIBERO demonstrate that our framework outperforms state-of-the-art lifelong learning methods in success rates across continual adaptation, while utilizing minimal trainable parameters and storage.
Submitted 6 June, 2025;
originally announced June 2025.
-
Twenty-Five Years of the Intelligent Driver Model: Foundations, Extensions, Applications, and Future Directions
Authors:
Shirui Zhou,
Shiteng Zheng,
Junfang Tian,
Rui Jiang,
H. M. Zhang
Abstract:
The Intelligent Driver Model (IDM), proposed in 2000, has become a foundational tool in traffic flow modeling, renowned for its simplicity, computational efficiency, and ability to capture diverse traffic dynamics. Over the past 25 years, IDM has significantly advanced car-following theory and found extensive application in intelligent transportation systems, including driver assistance systems and autonomous vehicle control. However, IDM's deterministic framework and simplified assumptions face limitations in addressing real-world complexities such as stochastic variability, driver heterogeneity, and mixed traffic conditions. This paper provides a systematic review and critical reflection on IDM's theoretical foundations, academic influence, practical applications, and model extensions. While highlighting IDM's contributions, we emphasize the need to extend the model into a modular and extensible framework. Future directions include integrating stochastic elements, human behavioral insights, and hybrid modeling approaches that combine physics-based structures with data-driven methodologies. By reimagining IDM as a flexible modeling basis, this paper aims to inspire its continued development to meet the demands of intelligent, connected, and increasingly complex traffic systems.
Submitted 6 June, 2025;
originally announced June 2025.
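For readers unfamiliar with the model reviewed in the entry above, the commonly cited form of the IDM acceleration law is sketched below. This is an illustrative sketch and is not taken from the paper: the function names and the default parameter values (desired speed `v0`, time headway `T`, maximum acceleration `a_max`, comfortable deceleration `b`, minimum gap `s0`, exponent `delta`) are assumptions chosen only for demonstration.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0, delta=4.0):
    """Acceleration (m/s^2) of a follower under the Intelligent Driver Model.

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m); math.inf means a free road
    dv  : approach rate, v - v_leader (m/s)
    All default parameter values are illustrative assumptions.
    """
    # Desired dynamic gap: minimum gap + headway term + braking interaction term.
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def equilibrium_gap(v, v0=30.0, T=1.5, s0=2.0, delta=4.0):
    """Gap at which a follower driving at the leader's speed v (< v0) has zero acceleration."""
    return (s0 + v * T) / math.sqrt(1.0 - (v / v0) ** delta)
```

Calling `idm_acceleration(15.0, equilibrium_gap(15.0), 0.0)` returns a value at or very near zero, which is a quick sanity check that the two functions are consistent with each other.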
-
A Driving Regime-Embedded Deep Learning Framework for Modeling Intra-Driver Heterogeneity in Multi-Scale Car-Following Dynamics
Authors:
Shirui Zhou,
Jiying Yan,
Junfang Tian,
Tao Wang,
Yongfu Li,
Shiquan Zhong
Abstract:
A fundamental challenge in car-following modeling lies in accurately representing the multi-scale complexity of driving behaviors, particularly the intra-driver heterogeneity where a single driver's actions fluctuate dynamically under varying conditions. While existing models, both conventional and data-driven, address behavioral heterogeneity to some extent, they often emphasize inter-driver heterogeneity or rely on simplified assumptions, limiting their ability to capture the dynamic heterogeneity of a single driver under different driving conditions. To address this gap, we propose a novel data-driven car-following framework that systematically embeds discrete driving regimes (e.g., steady-state following, acceleration, cruising) into vehicular motion predictions. Leveraging high-resolution traffic trajectory datasets, the proposed hybrid deep learning architecture combines Gated Recurrent Units for discrete driving regime classification with Long Short-Term Memory networks for continuous kinematic prediction, unifying discrete decision-making processes and continuous vehicular dynamics to comprehensively represent inter- and intra-driver heterogeneity. Driving regimes are identified using a bottom-up segmentation algorithm and Dynamic Time Warping, ensuring robust characterization of behavioral states across diverse traffic scenarios. Comparative analyses demonstrate that the framework significantly reduces prediction errors for acceleration (maximum MSE improvement reached 58.47\%), speed, and spacing metrics while reproducing critical traffic phenomena, such as stop-and-go wave propagation and oscillatory dynamics.
Submitted 6 June, 2025;
originally announced June 2025.
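The entry above names Dynamic Time Warping as one ingredient of its regime-identification step. As a reminder of what that distance computes, here is a minimal textbook DTW sketch for two 1-D sequences. It is not the authors' pipeline: the function name, the absolute-difference local cost, and the unconstrained warping window are assumptions made for illustration.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic-programming DTW with |a - b| local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because warping lets one sample align with several, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which is why DTW is a natural way to compare trajectory segments of unequal duration.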
-
CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy
Authors:
Jiakai Zhang,
Shouchen Zhou,
Haizhao Dai,
Xinhang Liu,
Peihao Wang,
Zhiwen Fan,
Yuan Pei,
Jingyi Yu
Abstract:
Pose estimation from unordered images is fundamental for 3D reconstruction, robotics, and scientific imaging. Recent geometric foundation models, such as DUSt3R, enable end-to-end dense 3D reconstruction but remain underexplored in scientific imaging fields like cryo-electron microscopy (cryo-EM) for near-atomic protein reconstruction. In cryo-EM, pose estimation and 3D reconstruction from unordered particle images still depend on time-consuming iterative optimization, primarily due to challenges such as low signal-to-noise ratios (SNR) and distortions from the contrast transfer function (CTF). We introduce CryoFastAR, the first geometric foundation model that can directly predict poses from Cryo-EM noisy images for Fast ab initio Reconstruction. By integrating multi-view features and training on large-scale simulated cryo-EM data with realistic noise and CTF modulations, CryoFastAR enhances pose estimation accuracy and generalization. To enhance training stability, we propose a progressive training strategy that first allows the model to extract essential features under simpler conditions before gradually increasing difficulty to improve robustness. Experiments show that CryoFastAR achieves comparable quality while significantly accelerating inference over traditional iterative approaches on both synthetic and real datasets.
Submitted 6 June, 2025;
originally announced June 2025.
-
Observation of $D^+\to K^0_Sπ^0μ^+ν_μ$, Test of Lepton Flavor Universality and First Angular Analysis of $D^+\to \bar{K}^\ast(892)^0\ell^+ν_\ell$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
We report a study of the semileptonic decays $D^+\to K_S^0π^0\ell^+ν_\ell$ ($\ell = e, μ$) based on $20.3\,\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector.
The $D^+\to K_S^0π^0μ^+ν_μ$ decay is observed for the first time, with a branching fraction of $(0.896\pm0.017_{\rm stat}\pm0.008_{\rm syst})\%$, and the branching fraction of $D^+\to K_S^0π^0e^+ν_e$ is determined with improved precision as $(0.943\pm0.012_{\rm stat}\pm0.010_{\rm syst})\%$.
From the analysis of the dynamics, we observe that the dominant $\bar{K}^\ast(892)^0$ component is accompanied by an $S$-wave contribution, which accounts for $(7.10 \pm 0.68_{\rm stat} \pm 0.41_{\rm syst})\%$ of the total decay rate of the $μ^+$ channel and $(6.39 \pm 0.17_{\rm stat} \pm 0.14_{\rm syst})\%$ of the $e^+$ channel. Assuming a single-pole dominance parameterization, the hadronic form factor ratios are extracted to be $r_V=V(0)/A_1(0)=1.42 \pm\, 0.03_{\rm stat} \pm\, 0.02_{\rm syst}$ and $r_2=A_2(0)/A_1(0)=0.75 \pm\, 0.03_{\rm stat} \pm\, 0.01_{\rm syst}$.
Based on the first comprehensive angular and decay-rate $CP$ asymmetry analysis, the full set of averaged angular and $CP$ asymmetry observables is measured as a function of the momentum-transfer squared; the results are consistent with expectations from the Standard Model. No evidence for violation of $μ-e$ lepton-flavor universality is observed in either the full range or the five chosen bins of momentum-transfer squared.
Submitted 6 June, 2025;
originally announced June 2025.
-
Learning-Augmented Hierarchical Clustering
Authors:
Vladimir Braverman,
Jon C. Ergun,
Chen Wang,
Samson Zhou
Abstract:
Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, for many of the proposed HC objectives, hardness-of-approximation results pose strong barriers to approximation algorithms. Thus, we consider the problem of hierarchical clustering given auxiliary information from natural oracles. Specifically, we focus on a *splitting oracle* which, when provided with a triplet of vertices $(u,v,w)$, answers (possibly erroneously) the pairs of vertices whose lowest common ancestor includes all three vertices in an optimal tree, i.e., identifying which vertex "splits away" from the others. Using such an oracle, we obtain the following results:
- A polynomial-time algorithm that outputs a hierarchical clustering tree with $O(1)$-approximation to the Dasgupta objective (Dasgupta [STOC'16]).
- A near-linear time algorithm that outputs a hierarchical clustering tree with $(1-o(1))$-approximation to the Moseley-Wang objective (Moseley and Wang [NeurIPS'17]).
Under the plausible Small Set Expansion Hypothesis, no polynomial-time algorithm can achieve any constant approximation for Dasgupta's objective or $(1-C)$-approximation for the Moseley-Wang objective for some constant $C>0$. As such, our results demonstrate that the splitting oracle enables algorithms to outperform standard HC approaches and overcome hardness constraints. Furthermore, our approaches extend to sublinear settings, in which we show new streaming and PRAM algorithms for HC with improved guarantees.
Submitted 5 June, 2025;
originally announced June 2025.
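The two objectives named in the entry above can be made concrete with a small sketch. Under the standard definitions, Dasgupta's cost charges each pair its similarity times the number of leaves under the pair's lowest common ancestor (lower is better), and the Moseley-Wang revenue credits each pair its similarity times the number of leaves outside that subtree (higher is better). The nested-tuple tree encoding, the `frozenset`-keyed weight dictionary, and the function names below are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations


def leaves(tree):
    """Leaf labels under a hierarchy encoded as nested tuples (non-tuples are leaves)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def hc_objectives(tree, weights, n=None):
    """Return (dasgupta_cost, moseley_wang_revenue) of a hierarchy.

    weights maps frozenset({u, v}) to a similarity w_uv; missing pairs count as 0.
    """
    if n is None:
        n = len(leaves(tree))  # total number of points, fixed at the root call
    if not isinstance(tree, tuple):
        return 0.0, 0.0
    child_leaves = [leaves(c) for c in tree]
    size = sum(len(group) for group in child_leaves)
    cost = revenue = 0.0
    # Pairs separated at this node have this node as their lowest common ancestor.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                w = weights.get(frozenset((u, v)), 0.0)
                cost += w * size
                revenue += w * (n - size)
    for child in tree:
        c_cost, c_rev = hc_objectives(child, weights, n)
        cost += c_cost
        revenue += c_rev
    return cost, revenue


# Toy similarities: a-b and c-d are strongly similar, b-c weakly.
w = {frozenset(("a", "b")): 3.0, frozenset(("c", "d")): 2.0, frozenset(("b", "c")): 1.0}
good = hc_objectives((("a", "b"), ("c", "d")), w)  # (14.0, 10.0)
bad = hc_objectives((("a", "c"), ("b", "d")), w)   # (24.0, 0.0)
```

In both trees the cost and revenue sum to the number of points times the total weight (here 4 × 6 = 24), so minimizing one is equivalent to maximizing the other at optimality even though their approximation guarantees differ.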
-
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Authors:
Jianyi Wang,
Shanchuan Lin,
Zhijie Lin,
Yuxi Ren,
Meng Wei,
Zongsheng Yue,
Shangchen Zhou,
Hao Chen,
Yang Zhao,
Ceyuan Yang,
Xuefeng Xiao,
Chen Change Loy,
Lu Jiang
Abstract:
Recent advances in diffusion-based video restoration (VR) demonstrate significant improvement in visual quality, yet incur a prohibitive computational cost during inference. While several distillation-based approaches have exhibited the potential of one-step image restoration, extending existing approaches to VR remains challenging and underexplored, particularly when dealing with high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR within a single step, we introduce several enhancements to both model architecture and training procedures. Specifically, an adaptive window attention mechanism is proposed, where the window size is dynamically adjusted to fit the output resolutions, avoiding window inconsistency observed under high-resolution VR using window attention with a predefined window size. To stabilize and improve the adversarial post-training towards VR, we further verify the effectiveness of a series of losses, including a proposed feature matching loss, without significantly sacrificing training efficiency. Extensive experiments show that SeedVR2 can achieve comparable or even better performance compared with existing VR approaches in a single step.
Submitted 5 June, 2025;
originally announced June 2025.
-
Study of $f_1(1420)$ and $η(1405)$ in the decay $J/ψ\to γπ^{0}π^{0}π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (650 additional authors not shown)
Abstract:
A partial-wave analysis is performed on the decay $J/ψ\toγπ^{0}π^{0}π^{0}$ within the $π^{0}π^{0}π^{0}$ invariant-mass region below 1.6 GeV$/c^{2}$, using $(10.09~\pm~0.04)\times10^{9} ~J/ψ$ events collected with the BESIII detector. Significant isospin-violating decays of $η(1405)$ and $f_1(1420)$ into $f_0(980)π^{0}$ are observed. For the first time, three axial-vectors, $f_1(1285)$, $f_1(1420)$ and $f_1(1510)$, are observed to decay into $π^{0}π^{0}π^{0}$. The product branching fractions of these resonances are reported.
Submitted 7 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
-
The Hippocampal Place Field Gradient: An Eigenmode Theory Linking Grid Cell Projections to Multiscale Learning
Authors:
Shujun Zhou,
Guozhang Chen
Abstract:
The hippocampus encodes space through a striking gradient of place field sizes along its dorsal-ventral axis, yet the principles generating this continuous gradient from discrete grid cell inputs remain debated. We propose a unified theoretical framework establishing that hippocampal place fields arise naturally as linear projections of grid cell population activity, interpretable as eigenmodes. Critically, we demonstrate that a frequency-dependent decay of these grid-to-place connection weights naturally transforms inputs from discrete grid modules into a continuous spectrum of place field sizes. This multiscale organization is functionally significant: we reveal it shapes the inductive bias of the population code, balancing a fundamental trade-off between precision and generalization. Mathematical analysis and simulations demonstrate an optimal place field size for few-shot learning, which scales with environment structure. Our results offer a principled explanation for the place field gradient and generate testable predictions, bridging anatomical connectivity with adaptive learning in both biological and artificial intelligence.
Submitted 5 June, 2025;
originally announced June 2025.
-
OpenGT: A Comprehensive Benchmark For Graph Transformers
Authors:
Jiachen Tang,
Zhonghao Wang,
Sirui Chen,
Sheng Zhou,
Jiawei Chen,
Jiajun Bu
Abstract:
Graph Transformers (GTs) have recently demonstrated remarkable performance across diverse domains. By leveraging attention mechanisms, GTs are capable of modeling long-range dependencies and complex structural relationships beyond local neighborhoods. However, their applicable scenarios are still underexplored, which highlights the need to identify when and why they excel. Furthermore, unlike GNNs, which predominantly rely on message-passing mechanisms, GTs exhibit a diverse design space in areas such as positional encoding, attention mechanisms, and graph-specific adaptations. Yet, it remains unclear which of these design choices are truly effective and under what conditions. As a result, the community currently lacks a comprehensive benchmark and library to promote a deeper understanding and further development of GTs. To address this gap, this paper introduces OpenGT, a comprehensive benchmark for Graph Transformers. OpenGT enables fair comparisons and multidimensional analysis by establishing standardized experimental settings and incorporating a broad selection of state-of-the-art GNNs and GTs. Our benchmark evaluates GTs from multiple perspectives, encompassing diverse tasks and datasets with varying properties. Through extensive experiments, our benchmark has uncovered several critical insights, including the difficulty of transferring models across task levels, the limitations of local attention, the efficiency trade-offs in several models, the application scenarios of specific positional encodings, and the preprocessing overhead of some positional encodings. We aspire for this work to establish a foundation for future graph transformer research emphasizing fairness, reproducibility, and generalizability. We have developed an easy-to-use library OpenGT for training and evaluating existing GTs. The benchmark code is available at https://github.com/eaglelab-zju/OpenGT.
Submitted 5 June, 2025;
originally announced June 2025.
-
CSST Cosmological Emulator III: Hybrid Lagrangian Bias Expansion Emulation of Galaxy Clustering
Authors:
Shuren Zhou,
Zhao Chen,
Yu Yu
Abstract:
Galaxy clustering is an important probe in the upcoming China Space Station Telescope (CSST) survey to understand structure growth and reveal the nature of the dark sector. However, it is a long-standing challenge to model this biased tracer and connect the observable to the underlying physics. In this work, we present a hybrid Lagrangian bias expansion emulator, combining the Lagrangian bias expansion and the accurate dynamical evolution from $N$-body simulation, to predict the power spectrum of the biased tracer in real space. We employ the Kun simulation suite to construct the emulator, emulating across the space of 8 cosmological parameters including dynamic dark energy $w_0$, $w_a$, and total neutrino mass $\sum m_ν$. The sample variance due to the finite simulation box is further reduced using the Zel'dovich variance control, which enables the precise measurement of the Lagrangian basis spectra up to quadratic order. The emulation of basis spectra reaches 1% level accuracy, covering wavenumber $ k \leq 1 \,{\rm Mpc}^{-1}h$ and redshift $0\leq z\leq 3$ up to quadratic order field. To validate the emulator, we perform the joint fitting of the halo auto power spectrum and the halo-matter cross power spectrum from 46 independent simulations. Depending on the choice of counterpart, the joint fitting is unbiased up to $k_{\rm max}\simeq 0.7\,{\rm Mpc}^{-1}h$ with $1\sim 2$ percent accuracy, for all redshift and halo mass samples. As one of the CSST cosmological emulator series, our emulator is expected to provide accurate theoretical predictions of the galaxy power spectrum for the upcoming CSST survey.
Submitted 5 June, 2025;
originally announced June 2025.
-
Charged-hadron identification at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
H. Ahmed,
Y. Ahn,
H. Aihara,
N. Akopov,
A. Albert,
S. Alghamdi,
M. Alhakami,
A. Aloisio,
N. Althubiti,
K. Amos,
M. Angelsmark,
N. Anh Ky,
C. Antonioli,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
N. K. Baghel,
S. Bahinipati
, et al. (386 additional authors not shown)
Abstract:
The Belle II experiment's ability to identify particles critically affects the sensitivity of its measurements. We describe Belle II's algorithms for identifying charged particles and evaluate their performance in separating pions, kaons, and protons using 426 fb$^{-1}$ of data collected at the energy-asymmetric $e^+e^-$ collider SuperKEKB in 2019--2022 at center-of-mass energies at and near the mass of the $Υ(4S)$.
Submitted 10 June, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
Authors:
Di Chang,
Mingdeng Cao,
Yichun Shi,
Bo Liu,
Shengqu Cai,
Shijie Zhou,
Weilin Huang,
Gordon Wetzstein,
Mohammad Soleymani,
Peng Wang
Abstract:
Editing images with instructions to reflect non-rigid motions, camera viewpoint shifts, object deformations, human articulations, and complex interactions, poses a challenging yet underexplored problem in computer vision. Existing approaches and datasets predominantly focus on static scenes or rigid transformations, limiting their capacity to handle expressive edits involving dynamic motion. To address this gap, we introduce ByteMorph, a comprehensive framework for instruction-based image editing with an emphasis on non-rigid motions. ByteMorph comprises a large-scale dataset, ByteMorph-6M, and a strong baseline model built upon the Diffusion Transformer (DiT), named ByteMorpher. ByteMorph-6M includes over 6 million high-resolution image editing pairs for training, along with a carefully curated evaluation benchmark ByteMorph-Bench. Both capture a wide variety of non-rigid motion types across diverse environments, human figures, and object categories. The dataset is constructed using motion-guided data generation, layered compositing techniques, and automated captioning to ensure diversity, realism, and semantic coherence. We further conduct a comprehensive evaluation of recent instruction-based image editing methods from both academic and commercial domains.
Submitted 11 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Measurement of the branching fractions of the Cabibbo-favored decays $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ and $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ and search for $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (660 additional authors not shown)
Abstract:
Based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of about 4.5 fb$^{-1}$ collected at center-of-mass energies between 4599.53 MeV and 4698.82 MeV with the BESIII detector, the absolute branching fraction of the Cabibbo-favored decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is measured to be $(3.12\pm0.46\pm0.15)\times10^{-3}$. Combined with a previous measurement from the BESIII Collaboration, the branching fraction of the decay $Λ_{c}^{+}\toΛK_{S}^{0}K^{+}$ is calculated to be $(3.07\pm0.26\pm0.13)\times10^{-3}$. The decay $Λ_{c}^{+}\toΞ^{0}K_{S}^{0}π^{+}$ is observed for the first time with a statistical significance of $6.6σ$, and its branching fraction is determined to be $(3.70\pm0.60\pm0.21)\times10^{-3}$. In addition, a search for the decay $Λ_{c}^{+}\toΣ^{0} K_{S}^{0}K^{+}$ is performed and its branching fraction is determined to be $(0.80^{+0.28}_{-0.24}\pm0.16)\times10^{-3}$, corresponding to an upper limit of $1.28\times10^{-3}$ at $90\%$ confidence level. These measurements provide new information that can be used to distinguish between theoretical models.
Submitted 3 June, 2025;
originally announced June 2025.
-
LayoutRAG: Retrieval-Augmented Model for Content-agnostic Conditional Layout Generation
Authors:
Yuxuan Wu,
Le Wang,
Sanping Zhou,
Mengnan Liu,
Gang Hua,
Haoxiang Li
Abstract:
Controllable layout generation aims to create plausible visual arrangements of element bounding boxes within a graphic design according to certain optional constraints, such as the type or position of a specific component. While recent diffusion or flow-matching models have achieved considerable advances in multifarious conditional generation tasks, there remains considerable room for generating optimal arrangements under given conditions. In this work, we propose to carry out layout generation through retrieving by conditions and reference-guided generation. Specifically, we retrieve appropriate layout templates according to given conditions as references. The references are then utilized to guide the denoising or flow-based transport process. By retrieving layouts compatible with the given conditions, we can uncover the potential information not explicitly provided in the given condition. Such an approach offers more effective guidance to the model during the generation process, in contrast to previous models that feed the condition to the model and let the model infer the unprovided layout attributes directly. Meanwhile, we design a condition-modulated attention that selectively absorbs retrieval knowledge, adapting to the difference between retrieved templates and given conditions. Extensive experimental results show that our method successfully produces high-quality layouts that meet the given conditions and outperforms existing state-of-the-art models. Code will be released upon acceptance.
Submitted 3 June, 2025;
originally announced June 2025.
-
Improved Measurements of $D^+ \to ηe^+ν_e$ and $D^+ \to ημ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (682 additional authors not shown)
Abstract:
Using 20.3 fb$^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector, we measure the branching fractions of $D^+\to ηe^+ν_e$ and $D^+\to ημ^+ν_μ$ to be $(9.75\pm0.29\pm0.28)\times10^{-4}$ and $(9.08\pm0.35\pm0.23)\times10^{-4}$, where the first and second uncertainties are statistical and systematic, respectively. From a simultaneous fit to their partial decay rates, we determine the product of the hadronic form factor $f^η_+(0)$ and the modulus of the $c\to d$ Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ to be $f^η_+(0)|V_{cd}|=0.078\pm0.002\pm0.001$. Taking the $|V_{cd}|$ value from the Standard Model global fit as input, we obtain $f^η_+(0)=0.345\pm0.008\pm0.003$. The ratio between the measured branching fractions of $D^+\to ημ^+ν_μ$ and $D^+\to ηe^+ν_e$ is determined to be $0.93\pm0.05_{\rm stat.}\pm0.02_{\rm syst.}$, indicating no violation of lepton flavor universality.
Submitted 3 June, 2025;
originally announced June 2025.
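The muon-to-electron ratio quoted in the abstract above can be cross-checked from the two branching fractions. The following is a minimal sketch, not the collaboration's procedure: it assumes the two statistical uncertainties are uncorrelated and combines them in quadrature, and it leaves out the systematic uncertainty because the abstract does not say which systematic contributions cancel in the ratio. The variable names are illustrative.

```python
import math

# Branching fractions from the abstract, in units of 1e-4:
# central value and statistical uncertainty only.
b_mu, stat_mu = 9.08, 0.35   # D+ -> eta mu+ nu_mu
b_e, stat_e = 9.75, 0.29     # D+ -> eta e+ nu_e

ratio = b_mu / b_e

# Assumption: the statistical uncertainties are uncorrelated, so the
# relative uncertainties add in quadrature.
rel_stat = math.hypot(stat_mu / b_mu, stat_e / b_e)
stat_ratio = ratio * rel_stat

print(f"ratio = {ratio:.2f} +/- {stat_ratio:.2f} (stat.)")
# prints "ratio = 0.93 +/- 0.05 (stat.)"
```

Under this assumption the central value and the statistical uncertainty agree with the quoted $0.93\pm0.05_{\rm stat.}$.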
-
Intrinsic local Gauss's law preserving PIC method: A self-consistent field-particle update scheme for plasma simulations
Authors:
Zhonghua Qiao,
Zhenli Xu,
Qian Yin,
Shenggao Zhou
Abstract:
To perform physically faithful particle-in-cell (PIC) simulations, Gauss's law stands as a critical requirement, since its violation often leads to catastrophic errors in long-term plasma simulations. This work proposes a novel method that intrinsically enforces Gauss's law for the Vlasov-Ampère/Vlasov-Poisson system without requiring auxiliary field corrections or specialized current deposition techniques. The electric field is updated locally and consistently with the motion of particles by splitting the motion into sub-steps along each dimension of the computational mesh. To further obtain a curl-free electric field, a local update scheme is developed to relax the electric-field free energy subject to Gauss's law. The proposed method avoids solving Poisson's or Ampère's equation, resulting in a local algorithm of linear complexity for each time step, which can be flexibly combined with various temporal discretizations for particle motion in PIC simulations. Theoretical analysis verifies that the proposed method indeed maintains the discrete Gauss's law exactly. Numerical tests on classical benchmarks, including Landau damping, the two-stream instability, and the diocotron instability, demonstrate the key advantages of the proposed method. It is expected that the local nature of the proposed method makes it a promising tool in parallel simulations of large-scale plasmas.
Submitted 2 June, 2025;
originally announced June 2025.
-
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Authors:
Shengjia Zhang,
Junjie Wu,
Jiawei Chen,
Changwang Zhang,
Xingyu Lou,
Wangchunshu Zhou,
Sheng Zhou,
Can Wang,
Jun Wang
Abstract:
Recent advanced large reasoning models (LRMs) leverage extended chain-of-thought (CoT) reasoning to solve complex tasks, achieving state-of-the-art performance. Despite their success, we identify a critical issue: a substantial portion of simple tasks solved by LRMs can also be addressed by non-reasoning LLMs using significantly fewer tokens, indicating that complex reasoning may not always be necessary. To address this, we systematically analyze the reasoning trajectories of LRMs and present a method utilizing identified paradigms and LLM-Judge to classify these trajectories as either Redundant Reasoning or Essential Reasoning. We then introduce OThink-R1, a method that prunes redundant reasoning steps while preserving logical validity. OThink-R1 dynamically employs the non-thinking mode (fast-thinking) for straightforward problems while engaging in deliberate thinking (slow-thinking) for complex problems. Experiments across mathematical and question-answering tasks demonstrate that OThink-R1 reduces reasoning redundancy by almost 23% on average without compromising accuracy, offering practical guidelines for efficient reasoning models. The code is available at https://github.com/AgenticIR-Lab/OThink-R1.
Submitted 2 June, 2025;
originally announced June 2025.
-
D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage
Authors:
Maxime Gonthier,
Dante D. Sanchez-Gallegos,
Haochen Pan,
Bogdan Nicolae,
Sicheng Zhou,
Hai Duc Nguyen,
Valerie Hayot-Sasson,
J. Gregory Pauloski,
Jesus Carretero,
Kyle Chard,
Ian Foster
Abstract:
The exponential growth of data necessitates distributed storage models, such as peer-to-peer systems and data federations. While distributed storage can reduce costs and increase reliability, the heterogeneity in storage capacity, I/O performance, and failure rates of storage resources makes their efficient use a challenge. Further, node failures are common and can lead to data unavailability and even data loss. Erasure coding is a common resiliency strategy implemented in storage systems to mitigate failures by striping data across storage locations. However, erasure coding is computationally expensive and existing systems do not consider the heterogeneous resources and their varied capacity and performance when placing data chunks. We tackle the challenges of using erasure coding with distributed and heterogeneous nodes, aiming to store as much data as possible, minimize encoding and decoding time, and meet user-defined reliability requirements for each data item. We propose two new dynamic scheduling algorithms, D-Rex LB and D-Rex SC, that adaptively choose erasure coding parameters and map chunks to heterogeneous nodes. D-Rex SC achieves robust performance for both storage utilization and throughput, at a higher computational cost, while D-Rex LB is faster but with slightly less competitive performance. In addition, we propose two greedy algorithms, GreedyMinStorage and GreedyLeastUsed, that optimize for storage utilization and load balancing, respectively. Our experimental evaluation shows that our dynamic schedulers store, on average, 45% more data items without significantly degrading I/O throughput compared to state-of-the-art algorithms, while GreedyLeastUsed is able to store 21% more data items while also increasing throughput.
Submitted 29 May, 2025;
originally announced June 2025.
-
Loop current order on the kagome lattice
Authors:
Jun Zhan,
Hendrik Hohmann,
Matteo Dürrnagel,
Ruiqing Fu,
Sen Zhou,
Ziqiang Wang,
Ronny Thomale,
Xianxin Wu,
Jiangping Hu
Abstract:
Recent discoveries in kagome materials have unveiled their capacity to harbor exotic quantum states, including intriguing charge density wave (CDW) and superconductivity. Notably, accumulating experimental evidence suggests time-reversal symmetry (TRS) breaking within the CDW, hinting at the long-pursued loop current order (LCO). Despite extensive research efforts, achieving its model realization and understanding the mechanism through unbiased many-body simulations have remained both elusive and challenging. In this work, we develop a microscopic model for LCO on the spinless kagome lattice with non-local interactions, utilizing unbiased functional renormalization group calculations to explore ordering tendencies across all two-particle scattering channels. At the van Hove filling, we find that sublattice interference suppresses onsite CDW order, leaving LCO, charge bond, and nematic CDW states as the main competitors. Remarkably, a $2\times2$ LCO emerges as the many-body ground state over a significant parameter space with strong second nearest-neighbor repulsion, stemming from the unique interplay between sublattice characters and lattice geometry. The resulting electronic model with LCO bears similarities to the Haldane model and culminates in a quantum anomalous Hall state. We also discuss potential experimental implications for kagome metals.
Submitted 2 June, 2025;
originally announced June 2025.
-
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack
Authors:
Siqi Hui,
Yiren Song,
Sanping Zhou,
Ye Deng,
Wenli Huang,
Jinjun Wang
Abstract:
Autoregressive (AR) image generation models have gained increasing attention for their breakthroughs in synthesis quality, highlighting the need for robust watermarking to prevent misuse. However, existing in-generation watermarking techniques are primarily designed for diffusion models, where watermarks are embedded within diffusion latent states. This design poses significant challenges for direct adaptation to AR models, which generate images sequentially through token prediction. Moreover, diffusion-based regeneration attacks can effectively erase such watermarks by perturbing diffusion latent states. To address these challenges, we propose Lexical Bias Watermarking (LBW), a novel framework designed for AR models that resists regeneration attacks. LBW embeds watermarks directly into token maps by biasing token selection toward a predefined green list during generation. This approach ensures seamless integration with existing AR models and extends naturally to post-hoc watermarking. To increase the security against white-box attacks, instead of using a single green list, the green list for each image is randomly sampled from a pool of green lists. Watermark detection is performed via quantization and statistical analysis of the token distribution. Extensive experiments demonstrate that LBW achieves superior watermark robustness, particularly in resisting regeneration attacks.
Submitted 1 June, 2025;
originally announced June 2025.
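The abstract above says detection relies on statistical analysis of the token distribution but does not name the test. As a rough illustration only, the sketch below uses a one-proportion z-test on the share of tokens that fall in a green list, the kind of test commonly used for green-list text watermarks. The function name `green_z_score`, the vocabulary size, the 50% green fraction, and the biasing probability are all hypothetical and are not taken from the paper.

```python
import math
import random


def green_z_score(tokens, green_list, vocab_size):
    """z-score of the green-token count against the no-watermark expectation."""
    gamma = len(green_list) / vocab_size  # expected green fraction without a watermark
    n = len(tokens)
    hits = sum(1 for t in tokens if t in green_list)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


rng = random.Random(0)
vocab_size = 1024  # hypothetical codebook size
green = set(rng.sample(range(vocab_size), vocab_size // 2))
green_sorted = sorted(green)

# Unwatermarked token map: tokens drawn uniformly from the codebook.
plain = [rng.randrange(vocab_size) for _ in range(1024)]

# Toy "watermarked" token map: half of the draws are forced into the green list.
marked = [
    rng.choice(green_sorted) if rng.random() < 0.5 else rng.randrange(vocab_size)
    for _ in range(1024)
]

print("plain  z =", round(green_z_score(plain, green, vocab_size), 2))
print("marked z =", round(green_z_score(marked, green, vocab_size), 2))
```

A large positive z-score indicates more green tokens than chance would produce. A real detector would first have to recover the token map from the image by quantization, which this sketch does not attempt.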
-
Aligning VLM Assistants with Personalized Situated Cognition
Authors:
Yongqi Li,
Shen Zhou,
Xiaohu Li,
Xin Miao,
Jintao Wen,
Mayi Xu,
Jianhao Chen,
Birong Pan,
Hankun Kang,
Yuanyuan Zhu,
Ming Zhong,
Tieyun Qian
Abstract:
Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants of humans in managing visual tasks. However, people with diversified backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose to evaluate the individuals' actions to examine whether the personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which constructs a cognition-aware and action-based reward model for personalized alignment. Experimental results and human evaluations demonstrate the reliability of the PCogAlignBench and the effectiveness of our proposed PCogAlign. We will open-source the constructed benchmark and code at https://github.com/NLPGM/PCogAlign.
Submitted 1 June, 2025;
originally announced June 2025.
-
Fourier ptychographic microscopy aided with transport of intensity equation for robust full phase spectrum reconstruction
Authors:
Mikołaj Rogalski,
Juan Martinez-Carranza,
Bartosz Górski,
Piotr Arcab,
Michał Jóźwik,
Piotr Zdańkowski,
Magdalena Sobień,
Marzena Stefaniuk,
Shun Zhou,
Chao Zuo,
Maciej Trusiak
Abstract:
Fourier ptychographic microscopy (FPM) is a pivotal computational imaging technique that achieves phase and amplitude reconstruction with high resolution and wide field of view, using low numerical aperture objectives and LED array illumination. Despite its unique strengths, FPM remains fundamentally limited in retrieving low spatial frequency phase information due to the absence of phase encoding in all brightfield illumination angles. To overcome this, we present a novel hybrid approach that combines FPM with the transport of intensity equation (TIE), enabling accurate, full-spectrum phase retrieval without compromising system simplicity. Our method extends standard FPM acquisitions with a single additional on-axis defocused image, from which low-frequency phase components are reconstructed via the TIE method, employing a large defocus distance to suppress low-frequency artifacts and enhance robustness to intensity noise. To additionally compensate for defocus-induced magnification variations caused by spherical wavefront illumination, we employ an affine transform-based correction scheme upon image registration. Notably, by restoring the missing low-frequency content, our hybrid method appears capable of recovering phase values beyond the conventional 0-2π range, an area where conventional FPM techniques often struggle when dealing with optically thick samples. We validated our method using a quantitative phase test target for benchmarking accuracy and biological cheek cells, mouse neurons, and mouse brain tissue slice samples to demonstrate applicability for in vitro bioimaging. Experimental results confirm substantial improvements in phase reconstruction fidelity across spatial frequencies, establishing this hybrid FPM+TIE framework as a practical and high-performance solution for quantitative phase imaging in biomedical and optical metrology applications.
Submitted 30 May, 2025;
originally announced May 2025.
-
Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
Authors:
Yixuan Wang,
Shiqi Zhou,
Chuanzhe Guo,
Qingfu Zhu
Abstract:
Evol-Instruct has made significant improvements as a data synthesis method in several areas. Existing methods typically rely on a fixed set of strategies to evolve, which require manual design and are monolithic in form. In addition, iterative evolution also makes the acquisition of hard samples expensive. In view of this, we propose the Tag-Evol framework, a more diverse and efficient instruction evolving method. Specifically, Tag-Evol uses diverse and specific knowledge tags as strategies to achieve controlled evolution by injecting different combinations of tags into the original instructions. Experiments with multiple backbones in diverse domain benchmarks show that the proposed method generates significantly better evolved data than other methods. Furthermore, we conduct a thorough analysis of the evolved data, demonstrating that Tag-Evol is not only efficient but also generates more diverse and challenging data.
Submitted 29 May, 2025;
originally announced May 2025.
-
SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context
Authors:
Hairu Wang,
Yuan Feng,
Yukun Cao,
Xike Xie,
S Kevin Zhou
Abstract:
Large language models excel at many tasks but often incur high inference costs during deployment. To mitigate hallucination, many systems use a knowledge graph to enhance retrieval-augmented generation (KG-RAG). However, the large amount of retrieved knowledge context increases these inference costs further. A promising solution to balance performance and cost is LLM routing, which directs simple queries to smaller LLMs and complex ones to larger LLMs. However, no dedicated routing methods currently exist for RAG, and existing training-based routers face challenges scaling to this domain due to the need for extensive training data. We observe that the score distributions produced by the retrieval scorer strongly correlate with query difficulty. Based on this, we propose a novel, training-free routing framework, the first tailored to KG-RAG, that effectively balances performance and cost in a plug-and-play manner. Experiments show our method reduces calls to larger LLMs by up to 50% without sacrificing response quality, demonstrating its potential for efficient and scalable LLM deployment.
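The routing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `skewness` and `route`, the threshold value, and the direction of the rule (a strongly right-skewed score distribution is treated as an easy query and sent to the smaller LLM) are all assumptions made here for illustration.

```python
from statistics import fmean


def skewness(scores):
    """Population (Fisher-Pearson) skewness of a list of retrieval scores."""
    if len(scores) < 3:
        return 0.0
    mean = fmean(scores)
    m2 = fmean([(s - mean) ** 2 for s in scores])  # second central moment
    m3 = fmean([(s - mean) ** 3 for s in scores])  # third central moment
    if m2 == 0:
        return 0.0  # all scores identical: no skew
    return m3 / m2 ** 1.5


def route(scores, threshold=1.0):
    """Pick a model tier from the retrieval scores alone (no training).

    Assumption: a few dominant contexts (high positive skew) signal an
    easy query; a flat distribution signals a hard one. The threshold
    is a hypothetical value that would need tuning per dataset.
    """
    return "small" if skewness(scores) >= threshold else "large"


if __name__ == "__main__":
    peaked = [0.95, 0.10, 0.08, 0.07, 0.05, 0.05]  # one context dominates
    flat = [0.50, 0.48, 0.52, 0.49, 0.51]          # no clear winner
    print(route(peaked), route(flat))
```

Because the decision uses only scores that the retriever already produces, a rule of this kind adds no training cost and can sit in front of any pair of LLMs.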
Submitted 28 May, 2025;
originally announced May 2025.
-
Conversational Exploration of Literature Landscape with LitChat
Authors:
Mingyu Huang,
Shasha Zhou,
Yuxuan Chen,
Ke Li
Abstract:
We are living in an era of "big literature", where the volume of digital scientific publications is growing exponentially. While offering new opportunities, this also poses challenges for understanding literature landscapes, as traditional manual reviewing is no longer feasible. Recent large language models (LLMs) have shown strong capabilities for literature comprehension, yet they are incapable of offering the "comprehensive, objective, open and transparent" views desired by systematic reviews due to their limited context windows and trust issues like hallucinations. Here we present LitChat, an end-to-end, interactive and conversational literature agent that augments LLM agents with data-driven discovery tools to facilitate literature exploration. LitChat automatically interprets user queries, retrieves relevant sources, constructs knowledge graphs, and employs diverse data-mining techniques to generate evidence-based insights addressing user needs. We illustrate the effectiveness of LitChat via a case study on AI4Health, highlighting its capacity to quickly guide users through large-scale literature landscapes with data-based evidence, which is otherwise infeasible with traditional means.
Submitted 25 May, 2025;
originally announced May 2025.