Search | arXiv e-print repository

s3: You Don't Need That Much Data to Train a Search Agent via RL

Authors: Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han

Abstract: Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting the real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
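As a rough illustration of the Gain Beyond RAG reward described in the abstract above (the improvement in generation accuracy over naive RAG), the following is a minimal sketch, not the authors' implementation. The abstract does not say how generation accuracy is scored, so the `exact_match` scorer and all function and parameter names here are assumptions made for illustration.

```python
from typing import Callable


def exact_match(prediction: str, gold: str) -> float:
    """Hypothetical accuracy scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def gain_beyond_rag(
    answer_with_searcher: str,
    answer_with_naive_rag: str,
    gold: str,
    score: Callable[[str, str], float] = exact_match,
) -> float:
    """Reward = accuracy with the trained searcher's context
    minus accuracy with naive RAG context, for the same frozen generator."""
    return score(answer_with_searcher, gold) - score(answer_with_naive_rag, gold)


# The searcher is rewarded only when it helps beyond the naive baseline.
print(gain_beyond_rag("Paris", "London", "paris"))  # searcher helps
print(gain_beyond_rag("Paris", "Paris", "paris"))   # no gain over naive RAG
```

Under this reading, a question that naive RAG already answers correctly yields no reward, so the training signal concentrates on cases where better search changes the outcome.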

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14053 [pdf, ps, other]

doi 10.1145/3715722

On-Demand Scenario Generation for Testing Automated Driving Systems

Authors: Songyang Yan, Xiaodong Zhang, Kunkun Hao, Haojie Xin, Yonggang Luo, Jucheng Yang, Ming Fan, Chao Yang, Jun Sun, Zijiang Yang

Abstract: The safety and reliability of Automated Driving Systems (ADS) are paramount, necessitating rigorous testing methodologies to uncover potential failures before deployment. Traditional testing approaches often prioritize either natural scenario sampling or safety-critical scenario generation, resulting in overly simplistic or unrealistic hazardous tests. In practice, the demand for natural scenarios (e.g., when evaluating the ADS's reliability in real-world conditions), critical scenarios (e.g., when evaluating safety in critical situations), or somewhere in between (e.g., when testing the ADS in regions with less civilized drivers) varies depending on the testing objectives. To address this issue, we propose the On-demand Scenario Generation (OSG) Framework, which generates diverse scenarios with varying risk levels. Achieving the goal of OSG is challenging due to the complexity of quantifying the criticalness and naturalness stemming from intricate vehicle-environment interactions, as well as the need to maintain scenario diversity across various risk levels. OSG learns from real-world traffic datasets and employs a Risk Intensity Regulator to quantitatively control the risk level. It also leverages an improved heuristic search method to ensure scenario diversity. We evaluate OSG on the Carla simulators using various ADSs. We verify OSG's ability to generate scenarios with different risk levels and demonstrate its necessity by comparing accident types across risk levels. With the help of OSG, we are now able to systematically and objectively compare the performance of different ADSs based on different risk levels.

Submitted 25 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

Comments: 20 pages, 9 figures. Accepted by FSE 2025

arXiv:2505.14038 [pdf, ps, other]

ProMind-LLM: Proactive Mental Health Care via Causal Reasoning with Sensor Data

Authors: Xinzhe Zheng, Sijie Ji, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava

Abstract: Mental health risk is a critical global public health challenge, necessitating innovative and reliable assessment methods. With the development of large language models (LLMs), they stand out to be a promising tool for explainable mental health care applications. Nevertheless, existing approaches predominantly rely on subjective textual mental records, which can be distorted by inherent mental uncertainties, leading to inconsistent and unreliable predictions. To address these limitations, this paper introduces ProMind-LLM. We investigate an innovative approach integrating objective behavior data as complementary information alongside subjective mental records for robust mental health risk assessment. Specifically, ProMind-LLM incorporates a comprehensive pipeline that includes domain-specific pretraining to tailor the LLM for mental health contexts, a self-refine mechanism to optimize the processing of numerical behavioral data, and causal chain-of-thought reasoning to enhance the reliability and interpretability of its predictions. Evaluations of two real-world datasets, PMData and Globem, demonstrate the effectiveness of our proposed methods, achieving substantial improvements over general LLMs. We anticipate that ProMind-LLM will pave the way for more dependable, interpretable, and scalable mental health case solutions.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.14015 [pdf, ps, other]

AUTOLAW: Enhancing Legal Compliance in Large Language Models via Case Law Generation and Jury-Inspired Deliberation

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: The rapid advancement of domain-specific large language models (LLMs) in fields like law necessitates frameworks that account for nuanced regional legal distinctions, which are critical for ensuring compliance and trustworthiness. Existing legal evaluation benchmarks often lack adaptability and fail to address diverse local contexts, limiting their utility in dynamically evolving regulatory landscapes. To address these gaps, we propose AutoLaw, a novel violation detection framework that combines adversarial data generation with a jury-inspired deliberation process to enhance legal compliance of LLMs. Unlike static approaches, AutoLaw dynamically synthesizes case law to reflect local regulations and employs a pool of LLM-based "jurors" to simulate judicial decision-making. Jurors are ranked and selected based on synthesized legal expertise, enabling a deliberation process that minimizes bias and improves detection accuracy. Evaluations across three benchmarks: Law-SG, Case-SG (legality), and Unfair-TOS (policy), demonstrate AutoLaw's effectiveness: adversarial data generation improves LLM discrimination, while the jury-based voting strategy significantly boosts violation detection rates. Our results highlight the framework's ability to adaptively probe legal misalignments and deliver reliable, context-aware judgments, offering a scalable solution for evaluating and enhancing LLMs in legally sensitive applications.

Submitted 19 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.13904 [pdf, ps, other]

Learning to Insert for Constructive Neural Vehicle Routing Solver

Authors: Fu Luo, Xi Lin, Mengyuan Zhong, Fei Liu, Zhenkun Wang, Jianyong Sun, Qingfu Zhang

Abstract: Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore the idea of insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance the flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm's flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.

Submitted 23 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.13413 [pdf, ps, other]

Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling

Authors: Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang, Xiang Gu, Peijie Zhou, Jian Sun

Abstract: Learning the underlying dynamics of single cells from snapshot data has gained increasing attention in scientific and machine learning research. The destructive measurement technique and cell proliferation/death result in unpaired and unbalanced data between snapshots, making the learning of the underlying dynamics challenging. In this paper, we propose joint Velocity-Growth Flow Matching (VGFM), a novel paradigm that jointly learns state transition and mass growth of single-cell populations via flow matching. VGFM builds an ideal single-cell dynamics containing velocity of state and growth of mass, driven by a presented two-period dynamic understanding of the static semi-relaxed optimal transport, a mathematical tool that seeks the coupling between unpaired and unbalanced data. To enable practical usage, we approximate the ideal dynamics using neural networks, forming our joint velocity and growth matching framework. A distribution fitting loss is also employed in VGFM to further improve the fitting performance for snapshot data. Extensive experimental results on both synthetic and real datasets demonstrate that VGFM can capture the underlying biological dynamics accounting for mass and state variations over time, outperforming existing approaches for single-cell dynamics modeling.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.13222 [pdf, ps, other]

Partial Wave Analysis of $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$ and Cross Section Measurement of $e^{+}e^{-} \rightarrow π^{\pm}Z_{c}(3900)^{\mp}$ from 4.1271 to 4.3583 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: Based on 12.0 $\mathrm{fb^{-1}}$ of $e^{+}e^{-}$ collision data samples collected by the BESIII detector at center-of-mass energies from 4.1271 to 4.3583 GeV, a partial wave analysis is performed for the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. The cross sections for the sub processes ${e^{+}e^{-}\rightarrowπ^{+}Z_{c}(3900)^{-}+c.c.\rightarrowπ^{+}π^{-}J/ψ}$, $f_{0}(980)(\rightarrowπ^{+}π^{-})J/ψ$, and $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ are measured for the first time. The mass and width of the $Z_{c}(3900)^{\pm}$ are determined to be $3884.6\pm0.7\pm3.3$ MeV/$c^{2}$ and $37.2\pm1.3\pm6.6$ MeV, respectively. The first errors are statistical and the second systematic. The final state $(π^{+}π^{-})_{\rm{S\mbox{-}wave}} J/ψ$ dominates the process $e^{+}e^{-} \rightarrow π^{+}π^{-}J/ψ$. By analyzing the cross sections of $π^{\pm}Z_{c}(3900)^{\mp}$ and $f_{0}(980)J/ψ$, $Y(4220)$ has been observed. Its mass and width are determined to be $4225.8\pm4.2\pm3.1$ MeV/$c^{2}$ and $55.3\pm9.5\pm11.1$ MeV, respectively.

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12844 [pdf, ps, other]

AGI-Elo: How Far Are We From Mastering A Task?

Authors: Shuo Sun, Yimin Zhao, Christina Dao Wen Lee, Jiawei Sun, Chengran Yuan, Zefan Huang, Dongen Li, Justin KW Yeoh, Alok Prakash, Thomas W. Malone, Marcelo H. Ang Jr

Abstract: As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating distributions offer novel perspectives and interpretable insights into task difficulty, model progression, and the outstanding challenges that remain on the path to achieving full AGI task mastery.

Submitted 24 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

arXiv:2505.12234 [pdf, other]

Observation of $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (678 additional authors not shown)

Abstract: Using $(2712.4\pm14.3)\times10^6$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII storage ring, the decays $χ_{cJ}(J=0,1,2)\rightarrow p\bar{p}ηη$ are observed for the first time through the radiative transition $ψ(3686)\toγχ_{cJ}$. The statistical significances for $χ_{cJ}$ signals are all larger than 5$σ$. The branching fractions of $χ_{c0,1,2}\to p\bar{p} ηη$ are determined to be $({5.75 \pm 0.59 \pm 0.42}) \times 10^{-5}$, $({1.40 \pm 0.33 \pm 0.17}) \times 10^{-5}$, and $({2.64 \pm 0.40 \pm 0.27}) \times 10^{-5}$, respectively, where the first uncertainties are statistical and the second systematic. No evident resonant structures are found in the $p\bar{p}$ and $pη/\bar{p}η$ systems.

Submitted 18 May, 2025; originally announced May 2025.

Comments: 17 pages, 16 figures

arXiv:2505.12086 [pdf, ps, other]

Observation of an Altered $a_{0}(980)$ Line-shape in $D^{+} \rightarrow π^{+}ηη$ due to the Triangle Loop Rescattering Effect

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (705 additional authors not shown)

Abstract: Using 20.3~${\rm fb}^{-1}$ of $e^{+}e^{-}$ collision data taken with the BESIII detector at the center-of-mass energy 3.773~GeV, we report the first amplitude analysis of the hadronic decay $D^{+} \rightarrow π^{+}ηη$. The intermediate process $D^{+} \to a_{0}(980)^{+}η, a_{0}(980)^{+} \to π^{+}η$ is observed and is found to be the only component and its branching fraction is measured to be $(3.67\pm0.12_{\mathrm{stat.}}\pm 0.06_{\mathrm{syst.}})\times 10^{-3}$. Unlike the $a_{0}(980)$ line-shape observed in the decays of charmed mesons to $a_{0}(980)π$ and in the decay $D^{0} \to a_{0}(980)^{-}e^{+}ν_{e}$, where the low-mass side of the $a_0(980)$ is wider than the high-mass side, the $a_{0}(980)$ line-shape in $D^{+} \to a_{0}(980)^{+}η$ is found to be significantly altered, with the high-mass side being wider than the low-mass side. We establish that the $a_0(980)$ line-shape arises from the triangle loop rescattering of $D^+ \to \bar{K}_0^*(1430)^0K^+ \to a_0(980)^+ η$ and $D^+ \to K_0^*(1430)^+\bar{K}^0 \to a_0(980)^+ η$ with a significance of 5.8$σ$. This is the first experimental confirmation of the triangle loop rescattering effect.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2505.11955 [pdf, other]

First measurement of $b$-jet mass with and without grooming

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An , et al. (1133 additional authors not shown)

Abstract: The LHCb collaboration presents a novel suite of heavy-flavour jet substructure measurements at forward rapidity in proton-proton collisions at a centre-of-mass energy of $\sqrt{s} = 13$ TeV. The jet mass is a perturbatively calculable probe of the virtuality of hard-scattered quarks and gluons, connecting small-distance quantum chromodynamics (QCD) with long-distance experimental measurement. It becomes dominated by nonperturbative corrections at small values, presenting an excellent test of QCD across a broad range of energies. Measuring heavy-flavour jet mass with a theoretically unambiguous flavour definition for the first time probes the gluon splitting mechanism for heavy-flavour production and pushes tests of perturbative QCD to unprecedented theoretical precision. Utilising the soft drop jet-grooming technique to access the perturbative jet core further enhances constraints on first-principles theory. Measurements of the jet mass for jets containing fully reconstructed $B^\pm$ hadrons are reported with and without grooming. These results offer unparalleled tests of quark flavour and mass dependence in QCD and provide a baseline for future studies of heavy-flavour jet quenching in heavy-ion collisions.

Submitted 17 May, 2025; originally announced May 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/4555/ (LHCb public pages)

Report number: LHCb-PAPER-2025-009, CERN-EP-2025-097

arXiv:2505.10908 [pdf, other]

The impact of spiral arms on the star formation life cycle

Authors: Andrea Romanelli, Mélanie Chevance, J. M. Diederik Kruijssen, Lise Ramambason, Miguel Querejeta, Mederic Boquien, Daniel A. Dale, Jakob den Brok, Simon C. O. Glover, Kathryn Grasha, Annie Hughes, Jaeyeon Kim, Steven Longmore, Sharon E. Meidt, José Eduardo Mendez-Delgado, Lukas Neumann, Jérôme Pety, Eva Schinnerer, Rowan Smith, Jiayi Sun, Thomas G. Williams

Abstract: The matter cycle between gas clouds and stars in galaxies plays a crucial role in regulating galaxy evolution through feedback mechanisms. In turn, the local and global galactic environments shape the interstellar medium and provide the initial conditions for star formation, potentially affecting the properties of this small-scale matter cycle. In particular, spiral arms have been proposed to play a pivotal role in the star formation life cycle, by enhancing the gas density and triggering star formation. However, their exact role is still debated. In this paper, we investigate the role of spiral arms in the giant molecular cloud evolutionary life cycle and on the star formation process in a sample of 22 nearby spiral galaxies from the PHANGS survey. We measure the cloud lifetime, the feedback timescale, the typical distance between independent regions and the star formation efficiency in spiral arms and inter-arm regions separately. We find that the distributions of the cloud lifetime as well as the feedback timescale are similar in both environments. This result suggests that spiral arms are unlikely to play a dominant role in triggering star formation. By contrast, the star formation efficiency appears to be slightly higher in inter-arm regions compared to spiral arms.

Submitted 16 May, 2025; originally announced May 2025.

Comments: 12 pages, 5 figures

arXiv:2505.10895 [pdf, ps, other]

Digital quantum simulation of squeezed states via enhanced bosonic encoding in a superconducting quantum processor

Authors: Hengyue Li, Yusheng Yang, Zhe-Hui Wang, Shuxin Xie, Zilong Zha, Hantao Sun, Jie Chen, Jian Sun, Shenggang Ying

Abstract: We present a fully digital approach for simulating single-mode squeezed states on a superconducting quantum processor using an enhanced bosonic encoding strategy. By mapping up to 2^{n} photonic Fock states onto n qubits, our framework leverages Gray-code-based encodings to reduce gate overhead compared to conventional one-hot or binary mappings. We further optimize resource usage by restricting the simulation on Fock states with even number of photons only, effectively doubling the range of photon numbers that can be represented for a given number of qubits. To overcome noise and finite coherence in current hardware, we employ a variational quantum simulation protocol, which adapts shallow, parameterized circuits through iterative optimization. Implemented on the Zuchongzhi-2 superconducting platform, our method demonstrates squeezed-state dynamics across a parameter sweep from vacuum state preparation (r=0) to squeezing levels exceeding the Fock space truncation limit (r>1.63). Experimental results, corroborated by quantum state tomography and Wigner-function analysis, confirm high-fidelity state preparation and demonstrate the potential of Gray-code-inspired techniques for realizing continuous-variable physics on near-term, qubit-based quantum processors.
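To make the encoding idea in the abstract above concrete, here is a minimal sketch of a Gray-code mapping from even photon numbers to n-qubit bitstrings. It uses the standard binary-reflected Gray code; the abstract does not state which Gray-code variant or bit ordering the authors use, so the mapping of photon number 2k to the Gray code of k, and the function names, are assumptions for illustration only.

```python
def gray_code(k: int) -> int:
    """Standard binary-reflected Gray code of a non-negative integer."""
    return k ^ (k >> 1)


def encode_even_fock(photons: int, n_qubits: int) -> str:
    """Map an even photon number 2k to the n-qubit Gray-code bitstring of k.

    Hypothetical layout: only even Fock states are stored, so n qubits
    cover photon numbers 0, 2, ..., 2 * (2**n_qubits - 1).
    """
    if photons < 0 or photons % 2 != 0:
        raise ValueError("only non-negative even photon numbers are encoded")
    k = photons // 2
    if k >= 2 ** n_qubits:
        raise ValueError("photon number exceeds the truncated Fock space")
    return format(gray_code(k), f"0{n_qubits}b")


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length bitstrings."""
    return sum(x != y for x, y in zip(a, b))


# Neighbouring even Fock states differ in exactly one qubit.
codes = [encode_even_fock(2 * k, 3) for k in range(8)]
print(codes)
print(all(hamming(codes[i], codes[i + 1]) == 1 for i in range(7)))
```

Because consecutive codewords differ by a single bit, an operator connecting neighbouring encoded states touches one qubit flip at a time, which is the usual motivation for Gray codes over plain binary in this setting.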

Submitted 11 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

arXiv:2505.10780 [pdf, ps, other]

SECRET: Semi-supervised Clinical Trial Document Similarity Search

Authors: Trisha Das, Afrah Shafquat, Beigi Mandis, Jacob Aptekar, Jimeng Sun

Abstract: Clinical trials are vital for evaluation of safety and efficacy of new treatments. However, clinical trials are resource-intensive, time-consuming and expensive to conduct, where errors in trial design, reduced efficacy, and safety events can result in significant delays, financial losses, and damage to reputation. These risks underline the importance of informed and strategic decisions in trial design to mitigate these risks and improve the chances of a successful trial. Identifying similar historical trials is critical as these trials can provide an important reference for potential pitfalls and challenges including serious adverse events, dosage inaccuracies, recruitment difficulties, patient adherence issues, etc. Addressing these challenges in trial design can lead to development of more effective study protocols with optimized patient safety and trial efficiency. In this paper, we present a novel method to identify similar historical trials by summarizing clinical trial protocols and searching for similar trials based on a query trial's protocol. Our approach significantly outperforms all baselines, achieving up to a 78% improvement in recall@1 and a 53% improvement in precision@1 over the best baseline. We also show that our method outperforms all other baselines in partial trial similarity search and zero-shot patient-trial matching, highlighting its superior utility in these tasks.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.10475 [pdf, ps, other]

Parallel Scaling Law for Language Models

Authors: Mouxiang Chen, Binyuan Hui, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Jianling Sun, Junyang Lin, Zhongxin Liu

Abstract: It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference time. We apply $P$ diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the $P$ outputs. This method, namely parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with $P$ parallel streams is similar to scaling the parameters by $O(\log P)$ while showing superior inference efficiency. For example, ParScale can use up to 22$\times$ less memory increase and 6$\times$ less latency increase compared to parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small amount of tokens, further reducing the training budget. The new scaling law we discovered potentially facilitates the deployment of more powerful models in low-resource scenarios, and provides an alternative perspective for the role of computation in machine learning.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.10419 [pdf, other]

Analog Self-Interference Cancellation in Full-Duplex Radios: A Fundamental Limit Perspective

Authors: Limin Liao, Jun Sun, Junzhi Wang, Yingzhuang Liu

Abstract: Analog self-interference cancellation (A-SIC) plays a crucial role in the implementation of in-band full-duplex (IBFD) radios, due to the fact that the inherent transmit (Tx) noise can only be addressed in the analog domain. It is thus natural to ask what the performance limit of A-SIC is in practical systems, which is still quite underexplored so far. In this paper, we aim to close this gap by characterizing the fundamental performance of A-SIC which employs the common multi-tap delay (MTD) architecture, by accounting for the following practical issues: 1) Nonstationarity of the Tx signal; 2) Nonlinear distortions on the Tx signal; 3) Multipath channel corresponding to the self-interference (SI); 4) Maximum amplitude constraint on the MTD tap weights. Our findings include: 1) The average approximation error for the cyclostationary Tx signals is equal to that for the stationary white Gaussian process, thus greatly simplifying the performance analysis and the optimization procedure. 2) The approximation error for the multipath SI channel can be decomposed as the sum of the approximation error for the single-path scenario. By leveraging these structural results, the optimization framework and algorithms which characterize the fundamental limit of A-SIC, by taking into account all the aforementioned practical factors, are provided.

Submitted 15 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.10340 [pdf, ps, other]

Non-Markovian dynamics with a driven three-level giant atom in a semi-infinite photonic waveguide

Authors: S. J. Sun, Z. Y. Li, C. Cui, Shuang Xu, H. Z. Shen

Abstract: The non-Markovian effects of open quantum systems subjected to external environments are deemed to be valuable resources in quantum optics and quantum information processing. In this work, we investigate the non-Markovian dynamics of a three-level giant atom coupling with a semi-infinite photonic waveguide through multiple coupling points and driven by a classical driving field. We derive the analytical expressions for the probability amplitudes of the driven three-level giant atom and obtain two independent conditions. We find two different types of bound states (including the static bound states and the periodic equal-amplitude oscillating bound states) and discuss the physical origins of the bound states formation. Moreover, we discuss the case of the driven three-level giant atom interacting with the infinite photonic waveguide, where there is only one purely imaginary solution (i.e., only one bound state condition exists) for its complex frequency (coming from the absence of mirror at one end of the waveguide) compared to that of a driven three-level giant atom coupling with a semi-infinite photonic waveguide. With this, we also find two different types of bound states, including the static bound state and the periodic equal-amplitude oscillating bound states. Finally, the above results are generalized to a more general model involving a semi-infinite photonic waveguide coupling with an arbitrary number of noninteracting three-level giant atoms driven by the driving fields. The proposed protocol could provide a pathway to precisely elucidate the non-Markovian dynamics of driven, multi-level giant atoms coupled to semi-infinite or infinite photonic waveguides.

Submitted 15 May, 2025; originally announced May 2025.

Comments: 23 pages, 15 figures

arXiv:2505.09424 [pdf, ps, other]

Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion

Authors: Han Sun, Yizhao Wang, Zhenning Zhou, Shuai Wang, Haibo Yang, Jingyuan Sun, Qixin Cao

Abstract: Recent studies have proved that imitation learning shows strong potential in the field of robotic manipulation. However, existing methods still struggle with precision manipulation task and rely on inefficient image/point cloud observations. In this paper, we explore to introduce SE(3) object pose into imitation learning and propose the pose-guided efficient imitation learning methods for robotic precise insertion task. First, we propose a precise insertion diffusion policy which utilizes the relative SE(3) pose as the observation-action pair. The policy models the source object SE(3) pose trajectory relative to the target object. Second, we explore to introduce the RGBD data to the pose-guided diffusion policy. Specifically, we design a goal-conditioned RGBD encoder to capture the discrepancy between the current state and the goal state. In addition, a pose-guided residual gated fusion method is proposed, which takes pose features as the backbone, and the RGBD features selectively compensate for pose feature deficiencies through an adaptive gating mechanism. Our methods are evaluated on 6 robotic precise insertion tasks, demonstrating competitive performance with only 7-10 demonstrations. Experiments demonstrate that the proposed methods can successfully complete precision insertion tasks with a clearance of about 0.01 mm. Experimental results highlight its superior efficiency and generalization capability compared to existing baselines. Code will be available at https://github.com/sunhan1997/PoseInsert.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.09273 [pdf, ps, other]

Rapidity and multiplicity dependence of charged-particle flow in $p$Pb collisions at $\sqrt{s_{NN}} = 8.16$ TeV

Authors: R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An, L. Anderlini , et al. (1126 additional authors not shown)

Abstract: The elliptic and triangular flow of charged particles are measured using two-particle angular correlations in $p$Pb collisions in the pseudorapidity range $2.0 < |η| < 4.8$. The data sample was collected by the LHCb experiment in 2016 at a centre-of-mass energy per nucleon pair of $\sqrt{s_{NN}} = 8.16$ TeV, containing in total approximately 1.5 billion collision events. Non-flow contributions are obtained in low-multiplicity collisions and subtracted to extract the flow harmonics. The results are presented as a function of event multiplicity and hadron transverse momentum. Comparisons with a full (3+1)D dynamic model indicate that it overestimates the measured elliptic flow. A comparison between the forward and backward regions reveals no significant differences in flow parameters, suggesting that final-state effects may dominate over initial-state effects in the origin of flow in small systems.

Submitted 14 May, 2025; originally announced May 2025.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2025-003.html (LHCb public pages)

Report number: CERN-EP-2025-090, LHCb-PAPER-2025-003

arXiv:2505.09255 [pdf, ps, other]

Data-driven Internal Model Control for Output Regulation

Authors: Wenjie Liu, Yifei Li, Jian Sun, Gang Wang, Keyou You, Lihua Xie, Jie Chen

Abstract: Output regulation is a fundamental problem in control theory, extensively studied since the 1970s. Traditionally, research has primarily addressed scenarios where the system model is explicitly known, leaving the problem in the absence of a system model less explored. Leveraging the recent advancements in Willems et al.'s fundamental lemma, data-driven control has emerged as a powerful tool for stabilizing unknown systems. This paper tackles the output regulation problem for unknown single and multi-agent systems (MASs) using noisy data. Previous approaches have attempted to solve data-based output regulation equations (OREs), which are inadequate for achieving zero tracking error with noisy data. To circumvent the need for solving data-based OREs, we propose an internal model-based data-driven controller that reformulates the output regulation problem into a stabilization problem. This method is first applied to linear time-invariant (LTI) systems, demonstrating exact solution capabilities, i.e., zero tracking error, through solving a straightforward data-based linear matrix inequality (LMI). Furthermore, we extend our approach to solve the $k$th-order output regulation problem for nonlinear systems. Extensions to both linear and nonlinear MASs are discussed. Finally, numerical tests validate the effectiveness and correctness of the proposed controllers.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.09168 [pdf, other]

DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection

Authors: Jianlin Sun, Xiaolin Fang, Juwei Guan, Dongdong Gui, Teqi Wang, Tongxin Zhu

Abstract: The core challenge in Camouflage Object Detection (COD) lies in the indistinguishable similarity between targets and backgrounds in terms of color, texture, and shape. This causes existing methods to either lose edge details (such as hair-like fine structures) due to over-reliance on global semantic information or be disturbed by similar backgrounds (such as vegetation patterns) when relying solely on local features. We propose DRRNet, a four-stage architecture characterized by a "context-detail-fusion-refinement" pipeline to address these issues. Specifically, we introduce an Omni-Context Feature Extraction Module to capture global camouflage patterns and a Local Detail Extraction Module to supplement microstructural information for the full-scene context module. We then design a module for forming dual representations of scene understanding and structural awareness, which fuses panoramic features and local features across various scales. In the decoder, we also introduce a reverse refinement module that leverages spatial edge priors and frequency-domain noise suppression to perform a two-stage inverse refinement of the output. By applying two successive rounds of inverse refinement, the model effectively suppresses background interference and enhances the continuity of object boundaries. Experimental results demonstrate that DRRNet significantly outperforms state-of-the-art methods on benchmark datasets. Our code is available at https://github.com/jerrySunning/DRRNet.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.08744 [pdf, other]

DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models

Authors: Xiaoyang Chen, Xinan Dai, Yu Du, Qian Feng, Naixu Guo, Tingshuo Gu, Yuting Gao, Yingyi Gao, Xudong Han, Xiang Jiang, Yilin Jin, Hongyi Lin, Shisheng Lin, Xiangnan Li, Yuante Li, Yixing Li, Zhentao Lai, Zilu Ma, Yingrong Peng, Jiacheng Qian, Hao-Yu Sun, Jianbo Sun, Zirui Wang, Siwei Wu, Zian Wang , et al. (6 additional authors not shown)

Abstract: To advance the mathematical proficiency of large language models (LLMs), the DeepMath team has launched an open-source initiative aimed at developing an open mathematical LLM and systematically evaluating its mathematical creativity. This paper represents the initial contribution of this initiative. While recent developments in mathematical LLMs have predominantly emphasized reasoning skills, as evidenced by benchmarks on elementary to undergraduate-level mathematical tasks, the creative capabilities of these models have received comparatively little attention, and evaluation datasets remain scarce. To address this gap, we propose an evaluation criteria for mathematical creativity and introduce DeepMath-Creative, a novel, high-quality benchmark comprising constructive problems across algebra, geometry, analysis, and other domains. We conduct a systematic evaluation of mainstream LLMs' creative problem-solving abilities using this dataset. Experimental results show that even under lenient scoring criteria -- emphasizing core solution components and disregarding minor inaccuracies, such as small logical gaps, incomplete justifications, or redundant explanations -- the best-performing model, O3 Mini, achieves merely 70% accuracy, primarily on basic undergraduate-level constructive tasks. Performance declines sharply on more complex problems, with models failing to provide substantive strategies for open problems. These findings suggest that, although current LLMs display a degree of constructive proficiency on familiar and lower-difficulty problems, such performance is likely attributable to the recombination of memorized patterns rather than authentic creative insight or novel synthesis.

Submitted 13 May, 2025; originally announced May 2025.

Comments: 14 pages, 4 figures

arXiv:2505.08164 [pdf]

Sliding and superlubric moiré twisting ferroelectric transition in HfO2

Authors: Jie Sun, Xin Li, Tianlin Li, Yu Yun, Guodong Ren, Yiheng Shen, Tengfei Cao, Li-Min Liu

Abstract: Despite progress in HfO2 thin-film ferroelectrics, issues such as fatigue and high coercive fields persist, and the dynamics of emerging twisted ferroelectricity remain largely unexplored. Here, we explore how interlayer sliding and twisting in bilayer HfO2 enables low barrier switching pathways. Among 144 sliding configurations, two exhibit strong in-plane polarization (2360 pC/m) with a low switching barrier of 3.19 meV/atom. Twisting generates polar textures associated with moiré patterns and quasi-flat bands, which drive ferroelectricity via a soft zone-center optical mode, as revealed by machine-learning-assisted first-principles calculations. At twist angles of 21.79° and 27.80°, switching barriers drop to 0.58 and 0.06 meV/atom, indicating superlubric-like ferroelectric transitions. Notably, the 46.83° twisted bilayer shows an almost barrier-free polar evolution (0.009 meV/atom), attributed to sharply enhanced zone-center phonon linewidths. Our findings establish a moiré-engineered, ultra-low-energy switching route for 2D ferroelectric applications.

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07834 [pdf, other]

ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet

Authors: Yuekang Li, Wei Song, Bangshuo Zhu, Dong Gong, Yi Liu, Gelei Deng, Chunyang Chen, Lei Ma, Jun Sun, Toby Walsh, Jingling Xue

Abstract: We introduce ai.txt, a novel domain-specific language (DSL) designed to explicitly regulate interactions between AI models, agents, and web content, addressing critical limitations of the widely adopted robots.txt standard. As AI increasingly engages with online materials for tasks such as training, summarization, and content modification, existing regulatory methods lack the necessary granularity and semantic expressiveness to ensure ethical and legal compliance. ai.txt extends traditional URL-based access controls by enabling precise element-level regulations and incorporating natural language instructions interpretable by AI systems. To facilitate practical deployment, we provide an integrated development environment with code autocompletion and automatic XML generation. Furthermore, we propose two compliance mechanisms: XML-based programmatic enforcement and natural language prompt integration, and demonstrate their effectiveness through preliminary experiments and case studies. Our approach aims to aid the governance of AI-Internet interactions, promoting responsible AI use in digital ecosystems.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2505.07233 [pdf, ps, other]

DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation

Authors: Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han

Abstract: Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often under-explored component of these systems is the reranker. Since irrelevant documents in RAG systems can mislead the generator, the reranker plays a vital role in refining retrieved documents to enhance generation quality and explainability. However, it is challenging to determine the appropriate number of documents ($k$) that the reranker should select: too few may result in missing critical information, while too many introduce noise and inefficiencies. Although recent studies have explored LLM-based rerankers, they primarily leverage internal model knowledge and overlook the rich supervisory signals that LLMs can provide, such as using response quality as feedback for optimizing reranking decisions. In this paper, we propose DynamicRAG, a novel RAG framework where the reranker dynamically adjusts both the order and number of retrieved documents based on the query. We model the reranker as an agent optimized through reinforcement learning (RL), using rewards derived from LLM output quality. Across seven knowledge-intensive datasets, DynamicRAG demonstrates superior performance, achieving state-of-the-art results among models of same parameter sizes. The model, data and code are available at https://github.com/GasolSun36/DynamicRAG.

Submitted 15 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

Comments: 24 pages, 7 figures, 15 tables

arXiv:2505.07062 [pdf, ps, other]

Seed1.5-VL Technical Report

Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng , et al. (172 additional authors not shown)

Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving the state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428)

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.06875 [pdf, ps, other]

Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning

Authors: Chengkai Xu, Jiaqi Liu, Yicheng Guo, Yuhang Zhang, Peng Hang, Jian Sun

Abstract: Autonomous driving has made significant strides through data-driven techniques, achieving robust performance in standardized tasks. However, existing methods frequently overlook user-specific preferences, offering limited scope for interaction and adaptation with users. To address these challenges, we propose a "fast-slow" decision-making framework that integrates a Large Language Model (LLM) for high-level instruction parsing with a Reinforcement Learning (RL) agent for low-level real-time decision. In this dual system, the LLM operates as the "slow" module, translating user directives into structured guidance, while the RL agent functions as the "fast" module, making time-critical maneuvers under stringent latency constraints. By decoupling high-level decision making from rapid control, our framework enables personalized user-centric operation while maintaining robust safety margins. Experimental evaluations across various driving scenarios demonstrate the effectiveness of our method. Compared to baseline algorithms, the proposed architecture not only reduces collision rates but also aligns driving behaviors more closely with user preferences, thereby achieving a human-centric mode. By integrating user guidance at the decision level and refining it with real-time control, our framework bridges the gap between individual passenger needs and the rigor required for safe, reliable driving in complex traffic environments.

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.06254 [pdf, ps, other]

OpenSky Report 2025: Improving Crowdsourced Flight Trajectories with ADS-C Data

Authors: Junzi Sun, Xavier Olive, Martin Strohmeier, Vincent Lenders

Abstract: The OpenSky Network has been collecting and providing crowdsourced air traffic surveillance data since 2013. The network has primarily focused on Automatic Dependent Surveillance--Broadcast (ADS-B) data, which provides high-frequency position updates over terrestrial areas. However, the ADS-B signals are limited over oceans and remote regions, where ground-based receivers are scarce. To address these coverage gaps, the OpenSky Network has begun incorporating data from the Automatic Dependent Surveillance--Contract (ADS-C) system, which uses satellite communication to track aircraft positions over oceanic regions and remote areas. In this paper, we analyze a dataset of over 720,000 ADS-C messages collected in 2024 from around 2,600 unique aircraft via the Alphasat satellite, covering Europe, Africa, and parts of the Atlantic Ocean. We present our approach to combining ADS-B and ADS-C data to construct detailed long-haul flight paths, particularly for transatlantic and African routes. Our findings demonstrate that this integration significantly improves trajectory reconstruction accuracy, allowing for better fuel consumption and emissions estimates. We illustrate how combined data captures flight patterns across previously underrepresented regions across Africa. Despite coverage limitations, this work marks an important advancement in providing open access to global flight trajectory data, enabling new research opportunities in air traffic management, environmental impact assessment, and aviation safety.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2505.05954 [pdf, ps, other]

Demonstration of Direct-amplification Enabled Harmonic Generation in an Ultraviolet Free-Electron Laser

Authors: Hao Sun, Jitao Sun, Li Zeng, Yifan Liang, Lingjun Tu, Huaiqian Yi, Qinming Li, Xiaofan Wang, Yong Yu, Jiayue Yang, Zhigang He, Yuhuan Tian, Likai Wang, Zequn Wang, Guorong Wu, Weiqing Zhang, Xueming Yang

Abstract: We report the experimental demonstration of direct-amplification enabled harmonic generation in an ultraviolet free-electron laser (FEL) driven by a low-intensity seed laser. By employing a versatile undulator configuration that enables seed amplification and harmonic generation within a unified setup, we achieved over 100-fold energy gain of the seed and observed exponential growth at the second harmonic. The results demonstrate that a sufficiently long modulator can not only amplify a weak seed but also induce strong energy modulation of the electron beam, enabling efficient harmonic bunching. This method markedly relaxes the power requirements on external seed lasers and presents a viable route toward high-repetition-rate, fully coherent FELs.

Submitted 9 May, 2025; originally announced May 2025.

arXiv:2505.05888 [pdf, ps, other]

Measurement of the phase between strong and electromagnetic amplitudes in the decay $J/ψ\toφη$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (647 additional authors not shown)

Abstract: The first direct measurement of the relative phase between the strong and electromagnetic amplitudes for a $J/ψ$ decaying into a vector-pseudoscalar final state is performed using 26 energy points of $e^+e^-$ annihilation data between 3.00 GeV and 3.12 GeV. The data sets were collected by the BESIII detector with a total integrated luminosity of 452 pb$^{-1}$. By investigating the interference pattern in the cross section lineshape of $e^+e^-\toφη$, the relative phase between the strong and electromagnetic amplitudes of $J/ψ$ decay is determined to be within $[133^\circ,228^\circ]$ at 68% confidence level. The result hints at interference between the strong and electromagnetic amplitudes of $J/ψ$ decay.

Submitted 9 May, 2025; originally announced May 2025.

arXiv:2505.05512 [pdf, other]

Occupancy World Model for Robots

Authors: Zhang Zhang, Qiang Zhang, Wei Cui, Shuai Shi, Yijie Guo, Gang Han, Wen Zhao, Jingkai Sun, Jiahang Cao, Jiaxu Wang, Hao Cheng, Xiaozhu Ju, Zhengping Che, Renjing Xu, Jian Tang

Abstract: Understanding and forecasting the scene evolutions deeply affect the exploration and decision of embodied agents. While traditional methods simulate scene evolutions through trajectory prediction of potential instances, current works use the occupancy world model as a generative framework for describing fine-grained overall scene dynamics. However, existing methods cluster on the outdoor structured road scenes, while ignoring the exploration of forecasting 3D occupancy scene evolutions for robots in indoor scenes. In this work, we explore a new framework for learning the scene evolutions of observed fine-grained occupancy and propose an occupancy world model based on the combined spatio-temporal receptive field and guided autoregressive transformer to forecast the scene evolutions, called RoboOccWorld. We propose the Conditional Causal State Attention (CCSA), which utilizes camera poses of next state as conditions to guide the autoregressive transformer to adapt and understand the indoor robotics scenarios. In order to effectively exploit the spatio-temporal cues from historical observations, Hybrid Spatio-Temporal Aggregation (HSTA) is proposed to obtain the combined spatio-temporal receptive field based on multi-scale spatio-temporal windows. In addition, we restructure the OccWorld-ScanNet benchmark based on local annotations to facilitate the evaluation of the indoor 3D occupancy scene evolution prediction task. Experimental results demonstrate that our RoboOccWorld outperforms state-of-the-art methods in indoor 3D occupancy scene evolution prediction task. The code will be released soon.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2505.05108 [pdf, ps, other]

Multi-agent Embodied AI: Advances and Future Directions

Authors: Zhaohan Feng, Ruiqi Xue, Lei Yuan, Yang Yu, Ning Ding, Meiqin Liu, Bingzhao Gao, Jian Sun, Xinhu Zheng, Gang Wang

Abstract: Embodied artificial intelligence (Embodied AI) plays a pivotal role in the application of advanced technologies in the intelligent era, where AI systems are integrated with physical bodies that enable them to perceive, reason, and interact with their environments. Through the use of sensors for input and actuators for action, these systems can learn and adapt based on real-world feedback, allowing them to perform tasks effectively in dynamic and unpredictable environments. As techniques such as deep learning (DL), reinforcement learning (RL), and large language models (LLMs) mature, embodied AI has become a leading field in both academia and industry, with applications spanning robotics, healthcare, transportation, and manufacturing. However, most research has focused on single-agent systems that often assume static, closed environments, whereas real-world embodied AI must navigate far more complex scenarios. In such settings, agents must not only interact with their surroundings but also collaborate with other agents, necessitating sophisticated mechanisms for adaptation, real-time learning, and collaborative problem-solving. Despite increasing interest in multi-agent systems, existing research remains narrow in scope, often relying on simplified models that fail to capture the full complexity of dynamic, open environments for multi-agent embodied AI. Moreover, no comprehensive survey has systematically reviewed the advancements in this area. As embodied AI rapidly evolves, it is crucial to deepen our understanding of multi-agent embodied AI to address the challenges presented by real-world applications. To fill this gap and foster further development in the field, this paper reviews the current state of research, analyzes key contributions, and identifies challenges and future directions, providing insights to guide innovation and progress in this field.

Submitted 21 June, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

arXiv:2505.04655 [pdf, ps, other]

Integration of Large Language Models and Traditional Deep Learning for Social Determinants of Health Prediction

Authors: Paul Landes, Jimeng Sun, Adam Cross

Abstract: Social Determinants of Health (SDoH) are economic, social and personal circumstances that affect or influence an individual's health status. SDoHs have been shown to be correlated with wellness outcomes and are therefore useful to physicians in diagnosing diseases and in decision-making. In this work, we automatically extract SDoHs from clinical text using traditional deep learning and Large Language Models (LLMs) to find the advantages and disadvantages of each on an existing publicly available dataset. Our models outperform a previous reference point on a multilabel SDoH classification by 10 points, and we present a method and model to drastically speed up classification (12X execution time) by eliminating expensive LLM processing. The method we present is a more nimble and efficient solution that leverages the power of the LLM for precision and traditional deep learning methods for efficiency. We also show highly performant results on a dataset supplemented with synthetic data and several traditional deep learning models that outperform LLMs. Our models and methods offer the next iteration of automatic prediction of SDoHs that impact at-risk patients.

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2505.04369 [pdf, other]

WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing

Authors: Jie Sun, Heng Liu, Yongzhen Wang, Xiao-Ping Zhang, Mingqiang Wei

Abstract: In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail enhancement. This coarse-to-fine strategy enables WDMamba to effectively capture features specific to each stage of the dehazing process, resulting in high-quality restored images. Specifically, in the low-frequency restoration stage, we integrate Mamba blocks to reconstruct global structures with linear complexity, efficiently removing overall haze and producing a coarse restored image. Thereafter, the detail enhancement stage reinstates fine-grained information that may have been overlooked during the previous phase, culminating in the final dehazed output. Furthermore, to enhance detail retention and achieve more natural dehazing, we introduce a self-guided contrastive regularization during network training. By utilizing the coarse restored output as a hard negative example, our model learns more discriminative representations, substantially boosting the overall dehazing performance. Extensive evaluations on public dehazing benchmarks demonstrate that our method surpasses state-of-the-art approaches both qualitatively and quantitatively. Code is available at https://github.com/SunJ000/WDMamba.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2505.03483 [pdf, ps, other]

Measurement of the branching fraction ratio $R_K$ at large dilepton invariant mass

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An , et al. (1134 additional authors not shown)

Abstract: A test of lepton universality between muons and electrons is performed using $B^+\to K^+\ell^+\ell^-$ decays (where $\ell$ = $e$, $μ$), in the dilepton invariant-mass-squared region above 14.3 GeV$^2/c^4$. The data used for the measurement consists of beauty meson decays produced in proton-proton collisions, corresponding to an integrated luminosity of 9 $\text{fb}^{-1}$, collected by the LHCb experiment between 2011 and 2018. The ratio of branching fractions for $B^+\to K^+μ^+μ^-$ and $B^+\to K^+e^+e^-$ decays is measured to be $R_K = 1.08^{+0.11}_{-0.09}\;(\text{stat})\;^{+0.04}_{-0.04}\;(\text{syst})$, which is consistent with the Standard Model prediction of unity. This constitutes the most precise test of lepton flavour universality using $B^+\to K^+\ell^+\ell^-$ decays with dilepton invariant-mass-squared above the $ψ(2S)$ mass, whilst being the first of its kind at a hadron collider.

Submitted 25 June, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3164/ (LHCb public pages)

Report number: LHCb-PAPER-2024-056, CERN-EP-2025-069

arXiv:2505.03180 [pdf, other]

Observation of resonant contribution to the $e^+e^-\to Ω^{-}\barΩ^{+}$ around 4.2~GeV and evidence of $ψ(3770)\to Ω^{-}\barΩ^{+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (625 additional authors not shown)

Abstract: Using $e^+e^-$ collision data corresponding to a total integrated luminosity of 22.7 fb$^{-1}$, collected at center-of-mass energies between 3.7 and 4.7 GeV with the BESIII detector, we present a measurement of energy-dependent cross sections and effective form factors for the process of $e^+e^-\to Ω^{-}\barΩ^+$. By conducting a fit to the cross sections of $e^+e^-\to Ω^{-}\barΩ^+$ considering the continuum and resonant contributions, a clear resonant structure in the spectrum around 4.2 GeV is observed for the first time with a statistical significance exceeding 10$σ$, and it can be well described with the line shape of the $Y(4230)$ and $Y(4320)$ observed in $e^+e^-\to π^{+}π^{-}J/ψ$. Evidence for the decay $ψ(3770) \to Ω^-\barΩ^{+}$ is observed with a statistical significance of 4.4$σ$ by analyzing the measured cross sections together with earlier BESIII results, and the branching fraction is measured for the first time to be $(4.0\pm1.0\pm0.6)$ $\times$ $10^{-5}$, where the first uncertainty is statistical and the second is systematic.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 9 pages, 3 figures

arXiv:2505.02471 [pdf, ps, other]

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

Authors: Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

Abstract: We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language. Specifically, this project provides an open-source implementation of the integrated MetaQueries and M2-omni framework, while introducing novel multi-scale learnable tokens and a multi-scale representation alignment strategy. By leveraging a fixed MLLM and a learnable diffusion model, Ming-Lite-Uni enables native multimodal AR models to perform both text-to-image generation and instruction-based image editing tasks, expanding their capabilities beyond pure visual understanding. Our experimental results demonstrate the strong performance of Ming-Lite-Uni and illustrate the impressive fluid nature of its interactive process. All code and model weights are open-sourced to foster further exploration within the community. Notably, this work aligns with concurrent multimodal AI milestones - such as ChatGPT-4o with native image generation, updated on March 25, 2025 - underscoring the broader significance of unified models like Ming-Lite-Uni on the path toward AGI. Ming-Lite-Uni is in alpha stage and will soon be further refined.

Submitted 12 June, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

Comments: https://github.com/inclusionAI/Ming/tree/Ming-Lite-Omni-Preview/Ming-unify

arXiv:2505.02191 [pdf, ps, other]

Matrices as graded BiHom-algebras and decompositions

Authors: Jiacheng Sun, Shuanhong Wang, Haoran Zhu

Abstract: We present matrices as graded BiHom-algebras and consider various characteristics of their decompositions. Specifically, we introduce a notion of connection in the support of the grading and use it to construct a family of canonical graded ideals. We show that, under suitable assumptions, such as $Σ$-multiplicativity, maximal length, and centre triviality, the matrix BiHom-algebra decomposes into a direct sum of graded simple ideals. We further extend our results to general graded BiHom-algebras over arbitrary base fields. As applications, we reinterpret classical gradings on matrix algebras such as those induced by Pauli matrices and the $\mathbb{Z}_n \times \mathbb{Z}_n $-grading in terms of our setting.

Submitted 4 May, 2025; originally announced May 2025.

Comments: 23 pp

arXiv:2505.02042 [pdf, ps, other]

Standing waves with prescribed mass for biharmonic NLS with positive dispersion and Sobolev critical exponent

Authors: Juntao Sun, Shuai Yao, He Zhang

Abstract: We investigate standing waves with prescribed mass for a class of biharmonic Schrödinger equations with positive Laplacian dispersion in the Sobolev critical regime. By establishing novel energy inequalities and developing a direct minimization approach, we prove the existence of two normalized solutions for the corresponding stationary problem. The first one is a ground state with negative level, and the second one is a higher-energy solution with positive level. It is worth noting that we do not work in the space of radial functions, and do not use Palais-Smale sequences so as to avoid applying the relatively complex mini-max approach based on a strong topological argument. Finally, we explore the relationship between the ground states and the least action solutions, some asymptotic properties and dynamical behavior of solutions, such as the orbital stability and the global existence.

Submitted 4 May, 2025; originally announced May 2025.

MSC Class: 35A01; 35J35; 35Q55

arXiv:2505.00972 [pdf, other]

Seeking to Collide: Online Safety-Critical Scenario Generation for Autonomous Driving with Retrieval Augmented Large Language Models

Authors: Yuewen Mei, Tong Nie, Jian Sun, Ye Tian

Abstract: Simulation-based testing is crucial for validating autonomous vehicles (AVs), yet existing scenario generation methods either overfit to common driving patterns or operate in an offline, non-interactive manner that fails to expose rare, safety-critical corner cases. In this paper, we introduce an online, retrieval-augmented large language model (LLM) framework for generating safety-critical driving scenarios. Our method first employs an LLM-based behavior analyzer to infer the most dangerous intent of the background vehicle from the observed state, then queries additional LLM agents to synthesize feasible adversarial trajectories. To mitigate catastrophic forgetting and accelerate adaptation, we augment the framework with a dynamic memorization and retrieval bank of intent-planner pairs, automatically expanding its behavioral library when novel intents arise. Evaluations using the Waymo Open Motion Dataset demonstrate that our model reduces the mean minimum time-to-collision from 1.62 to 1.08 s and incurs a 75% collision rate, substantially outperforming baselines.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2505.00655 [pdf]

Why the hyperbolic polaritons are hyperbolic?

Authors: Xiaoyu Xiong, Le Zhou, Yihang Fan, Weipeng Wang, Yongzheng Wen, Yang Shen, Zhengjun Zhang, Jingbo Sun, Ji Zhou

Abstract: Polaritons travelling along a hyperbolic medium's surface have recently sparked significant interest in nanophotonics for their unprecedented ability to manipulate light at the nanoscale in a planar way, promising potential nano-optical applications, especially in two-dimensional circuitry. Despite being named hyperbolic polaritons, their hyperbolic nature has not been thoroughly revealed since an analytical description of the iso-frequency contour is still elusive. In this work, we propose an analytical form for describing the iso-frequency contour of the hyperbolic polaritons, showcasing their strictly hyperbolic nature. Such an analytical form is obtained based on the focusing behavior of the hyperbolic polaritons and verified by both the published data from commonly used hyperbolic media systems of the hyperbolic polaritons and our own experimental characterizations on a hyperbolic metamaterial film. By presenting a concise and intuitive physical image, this work may provide a groundbreaking methodology for developing novel optical devices based on hyperbolic polaritons.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2505.00352 [pdf, other]

Hybrid-integrated dark-pulse microcombs towards visible light spectrum

Authors: Jinbao Long, Xiaoying Yan, Sanli Huang, Wei Sun, Hao Tan, Zeying Zhong, Zhenyuan Shang, Jiahao Sun, Baoqi Shi, Chen Shen, Yi-Han Luo, Junqiu Liu

Abstract: Leveraging hybrid integration, we demonstrate dark-pulse formation at 780-nm wavelength band in integrated Si$_3$N$_4$ microresonators driven by high-power AlGaAs-based chip-scale lasers. The device outputs coherent frequency combs with electronically detectable repetition rates down to 20 GHz, paving a route to efficient and compact atom-chip interfaces for spectroscopy, metrology and sensing.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2505.00290 [pdf, other]

Multi-Hierarchical Fine-Grained Feature Mapping Driven by Feature Contribution for Molecular Odor Prediction

Authors: Hong Xin Xie, Jian De Sun, Fan Fu Xue, Zi Fei Han, Shan Shan Feng, Qi Chen

Abstract: Molecular odor prediction is the process of using a molecule's structure to predict its smell. While accurate prediction remains challenging, AI models can suggest potential odors. Existing methods, however, often rely on basic descriptors or handcrafted fingerprints, which lack expressive power and hinder effective learning. Furthermore, these methods suffer from severe class imbalance, limiting the training effectiveness of AI models. To address these challenges, we propose a Feature Contribution-driven Hierarchical Multi-Feature Mapping Network (HMFNet). Specifically, we introduce a fine-grained, Local Multi-Hierarchy Feature Extraction module (LMFE) that performs deep feature extraction at the atomic level, capturing detailed features crucial for odor prediction. To enhance the extraction of discriminative atomic features, we integrate a Harmonic Modulated Feature Mapping (HMFM). This module dynamically learns feature importance and frequency modulation, improving the model's capability to capture relevant patterns. Additionally, a Global Multi-Hierarchy Feature Extraction module (GMFE) is designed to learn global features from the molecular graph topology, enabling the model to fully leverage global information and enhance its discriminative power for odor prediction. To further mitigate the issue of class imbalance, we propose a Chemically-Informed Loss (CIL). Experimental results demonstrate that our approach significantly improves performance across various deep learning models, highlighting its potential to advance molecular structure representation and accelerate the development of AI-driven technologies.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.21711 [pdf, other]

The Intermediate-Mass Black Hole Reverberation Mapping Project: First Detection of Mid-Infrared Lags in Prototypical IMBHs in NGC 4395 and POX 52

Authors: Jingbo Sun, Hengxiao Guo, Wenwen Zuo, Paulina Lira, Minfeng Gu, Philip G. Edwards, Shu Wang, Jamie Stevens, Tao An, Samuzal Barua, Zhen-yi Cai, Haicheng Feng, Alok C. Gupta, Luis C. Ho, Dragana Ilić, Andjelka B. Kovačević, ShaSha Li, Mar Mezcua, Luka Č. Popović, Paula Sánchez-Sáez, Mouyuan Sun, Rongfeng Shen, Vivian U, Oliver Vince, Junxian Wang , et al. (3 additional authors not shown)

Abstract: The search for robust evidence of intermediate-mass black holes (IMBHs) is crucial for understanding the black hole seeding process and the formation of supermassive black holes in the early Universe. NGC 4395 and POX 52 are two prototypical IMBH hosts, both exhibiting multi-line evidence of low-mass black hole activity. Here, we report the first detection of mid-infrared (MIR) lags in response to optical variability, with measurements of $3.0^{+2.4}_{-1.9}$ days for NGC 4395 and $35.2^{+14.2}_{-11.7}$ days for POX 52 at $3.4$ $μ$m, respectively, using archival optical data and observations from the Wide-field Infrared Survey Explorer (WISE). This detection provides the first reverberation evidence of low-mass black hole activity in POX 52. The time lags of these two low-mass, low-luminosity active galactic nuclei (AGNs) generally follow the extent of the $R_{\rm dust}-L_{\rm 5100}$ relation found in higher-mass AGNs. Based on an empirical relation between the broad-line region and dusty torus size, we constrain the black hole mass of POX 52 to log($M_{\rm BH}$/$M_\odot$) = 5.5 $\pm$ 0.37 (systemic and statistical errors), confirming its IMBH nature. Furthermore, long-term optical continuum monitoring of POX 52 reveals a mild inter-band lag of $\lesssim$ 1 day. However, no significant intranight variability was detected during its one-night, high-cadence monitoring, which we attribute to the longer duty cycle of fast variability in POX 52 compared to that in NGC 4395.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.21539 [pdf, other]

Search for the lepton number violation decay $ω\to π^+ π^+ e^-e^- +c.c.$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai , et al. (698 additional authors not shown)

Abstract: The lepton number violation decay $ω\to π^+ π^+ e^-e^- +c.c.$ is searched for via $J/ψ\to ωη$ using a data sample of $(1.0087 \pm 0.0044) \times 10^{10}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider. No significant signal is observed, and the upper limit on the branching fraction of $ω\to π^+ π^+ e^-e^- +c.c.$ at the 90% confidence level is determined for the first time to be $2.8 \times 10^{-6}$.

Submitted 30 April, 2025; originally announced April 2025.

Comments: 9 pages, 3 figures

arXiv:2504.21420 [pdf, other]

A Test Suite for Efficient Robustness Evaluation of Face Recognition Systems

Authors: Ruihan Zhang, Jun Sun

Abstract: Face recognition is a widely used authentication technology in practice, where robustness is required. It is thus essential to have an efficient and easy-to-use method for evaluating the robustness of (possibly third-party) trained face recognition systems. Existing approaches to evaluating the robustness of face recognition systems are either based on empirical evaluation (e.g., measuring attack success rate using state-of-the-art attack methods) or formal analysis (e.g., measuring the Lipschitz constant). While the former demands significant user effort and expertise, the latter is extremely time-consuming. In pursuit of a comprehensive, efficient, easy-to-use and scalable estimation of the robustness of face recognition systems, we take an old-school alternative approach and introduce RobFace, i.e., evaluation using an optimised test suite. It contains transferable adversarial face images that are designed to comprehensively evaluate a face recognition system's robustness along a variety of dimensions. RobFace is system-agnostic and still consistent with system-specific empirical evaluation or formal analysis. We support this claim through extensive experimental results with various perturbations on multiple face recognition systems. To our knowledge, RobFace is the first system-agnostic robustness estimation test suite.

Submitted 30 April, 2025; originally announced April 2025.

Comments: IEEE Transactions on Reliability

arXiv:2504.21269 [pdf, ps, other]

doi 10.1007/JHEP07(2025)121

Observation of the decay $B^0_{s}\to K^0 p \bar{p}$ and measurement of the $B^0_{(s)} \to K^0 p \bar{p}$ branching fractions

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, L. An , et al. (1128 additional authors not shown)

Abstract: A study of the charmless baryonic decays $B^0_{(s)} \to K^0 p \bar{p}$ is presented, where $B^0_{(s)}$ denotes either a $B^0$ or a $B^0_s$ meson. The analysis is based on proton-proton collision data collected by the LHCb experiment at centre-of-mass energies of 7, 8, and $13~\mathrm{TeV}$, corresponding to an integrated luminosity of $9~\mathrm{fb}^{-1}$. The decay $B^0_s \to K^0 p \bar{p}$ is observed for the first time, with a measured branching fraction of $(9.14 \pm 1.69 \pm 0.90 \pm 0.33 \pm 0.20) \times 10^{-7}$ and a significance of $5.6σ$. The uncertainties respectively account for statistical and systematic contributions, the precision of the branching fraction of the normalisation channel $B^0 \to K^0 π^{+} π^{-}$ and the fragmentation fraction ratio ${f_s}/{f_d}$. The branching fraction determined for $B^0 \to K^0 p \bar{p}$ is $(2.82 \pm 0.08 \pm 0.12 \pm 0.10) \times 10^{-6}$, which is the most precise measurement to date.

Submitted 9 July, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3797/ (LHCb public pages)

Report number: LHCb-PAPER-2025-001, CERN-EP-2025-085

Journal ref: JHEP 07 (2025) 121

arXiv:2504.20378 [pdf, other]

Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views

Authors: Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang

Abstract: We present a Gaussian Splatting method for surface reconstruction using sparse input views. Previous methods relying on dense views struggle with extremely sparse Structure-from-Motion points for initialization. While learning-based Multi-view Stereo (MVS) provides dense 3D points, directly combining it with Gaussian Splatting leads to suboptimal results due to the ill-posed nature of sparse-view geometric optimization. We propose Sparse2DGS, an MVS-initialized Gaussian Splatting pipeline for complete and accurate reconstruction. Our key insight is to incorporate the geometric-prioritized enhancement schemes, allowing for direct and robust geometric learning under ill-posed conditions. Sparse2DGS outperforms existing methods by notable margins while being ${2}\times$ faster than the NeRF-based fine-tuning approach.

Submitted 28 April, 2025; originally announced April 2025.

Comments: CVPR 2025

arXiv:2504.20054 [pdf, other]

Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment

Authors: Jiayang Sun, Hongbo Wang, Jie Cao, Huaibo Huang, Ran He

Abstract: While diffusion models excel at generating high-quality images, they often struggle with accurate counting, attributes, and spatial relationships in complex multi-object scenes. One potential approach is to utilize a Multimodal Large Language Model (MLLM) as an AI agent to build a self-correction framework. However, such approaches are highly dependent on the capabilities of the employed MLLM, often failing to account for all objects within the image. To address these challenges, we propose Marmot, a novel and generalizable framework that employs Multi-Agent Reasoning for Multi-Object Self-Correcting, enhancing image-text alignment and facilitating more coherent multi-object image editing. Our framework adopts a divide-and-conquer strategy, decomposing the self-correction task into object-level subtasks according to three critical dimensions: counting, attributes, and spatial relationships. We construct a multi-agent self-correcting system featuring a decision-execution-verification mechanism, effectively mitigating inter-object interference and enhancing editing reliability. To resolve the problem of subtask integration, we propose a Pixel-Domain Stitching Smoother that employs mask-guided two-stage latent space optimization. This innovation enables parallel processing of subtask results, thereby enhancing runtime efficiency while eliminating multi-stage distortion accumulation. Extensive experiments demonstrate that Marmot significantly improves accuracy in object counting, attribute assignment, and spatial relationships for image generation tasks.

Submitted 25 May, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.19927 [pdf]

Dependence of the Radical Dynamics on the Beam Temporal Profile in FLASH Radiotherapy

Authors: Jianhan Sun, Xianghui Kong, Jianfeng Lv, Xiaodong Liu, Jinghui Wang, Chen Lin, Tian Li, Yibao Zhang, Senlin Huang

Abstract: Purpose: This study aims to investigate the impact of the beam temporal profile on the radical dynamics and inter-track interactions of FLASH radiotherapy, supporting parameter optimization for equipment development and clinical implementation. Methods: Monte Carlo simulations based on the IRT method were performed to analyze the dynamics after irradiation, including single-pulse or multi-pulse irradiation, pulse repetition rate, width, and dose. Physicochemical experiments were performed to measure the $e_{aq}^{-}$ lifetimes for validation. The generation and recombination of OH and $e_{aq}^{-}$ radicals were recorded under 6 MeV electron irradiation with varying beam temporal profiles. The radial distributions of the radicals were statistically analyzed, and the corresponding LETd and LETt were calculated. The inter-track interactions were assessed through a mathematical model. Results: The spatial distribution and temporal evolution of radicals were significantly affected by the beam time profiles. Compared with multi-pulse irradiation, single-pulse mode with a width less than 1/10 of the radical lifetime, a repetition interval longer than the radical lifetime, and a dose exceeding 1 Gy/pulse can lead to rapid radical consumption, reducing the residual content. Instantaneous high dose rates induced radical track overlaps. When the single-pulse dose exceeded 1 Gy, the overlap probability approached 100%, aligning with the threshold for instantaneous radical combination.
Conclusion: Under a low-duty-cycle, high-instantaneous-dose-rate time profile, the radicals were rapidly consumed through track overlap, thereby reducing damage to normal tissues and inducing the FLASH effect. The optimized time profile can be used to guide the development of equipment, such as laser accelerators and superconducting photocathode guns, and the parameter settings in clinical practice to maximize the FLASH effect.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 18 pages

Showing 151–200 of 4,537 results for author: Sun, J