-
CKD-EHR:Clinical Knowledge Distillation for Electronic Health Records
Authors:
Junke Wang,
Hongshun Ling,
Li Zhang,
Longqian Zhang,
Fang Wang,
Yuan Gao,
Zhi Li
Abstract:
Electronic Health Records (EHR)-based disease prediction models have demonstrated significant clinical value in promoting precision medicine and enabling early intervention. However, existing large language models face two major challenges: insufficient representation of medical knowledge and low efficiency in clinical deployment. To address these challenges, this study proposes the CKD-EHR (Clinical Knowledge Distillation for EHR) framework, which achieves efficient and accurate disease risk prediction through knowledge distillation techniques. Specifically, the large language model Qwen2.5-7B is first fine-tuned on medical knowledge-enhanced data to serve as the teacher model. It then generates interpretable soft labels through a multi-granularity attention distillation mechanism. Finally, the distilled knowledge is transferred to a lightweight BERT student model. Experimental results show that on the MIMIC-III dataset, CKD-EHR significantly outperforms the baseline model: diagnostic accuracy is increased by 9%, F1-score is improved by 27%, and a 22.2-fold inference speedup is achieved. This solution not only greatly improves resource utilization efficiency but also significantly enhances the accuracy and timeliness of diagnosis, providing a practical technical approach for resource optimization in clinical settings. The code and data for this research are available at https://github.com/209506702/CKD_EHR.
Submitted 17 June, 2025;
originally announced June 2025.
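The CKD-EHR abstract above describes transferring a teacher's soft labels to a smaller student. As a rough illustration only, the sketch below shows a generic temperature-scaled distillation loss in NumPy. It is not the paper's multi-granularity attention distillation mechanism; the function names, the temperature, and the weighting factor `alpha` are assumptions made for this example.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term with the usual cross-entropy term.

    alpha weights the soft-label term; temperature softens both
    distributions (both values are illustrative assumptions).
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student_t = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    # KL(teacher || student) per example, at the softened temperature.
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student_t), axis=-1)
    # Cross-entropy of the student against the hard labels.
    log_p_student = log_softmax(student_logits)
    rows = np.arange(len(hard_labels))
    ce = -log_p_student[rows, hard_labels]
    # The temperature**2 factor keeps the soft-term gradient scale comparable.
    return float(np.mean(alpha * temperature**2 * kl + (1.0 - alpha) * ce))
```

With `alpha=1.0` the loss depends only on how closely the student matches the teacher's softened distribution, so it vanishes when both produce identical logits.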
-
Towards Reliable Forgetting: A Survey on Machine Unlearning Verification, Challenges, and Future Directions
Authors:
Lulu Xue,
Shengshan Hu,
Wei Lu,
Yan Shen,
Dongxu Li,
Peijin Guo,
Ziqi Zhou,
Minghui Li,
Yanjun Zhang,
Leo Yu Zhang
Abstract:
With growing demands for privacy protection, security, and legal compliance (e.g., GDPR), machine unlearning has emerged as a critical technique for ensuring the controllability and regulatory alignment of machine learning models. However, a fundamental challenge in this field lies in effectively verifying whether unlearning operations have been successfully and thoroughly executed. Despite a growing body of work on unlearning techniques, verification methodologies remain comparatively underexplored and often fragmented. Existing approaches lack a unified taxonomy and a systematic framework for evaluation. To bridge this gap, this paper presents the first structured survey of machine unlearning verification methods. We propose a taxonomy that organizes current techniques into two principal categories -- behavioral verification and parametric verification -- based on the type of evidence used to assess unlearning fidelity. We examine representative methods within each category, analyze their underlying assumptions, strengths, and limitations, and identify potential vulnerabilities in practical deployment. In closing, we articulate a set of open problems in current verification research, aiming to provide a foundation for developing more robust, efficient, and theoretically grounded unlearning verification mechanisms.
Submitted 17 June, 2025;
originally announced June 2025.
-
Whole-Body Control Framework for Humanoid Robots with Heavy Limbs: A Model-Based Approach
Authors:
Tianlin Zhang,
Linzhu Yue,
Hongbo Zhang,
Lingwei Zhang,
Xuanqi Zeng,
Zhitao Song,
Yun-Hui Liu
Abstract:
Humanoid robots often face significant balance issues due to the motion of their heavy limbs. These challenges are particularly pronounced when attempting dynamic motion or operating in environments with irregular terrain. To address this challenge, this manuscript proposes a whole-body control framework for humanoid robots with heavy limbs, using a model-based approach that combines a kino-dynamics planner and a hierarchical optimization problem. The kino-dynamics planner is designed as a model predictive control (MPC) scheme to account for the impact of heavy limbs on mass and inertia distribution. By simplifying the robot's system dynamics and constraints, the planner enables real-time planning of motion and contact forces. The hierarchical optimization problem is formulated using Hierarchical Quadratic Programming (HQP) to minimize limb control errors and ensure compliance with the policy generated by the kino-dynamics planner. Experimental validation of the proposed framework demonstrates its effectiveness. The humanoid robot with heavy limbs controlled by the proposed framework can achieve dynamic walking speeds of up to 1.2 m/s, respond to external disturbances of up to 60 N, and maintain balance on challenging terrains such as uneven surfaces and outdoor environments.
Submitted 17 June, 2025;
originally announced June 2025.
-
Situational-Constrained Sequential Resources Allocation via Reinforcement Learning
Authors:
Libo Zhang,
Yang Chen,
Toru Takisaka,
Kaiqi Zhao,
Weidong Li,
Jiamou Liu
Abstract:
Sequential Resource Allocation with situational constraints presents a significant challenge in real-world applications, where resource demands and priorities are context-dependent. This paper introduces a novel framework, SCRL, to address this problem. We formalize situational constraints as logic implications and develop a new algorithm that dynamically penalizes constraint violations. To handle situational constraints effectively, we propose a probabilistic selection mechanism to overcome limitations of traditional constraint reinforcement learning (CRL) approaches. We evaluate SCRL across two scenarios: medical resource allocation during a pandemic and pesticide distribution in agriculture. Experiments demonstrate that SCRL outperforms existing baselines in satisfying constraints while maintaining high resource efficiency, showcasing its potential for real-world, context-sensitive decision-making tasks.
Submitted 16 June, 2025;
originally announced June 2025.
-
Compact representation and long-time extrapolation of real-time data for quantum systems
Authors:
Andre Erpenbeck,
Yuanran Zhu,
Yang Yu,
Lei Zhang,
Richard Gerum,
Olga Goulko,
Chao Yang,
Guy Cohen,
Emanuel Gull
Abstract:
Representing real-time data as a sum of complex exponentials provides a compact form that enables both denoising and extrapolation. As a fully data-driven method, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm is agnostic to the underlying physical equations, making it broadly applicable to various observables and experimental or numerical setups. In this work, we consider applications of the ESPRIT algorithm primarily to extend real-time dynamical data from simulations of quantum systems. We evaluate ESPRIT's performance in the presence of noise and compare it to other extrapolation methods. We demonstrate its ability to extract information from short-time dynamics to reliably predict long-time behavior and determine the minimum time interval required for accurate results. We discuss how this insight can be leveraged in numerical methods that propagate quantum systems in time, and show how ESPRIT can predict infinite-time values of dynamical observables, offering a purely data-driven approach to characterizing quantum phases.
Submitted 16 June, 2025;
originally announced June 2025.
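The ESPRIT abstract above rests on writing uniformly sampled data as a sum of complex exponentials, f[n] ≈ Σ_k c_k z_k^n, and then evaluating that sum at later times. The following is a minimal textbook-style sketch in NumPy, assuming noise-free, uniformly sampled data and a known number of terms; the function names and the square-ish Hankel shape are choices made for this example and are not taken from the paper.

```python
import numpy as np

def esprit(signal, num_terms):
    """Fit signal[n] ~ sum_k amps[k] * poles[k]**n with a basic ESPRIT variant."""
    signal = np.asarray(signal, dtype=complex)
    n = len(signal)
    rows = n // 2
    cols = n - rows + 1
    # Hankel matrix with entries hankel[i, j] = signal[i + j].
    hankel = np.array([signal[i:i + cols] for i in range(rows)])
    # The leading left singular vectors span the signal subspace.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    us = u[:, :num_terms]
    # Rotational invariance: us[:-1] @ psi ~ us[1:]; eigenvalues of psi are the poles.
    psi = np.linalg.lstsq(us[:-1], us[1:], rcond=None)[0]
    poles = np.linalg.eigvals(psi)
    # Amplitudes from a least-squares fit on the Vandermonde matrix of the poles.
    vander = poles[np.newaxis, :] ** np.arange(n)[:, np.newaxis]
    amps = np.linalg.lstsq(vander, signal, rcond=None)[0]
    return poles, amps

def extrapolate(poles, amps, steps):
    """Evaluate the fitted exponential sum on sample indices 0 .. steps - 1."""
    idx = np.arange(steps)[:, np.newaxis]
    return (poles[np.newaxis, :] ** idx) @ amps
```

Fitting on a short window and calling `extrapolate` with a larger `steps` gives the long-time continuation; with noisy data the number of terms and the window length would need to be chosen with more care than this sketch shows.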
-
Parallel Branch Model Predictive Control on GPUs
Authors:
Luyao Zhang,
Chenghuai Lin,
Sergio Grammatico
Abstract:
We present a parallel GPU-accelerated solver for branch Model Predictive Control problems. Based on iterative LQR methods, our solver exploits the tree-sparse structure and implements temporal parallelism using the parallel scan algorithm. Consequently, the proposed solver enables parallelism across both the prediction horizon and the scenarios. In addition, we utilize an augmented Lagrangian method to handle general inequality constraints. We compare our solver with state-of-the-art numerical solvers in two automated driving applications. The numerical results demonstrate that, compared to CPU-based solvers, our solver achieves competitive performance for problems with short horizons and small-scale trees, while outperforming other solvers on large-scale problems.
Submitted 16 June, 2025;
originally announced June 2025.
-
Fast Transitions of X-ray Variability in the Neutron Star Low Mass X-ray Binary Cygnus X-2
Authors:
Liang Zhang,
Mariano Méndez,
Hua Feng,
Diego Altamirano,
Zi-xu Yang,
Qing-chang Zhao,
Shuang-nan Zhang,
Lian Tao,
Yue Huang,
Xiang Ma,
Shu-mei Jia,
Ming-yu Ge,
Li-ming Song,
Jin-lu Qu,
Shu Zhang
Abstract:
We present a spectral-timing analysis of two NICER observations of the weakly magnetized neutron star low-mass X-ray binary Cygnus X-2. During these observations, we detect a rapid transition from a narrow 50-Hz horizontal-branch oscillation (HBO) to a broad 5-Hz normal-branch oscillation (NBO), accompanied by an increase in source flux and a decrease in spectral hardness. Thanks to the large effective area of NICER, we are able to conduct a detailed comparison of the spectra associated with different types of quasi-periodic oscillations (QPOs) on short timescales. By fitting the spectra with a model that includes a disc and Comptonization components plus two emission lines, we find that the parameters of the disc component do not change significantly during the transition. However, assuming a fixed electron temperature, the optical depth of the Comptonization component decreases significantly. This drop in optical depth may be attributed to the expansion of the boundary layer or spreading layer. In addition, we find that the rms spectra for both the HBO and NBO are hard, suggesting that the boundary layer or spreading layer is driving the variability. We discuss the potential physical origin of the different types of QPOs.
Submitted 16 June, 2025;
originally announced June 2025.
-
Measurement of the $Ω_c^0$ and $Ξ_c^0$ baryon lifetimes using hadronic $b$-baryon decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1141 additional authors not shown)
Abstract:
The lifetimes of the $Ω_c^0$ and $Ξ_c^0$ baryons are measured using a $pp$ collision dataset collected by the LHCb experiment, corresponding to an integrated luminosity of $9~\rm{fb^{-1}}$. The charm baryons are produced in the fully reconstructed decay chains $Ω_b^- \rightarrow Ω_c^0 (\rightarrow pK^-K^-π^+)~π^-$ and $Ξ_b^- \rightarrow Ξ_c^0 (\rightarrow pK^-K^-π^+)~π^-$. The measurement uses topologically and kinematically similar $B^- \rightarrow D^0(\rightarrow K^-K^+π^-π^+)~π^-$ decays for normalisation. The measured lifetimes are
$τ_{Ω_c^0} = 276.3 \pm 19.4~\rm{(stat)} \pm 1.8~\rm{(syst)} \pm 0.7~(τ_{D^0})~\rm{fs}$,
$τ_{Ξ_c^0} = 149.2 \pm ~\,2.5~\rm{(stat)} \pm 0.9~\rm{(syst)} \pm 0.4~(τ_{D^0})~\rm{fs}$,
where the first uncertainty is statistical, the second systematic and the third due to the uncertainty of the $D^0$ lifetime. These results are consistent with previous measurements performed by the LHCb experiment.
Submitted 16 June, 2025;
originally announced June 2025.
-
Experimental Observation of Purity-Like Invariants of Multi-photon States in Linear Optics
Authors:
Baichuan Yang,
Hao Zhan,
Minghao Mi,
Aonan Zhang,
Liang Xu,
Lijian Zhang
Abstract:
Linear optical networks (LONs) with multi-photon inputs offer a powerful platform for advanced quantum technologies. However, the number of degrees of freedom of a LON is far fewer than the dimensionality of the multi-photon multi-mode Fock space, therefore it cannot implement arbitrary unitary evolutions on multi-photon states. Understanding these intrinsic constraints is essential for the preparation, manipulation, and measurement of multi-photon states with LONs. Although several properties of the multi-photon state have been shown to be invariant under LON unitary evolution, their physical interpretation remains elusive. Here, we introduce a Hermitian transfer matrix approach to explore the multi-photon evolution, revealing that the overall state purity decomposes into three distinct invariants -- each arising from either single-photon dynamics or the multi-photon interference. We experimentally observe these purity-like invariants by preparing distinct initial states, applying LON unitaries, and measuring the resulting invariants. Our results not only confirm their conservation but also provide valuable insights into multi-photon state evolution in linear optics.
Submitted 15 June, 2025;
originally announced June 2025.
-
Towards Visualizing Electronic Medical Records via Natural Language Queries
Authors:
Haodi Zhang,
Siqi Ning,
Qiyong Zheng,
Jinyin Nie,
Liangjie Zhang,
Weicheng Wang,
Yuanfeng Song
Abstract:
Electronic medical records (EMRs) contain essential data for patient care and clinical research. With the diversity of structured and unstructured data in EHR, data visualization is an invaluable tool for managing and explaining these complexities. However, the scarcity of relevant medical visualization data and the high cost of manual annotation required to develop such datasets pose significant challenges to advancing medical visualization techniques. To address this issue, we propose an innovative approach using large language models (LLMs) for generating visualization data without labor-intensive manual annotation. We introduce a new pipeline for building text-to-visualization benchmarks suitable for EMRs, enabling users to visualize EMR statistics through natural language queries (NLQs). The dataset presented in this paper primarily consists of paired text medical records, NLQs, and corresponding visualizations, forming the first large-scale text-to-visual dataset for electronic medical record information called MedicalVis with 35,374 examples. Additionally, we introduce an LLM-based approach called MedCodeT5, showcasing its viability in generating EMR visualizations from NLQs, outperforming various strong text-to-visualization baselines. Our work facilitates standardized evaluation of EMR visualization methods while providing researchers with tools to advance this influential field of application. In a nutshell, this study and dataset have the potential to promote advancements in eliciting medical insights through visualization.
Submitted 15 June, 2025;
originally announced June 2025.
-
TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models
Authors:
Yang Dai,
Oubo Ma,
Longfei Zhang,
Xingxing Liang,
Xiaochun Cao,
Shouling Ji,
Jiaheng Zhang,
Jincai Huang,
Li Shen
Abstract:
Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerabilities against backdoor attacks are poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which are largely ineffective against the TO model due to its inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to enhance the connection between triggers and target actions for attack effectiveness. To improve attack stealth, it utilizes precise poisoning via trajectory filtering for normal performance and batch poisoning for trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoor attacks across diverse tasks and attack objectives with a low attack budget (0.3\% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.
Submitted 15 June, 2025;
originally announced June 2025.
-
Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents
Authors:
LeCheng Zhang,
Yuanshi Wang,
Haotian Shen,
Xujie Wang
Abstract:
The Da Vinci Code, a game of logical deduction and imperfect information, presents unique challenges for artificial intelligence, demanding nuanced reasoning beyond simple pattern recognition. This paper investigates the efficacy of various AI paradigms in mastering this game. We develop and evaluate three distinct agent architectures: a Transformer-based baseline model with limited historical context, several Large Language Model (LLM) agents (including Gemini, DeepSeek, and GPT variants) guided by structured prompts, and an agent based on Proximal Policy Optimization (PPO) employing a Transformer encoder for comprehensive game history processing. Performance is benchmarked against the baseline, with the PPO-based agent demonstrating superior win rates ($58.5\% \pm 1.0\%$), significantly outperforming the LLM counterparts. Our analysis highlights the strengths of deep reinforcement learning in policy refinement for complex deductive tasks, particularly in learning implicit strategies from self-play. We also examine the capabilities and inherent limitations of current LLMs in maintaining strict logical consistency and strategic depth over extended gameplay, despite sophisticated prompting. This study contributes to the broader understanding of AI in recreational games involving hidden information and multi-step logical reasoning, offering insights into effective agent design and the comparative advantages of different AI approaches.
Submitted 15 June, 2025;
originally announced June 2025.
-
Information fusion strategy integrating pre-trained language model and contrastive learning for materials knowledge mining
Authors:
Yongqian Peng,
Zhouran Zhang,
Longhui Zhang,
Fengyuan Zhao,
Yahao Li,
Yicong Ye,
Shuxin Bai
Abstract:
Machine learning has revolutionized materials design, yet predicting complex properties like alloy ductility remains challenging due to the influence of processing conditions and microstructural features that resist quantification through traditional reductionist approaches. Here, we present an information fusion architecture that integrates domain-specific texts from materials science literature with quantitative physical descriptors to overcome these limitations. Our framework employs MatSciBERT for textual comprehension and incorporates contrastive learning to automatically extract implicit knowledge regarding processing parameters and microstructural characteristics. In ablation studies and comparative experiments, the model demonstrates superior performance, achieving coefficient of determination ($R^2$) values of 0.849 and 0.680 on the titanium alloy validation set and the refractory multi-principal-element alloy test set, respectively. This systematic approach provides a holistic framework for property prediction in complex material systems where quantitative descriptors are incomplete and establishes a foundation for knowledge-guided materials design and informatics-driven materials discovery.
Submitted 14 June, 2025;
originally announced June 2025.
-
Uniaxial stress tuning of interfacial thermal conductance in cubic BAs/4H-SiC heterostructures
Authors:
Lei Zhang,
Fei Tian,
Ke Chen,
Zhongbo Yan,
Kun Cao
Abstract:
Understanding interfacial thermal transport is essential for improving thermal management in high-speed power electronic devices, where the efficient removal of excess heat is a critical challenge. In this study, a machine learning interatomic potential with near first-principles accuracy was employed to investigate the interfacial thermal conductance (ITC) between [111]-oriented cubic boron arsenide (cBAs) and [0001]-oriented 4H silicon carbide (4H-SiC), as well as its dependence on uniaxial stress. Among all possible bonding configurations at the cBAs(111)/4H-SiC(0001) interface, the B-C bonded interface was identified as the most energetically favorable. Non-equilibrium molecular dynamics simulations revealed that, under ambient conditions (300 K and 0 GPa), the ITC of the B-C interface reaches 353 $\pm$ 6 MW m$^{-2}$ K$^{-1}$, and increases monotonically to 460 $\pm$ 3 MW m$^{-2}$ K$^{-1}$ under a uniaxial stress of 25 GPa perpendicular to the interface. For comparison, the As-C bonded interface exhibits a lower ITC, increasing from 233 $\pm$ 7 to 318 $\pm$ 6 MW m$^{-2}$ K$^{-1}$ over the same stress range. These results demonstrate that proper interfacial bonding and moderate uniaxial stress can significantly enhance thermal transport across the cBAs(111)/4H-SiC(0001) heterointerface, offering valuable insight for thermal design in next-generation power electronics.
Submitted 14 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023
Authors:
Navodini Wijethilake,
Reuben Dorent,
Marina Ivory,
Aaron Kujawa,
Stefan Cornelissen,
Patrick Langenhuizen,
Mohamed Okasha,
Anna Oviedova,
Hexin Dong,
Bogyeong Kang,
Guillaume Sallé,
Luyi Han,
Ziyuan Zhao,
Han Liu,
Tao Yang,
Shahad Hardan,
Hussain Alasmawi,
Santosh Sanjeev,
Yuzhou Zhuang,
Satoshi Kondo,
Maria Baldeon Calisto,
Shaikh Muhammad Uzair Noman,
Cancan Chen,
Ipek Oguz,
Rongguo Zhang
, et al. (14 additional authors not shown)
Abstract:
The cross-Modality Domain Adaptation (crossMoDA) challenge series, initiated in 2021 in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), focuses on unsupervised cross-modality segmentation, learning from contrast-enhanced T1 (ceT1) and transferring to T2 MRI. The task is an extreme example of domain shift chosen to serve as a meaningful and illustrative benchmark. From a clinical application perspective, it aims to automate Vestibular Schwannoma (VS) and cochlea segmentation on T2 scans for more cost-effective VS management. Over time, the challenge objectives have evolved to enhance its clinical relevance. The challenge evolved from using single-institutional data and basic segmentation in 2021 to incorporating multi-institutional data and Koos grading in 2022, and by 2023, it included heterogeneous routine data and sub-segmentation of intra- and extra-meatal tumour components. In this work, we report the findings of the 2022 and 2023 editions and perform a retrospective analysis of the challenge progression over the years. The observations from the successive challenge contributions indicate that the number of outliers decreases with an expanding dataset. This is notable since the diversity of scanning protocols of the datasets concurrently increased. The winning approach of the 2023 edition reduced the number of outliers on the 2021 and 2022 testing data, demonstrating how increased data heterogeneity can enhance segmentation performance even on homogeneous data. However, the cochlea Dice score declined in 2023, likely due to the added complexity from tumour sub-annotations affecting overall segmentation performance. While progress is still needed for clinically acceptable VS segmentation, the plateauing performance suggests that a more challenging cross-modal task may better serve future benchmarking.
Submitted 24 June, 2025; v1 submitted 13 June, 2025;
originally announced June 2025.
-
A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification
Authors:
Lu Zhang,
Sangarapillai Lambotharan,
Gan Zheng,
Fabio Roli
Abstract:
The advantages of deep learning over traditional methods have been demonstrated for radio signal classification in recent years. However, various researchers have discovered that even small but intentional feature perturbations, known as adversarial examples, can significantly deteriorate the performance of deep learning based radio signal classification. Among the various kinds of adversarial examples, universal adversarial perturbations have gained considerable attention because they are data independent, which makes them a practical strategy for fooling radio signal classification with a high success rate. Therefore, in this paper, we investigate a defense system, called a neural rejection system, against universal adversarial perturbations, and evaluate its performance by generating white-box universal adversarial perturbations. We show that the proposed neural rejection system is able to defend against universal adversarial perturbations with significantly higher accuracy than the undefended deep neural network.
Submitted 13 June, 2025;
originally announced June 2025.
-
Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices
Authors:
Lu Zhang,
Sangarapillai Lambotharan,
Gan Zheng,
Guisheng Liao,
Basil AsSadhan,
Fabio Roli
Abstract:
Due to the great success of transformers in many applications such as natural language processing and computer vision, transformers have been successfully applied in automatic modulation classification. We have shown that transformer-based radio signal classification is vulnerable to imperceptible and carefully crafted attacks called adversarial examples. Therefore, we propose a defense system against adversarial examples in transformer-based modulation classifications. Considering the need for a computationally efficient architecture, particularly for Internet of Things (IoT)-based applications or the operation of devices in environments where power supply is limited, we propose a compact transformer for modulation classification. The advantages of robust training such as adversarial training in transformers may not be attainable in compact transformers. By demonstrating this, we propose a novel compact transformer that can enhance robustness in the presence of adversarial attacks. The new method is aimed at transferring the adversarial attention map from the robustly trained large transformer to a compact transformer. The proposed method outperforms the state-of-the-art techniques for the considered white-box scenarios including fast gradient method and projected gradient descent attacks. We have provided reasoning of the underlying working mechanisms and investigated the transferability of the adversarial examples between different architectures. The proposed method has the potential to protect the transformer from the transferability of adversarial examples.
Submitted 13 June, 2025;
originally announced June 2025.
-
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Authors:
Hwiwon Lee,
Ziqi Zhang,
Hanxiao Lu,
Lingming Zhang
Abstract:
Rigorous security-focused evaluation of large language model (LLM) agents is imperative for establishing trust in their safe deployment throughout the software development lifecycle. However, existing benchmarks largely rely on synthetic challenges or simplified vulnerability datasets that fail to capture the complexity and ambiguity encountered by security engineers in practice. We introduce SEC-bench, the first fully automated benchmarking framework for evaluating LLM agents on authentic security engineering tasks. SEC-bench employs a novel multi-agent scaffold that automatically constructs code repositories with harnesses, reproduces vulnerabilities in isolated environments, and generates gold patches for reliable evaluation. Our framework automatically creates high-quality software vulnerability datasets with reproducible artifacts at a cost of only $0.87 per instance. Using SEC-bench, we implement two critical software security tasks to rigorously evaluate LLM agents' capabilities: proof-of-concept (PoC) generation and vulnerability patching. A comprehensive evaluation of state-of-the-art LLM code agents reveals significant performance gaps, achieving at most 18.0% success in PoC generation and 34.0% in vulnerability patching on our complete dataset. These results highlight the crucial steps needed toward developing LLM agents that are more practical, intelligent, and autonomous for security engineering.
Submitted 13 June, 2025;
originally announced June 2025.
-
KEENHash: Hashing Programs into Function-Aware Embeddings for Large-Scale Binary Code Similarity Analysis
Authors:
Zhijie Liu,
Qiyi Tang,
Sen Nie,
Shi Wu,
Liang Feng Zhang,
Yutian Tang
Abstract:
Binary code similarity analysis (BCSA) is a crucial research area in many fields such as cybersecurity. Specifically, function-level diffing tools are the most widely used in BCSA: they perform function matching one by one for evaluating the similarity between binary programs. However, such methods have high time complexity, making them unscalable in large-scale scenarios (e.g., 1/n-to-n search). Towards effective and efficient program-level BCSA, we propose KEENHash, a novel hashing approach that hashes binaries into program-level representations through large language model (LLM)-generated function embeddings. KEENHash condenses a binary into one compact and fixed-length program embedding using K-Means and Feature Hashing, allowing us to do effective and efficient large-scale program-level BCSA, surpassing the previous state-of-the-art methods. The experimental results show that KEENHash is at least 215 times faster than the state-of-the-art function matching tools while maintaining effectiveness. Furthermore, in a large-scale scenario with 5.3 billion similarity evaluations, KEENHash takes only 395.83 seconds while these tools take at least 56 days. We also evaluate KEENHash on the program clone search of large-scale BCSA across extensive datasets in 202,305 binaries. Compared with 4 state-of-the-art methods, KEENHash outperforms all of them by at least 23.16%, and displays remarkable superiority over them in the large-scale BCSA security scenario of malware detection.
Submitted 13 June, 2025;
originally announced June 2025.
-
A Causal Lens for Learning Long-term Fair Policies
Authors:
Jacob Lear,
Lu Zhang
Abstract:
Fairness-aware learning studies the development of algorithms that avoid discriminatory decision outcomes despite biased training data. While most studies have concentrated on immediate bias in static contexts, this paper highlights the importance of investigating long-term fairness in dynamic decision-making systems while simultaneously considering instantaneous fairness requirements. In the context of reinforcement learning, we propose a general framework where long-term fairness is measured by the difference in the average expected qualification gain that individuals from different groups could obtain. Then, through a causal lens, we decompose this metric into three components that represent the direct impact, the delayed impact, and the spurious effect the policy has on the qualification gain. We analyze the intrinsic connection between these components and an emerging fairness notion called benefit fairness that aims to control the equity of outcomes in decision-making. Finally, we develop a simple yet effective approach for balancing various fairness notions.
Submitted 12 June, 2025;
originally announced June 2025.
-
Magnetosonic Waves as a Driver of Observed Temperature Fluctuation Patterns in AGN Accretion Disks
Authors:
Ish Kaul,
Omer Blaes,
Yan-Fei Jiang,
Lizhong Zhang
Abstract:
Recent observations have revealed slow, coherent temperature fluctuations in AGN disks that propagate both inward and outward at velocities of $\sim 0.01 - 0.1c$, a kind of variability that is distinct from reverberation (mediated by the reprocessing of light) between different regions of the disk. We investigate the origin and nature of these fluctuations using global 3D radiation-magnetohydrodynamic simulations of radiation and magnetic pressure-dominated AGN accretion disks. Disks with a significant turbulent Maxwell stress component exhibit wave-like temperature perturbations, most evident close to the midplane, whose propagation speeds exactly match the local fast magnetosonic speed and are consistent with the speeds inferred in observations. These fluctuations have amplitudes of $2 - 4\%$ in gas temperature, which are also consistent with observational constraints. Disks that are dominated by mean-field Maxwell stresses do not exhibit such waves. While waves may be present in the body of the disk, we do not find them to be present in the photosphere. Although this may in part be due to low numerical resolution in the photosphere region, we discuss the physical challenges that must be overcome for the waves to manifest there. In particular, the fact that such waves are observed implies that the disk photospheres must be magnetically dominated, since radiative damping from photon diffusion smooths out radiation pressure fluctuations. Furthermore, the gas and radiation fluctuations must be out of local thermodynamic equilibrium.
Submitted 12 June, 2025;
originally announced June 2025.
-
The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs
Authors:
Songyang Liu,
Chaozhuo Li,
Jiameng Qiu,
Xi Zhang,
Feiran Huang,
Litian Zhang,
Yiming Hei,
Philip S. Yu
Abstract:
With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation. However, their widespread deployment has also raised significant safety concerns. In recent years, LLM-generated content has occasionally exhibited unsafe elements like toxicity and bias, particularly in adversarial scenarios, which has garnered extensive attention from both academia and industry. While numerous efforts have been made to evaluate the safety risks associated with LLMs, there remains a lack of systematic reviews summarizing these research endeavors. This survey aims to provide a comprehensive and systematic overview of recent advancements in LLM safety evaluation, focusing on several key aspects: (1) "Why evaluate", which explores the background of LLM safety evaluation, how it differs from general LLM evaluation, and the significance of such evaluation; (2) "What to evaluate", which examines and categorizes existing safety evaluation tasks based on key capabilities, including dimensions such as toxicity, robustness, ethics, bias and fairness, and truthfulness; (3) "Where to evaluate", which summarizes the evaluation metrics, datasets and benchmarks currently used in safety evaluations; (4) "How to evaluate", which reviews existing evaluation toolkits and categorizes mainstream evaluation methods based on the roles of the evaluators. Finally, we identify the challenges in LLM safety evaluation and propose potential research directions to promote further advancement in this field. We emphasize the importance of prioritizing LLM safety evaluation to ensure the safe deployment of these models in real-world applications.
Submitted 6 June, 2025;
originally announced June 2025.
-
Two Birds with One Stone: Improving Factuality and Faithfulness of LLMs via Dynamic Interactive Subspace Editing
Authors:
Pengbo Wang,
Chaozhuo Li,
Chenxu Wang,
Liwen Zheng,
Litian Zhang,
Xi Zhang
Abstract:
LLMs have demonstrated unprecedented capabilities in natural language processing, yet their practical deployment remains hindered by persistent factuality and faithfulness hallucinations. While existing methods address these hallucination types independently, they inadvertently induce performance trade-offs, as interventions targeting one type often exacerbate the other. Through empirical and theoretical analysis of activation space dynamics in LLMs, we reveal that these hallucination categories share overlapping subspaces within neural representations, presenting an opportunity for concurrent mitigation. To harness this insight, we propose SPACE, a unified framework that jointly enhances factuality and faithfulness by editing shared activation subspaces. SPACE establishes a geometric foundation for shared subspace existence through dual-task feature modeling, then identifies and edits these subspaces via a hybrid probe strategy combining spectral clustering and attention head saliency scoring. Experimental results across multiple benchmark datasets demonstrate the superiority of our approach.
Submitted 5 June, 2025;
originally announced June 2025.
-
Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning
Authors:
Lan Zhang,
Marco Valentino,
Andre Freitas
Abstract:
Autoformalization plays a crucial role in formal mathematical reasoning by enabling the automatic translation of natural language statements into formal languages. While recent advances using large language models (LLMs) have shown promising results, methods for automatically evaluating autoformalization remain underexplored. As one moves to more complex domains (e.g., advanced mathematics), human evaluation requires significant time and domain expertise, especially as the complexity of the underlying statements and background knowledge increases. LLM-as-a-judge presents a promising approach for automating such evaluation. However, existing methods typically employ coarse-grained and generic evaluation criteria, which limit their effectiveness for advanced formal mathematical reasoning, where quality hinges on nuanced, multi-granular dimensions. In this work, we take a step toward addressing this gap by introducing a systematic, automatic method to evaluate autoformalization tasks. The proposed method is based on an epistemically and formally grounded ensemble (EFG) of LLM judges, defined on criteria encompassing logical preservation (LP), mathematical consistency (MC), formal validity (FV), and formal quality (FQ), resulting in a transparent assessment that accounts for different contributing factors. We validate the proposed framework to serve as a proxy for autoformalization assessment within the domain of formal mathematics. Overall, our experiments demonstrate that the EFG ensemble of LLM judges is a suitable emerging proxy for evaluation, more strongly correlating with human assessments than a coarse-grained model, especially when assessing formal qualities. These findings suggest that LLM-as-judges, especially when guided by a well-defined set of atomic properties, could offer a scalable, interpretable, and reliable support for evaluating formal mathematical reasoning.
Submitted 12 June, 2025;
originally announced June 2025.
-
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Authors:
Qingyan Wei,
Yaojie Zhang,
Zhiyuan Liu,
Dongrui Liu,
Linfeng Zhang
Abstract:
Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility. In this paper, we propose SlowFast Sampling, a novel dynamic sampling strategy that adaptively alternates between exploratory and accelerated decoding stages. Our method is guided by three golden principles: certainty principle, convergence principle, and positional principle, which govern when and where tokens can be confidently and efficiently decoded. We further integrate our strategy with dLLM-Cache to reduce redundant computation. Extensive experiments across benchmarks and models show that SlowFast Sampling achieves up to 15.63$\times$ speedup on LLaDA with minimal accuracy drop, and up to 34.22$\times$ when combined with caching. Notably, our approach outperforms strong autoregressive baselines like LLaMA3 8B in throughput, demonstrating that well-designed sampling can unlock the full potential of dLLMs for fast and high-quality generation.
Submitted 12 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks
Authors:
Xiaoxing Mo,
Yuxuan Cheng,
Nan Sun,
Leo Yu Zhang,
Wei Luo,
Shang Gao
Abstract:
Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers implant hidden triggers during training to maliciously control model behavior. Topological Evolution Dynamics (TED) has recently emerged as a powerful tool for detecting backdoor attacks in DNNs. However, TED can be vulnerable to backdoor attacks that adaptively distort topological representation distributions across network layers. To address this limitation, we propose TED-LaST (Topological Evolution Dynamics against Laundry, Slow release, and Target mapping attack strategies), a novel defense strategy that enhances TED's robustness against adaptive attacks. TED-LaST introduces two key innovations: label-supervised dynamics tracking and adaptive layer emphasis. These enhancements enable the identification of stealthy threats that evade traditional TED-based defenses, even in cases of inseparability in topological space and subtle topological perturbations. We review and classify data poisoning tricks in state-of-the-art adaptive attacks and propose an enhanced adaptive attack with target mapping, which can dynamically shift malicious tasks and fully leverage the stealthiness that adaptive attacks possess. Our comprehensive experiments on multiple datasets (CIFAR-10, GTSRB, and ImageNet100) and model architectures (ResNet20, ResNet101) show that TED-LaST effectively counteracts sophisticated backdoors like Adap-Blend, Adapt-Patch, and the proposed enhanced adaptive attack. TED-LaST sets a new benchmark for robust backdoor detection, substantially enhancing DNN security against evolving threats.
Submitted 12 June, 2025;
originally announced June 2025.
-
AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length
Authors:
Junhang Cheng,
Fang Liu,
Chengru Wu,
Li Zhang
Abstract:
While Large Language Models (LLMs) have significantly advanced code generation efficiency, they face inherent challenges in balancing performance and inference costs across diverse programming tasks. Dynamically selecting the optimal LLM based on task difficulty and resource constraints offers a promising approach to achieve an optimal balance between efficiency and performance. However, existing model selection methods are resource-intensive and often neglect cost efficiency. Moreover, these approaches rely on human-annotated difficulty labels that are frequently inaccessible in real-world settings and may not align with the LLM's own assessment of task difficulty. In this paper, we introduce AdaptiveLLM, a framework that dynamically selects optimal LLMs for a given coding task by automatically assessing task difficulty. Our framework first estimates task difficulty using Chain-of-Thought (CoT) lengths generated by a reasoning model, clusters these into three difficulty levels via k-means, and fine-tunes CodeBERT to embed difficulty-aware features. A trained XGBoost classifier then selects the best model for each problem, optimizing the performance-cost trade-off. Experimental results show that AdaptiveLLM achieves a 7.86% improvement in pass@1 score while reducing resource consumption by 88.9% compared to the baseline method ComplexityNet. When compared to a single model, AdaptiveLLM demonstrates an approximately 15% accuracy improvement while maintaining the same level of cost consumption. In addition, the difficulty assessment using CoT provides more reliable selection criteria than human evaluation. Our replication package is available at https://github.com/cjhCoder7/AdaptiveLLM.
Submitted 12 June, 2025;
originally announced June 2025.
-
On the Law of the Iterated Logarithm for m-dependent stationary random variables under sub-linear expectations
Authors:
Wang-Yun Gu,
Li-Xin Zhang
Abstract:
This paper explores the Law of the Iterated Logarithm (LIL) for $m$-dependent sequences under the framework of sub-linear expectations. We first extend existing LIL results to sequences of independent, non-identically distributed random variables under sub-linear expectations. This extension serves as a crucial intermediary step, facilitating the subsequent establishment of the LIL for $m$-dependent stationary sequences. On the other hand, we also establish necessary conditions for $m$-dependent sequences in sub-linear expectation spaces.
Submitted 12 June, 2025;
originally announced June 2025.
-
Study of Stability and Consistency of EAS Thermal Neutron Detection at ENDA-64
Authors:
Heng-Yu Zhang,
Xin-Hua Ma,
Tian-Lu Chen,
Shu-Wang Cui,
Danzengluobu,
Wei Gao,
Wen-Chao Gao,
Xin-Rui Gao,
Zi-Ao Gong,
Hai-Bing Hu,
Denis Kuleshov,
Kirill Kurinov,
Bing-Bing Li,
Fan-Ping Li,
Jia-Heng Li,
Yang Li,
Hu Liu,
Mao-Yuan Liu,
Ye Liu,
Xi-An Pan,
Da-Yu Peng,
Yao-Hui Qi,
Dong Qu,
Oleg Shchegolev,
Yuri Stenkin
, et al. (5 additional authors not shown)
Abstract:
Introduction: The Electron-Neutron Detector Array (ENDA) is designed to measure thermal neutrons produced by hadronic interactions between cosmic ray extensive air showers (EAS) and the surrounding environment, as well as electrons around the cores of EAS. ENDA is located within the Large High Altitude Air Shower Observatory (LHAASO). ENDA was expanded from an initial 16 detectors to 64 detectors in April 2023, the so-called ENDA-64, and has been running alongside LHAASO. The stability and consistency of neutron detection are crucial for laying a solid foundation for subsequent data analysis and physical results. Methods: We obtain the stability by studying variations of the event rate and thermal neutron rate in each cluster, and the consistency by comparing the distribution of the number of thermal neutrons between clusters. Additionally, we investigate the specific influences of the rainy and dry seasons, as well as the presence or absence of sand cubes under the detectors, to examine the environmental factors affecting neutron measurement performance. Results: The calibration results indicate good consistency in thermal neutron detection across the clusters, with a maximum inconsistency of 6.85%. The maximum instabilities of the event rate and thermal neutron rate over time are 4.68% and 11.0%, respectively. The maximum inconsistency between the clusters without the sand cubes is 18%. The use of sand cubes is effective in protecting the target material from rainwater, and the sand cubes help the cluster to increase the collection of neutrons generated by EAS events.
Submitted 12 June, 2025;
originally announced June 2025.
-
Realization of Weyl elastic metamaterials with spin skyrmions
Authors:
Yuang Pan,
Liang Si,
Miao Yang,
Ning Han,
Li Zhang,
Qiaolu Chen,
Rui Zhao,
Fujia Chen,
Yudong Ren,
Wenhao Li,
Yuze Hu,
Mingyu Tong,
Xinrui Li,
Junyao Wu,
Ronghao Bao,
Weiqiu Chen,
Yang Long,
Bin Wu,
Hongsheng Chen,
Baile Zhang,
Yihao Yang
Abstract:
Topological elastic metamaterials provide a topologically robust way to manipulate phononic energy and information beyond conventional approaches. Among various topological elastic metamaterials, Weyl elastic metamaterials stand out, as they are unique to three dimensions and exhibit numerous intriguing phenomena and potential applications. To date, however, the realization of Weyl elastic metamaterials remains elusive, primarily due to the full-vectorial nature of elastic waves and the complicated couplings between polarizations, leading to complicated and tangled three-dimensional (3D) band structures that are unfavorable for experimental demonstration. Here, we overcome the challenge and realize an ideal, 3D printed, all-metallic Weyl elastic metamaterial with low dissipation losses. Notably, the elastic spin of the excitations around the Weyl points exhibits skyrmion textures, a topologically stable structure in real space. Utilizing 3D laser vibrometry, we reveal the projection of the Weyl points, the Fermi arcs and the unique spin characteristics of the topological surface states. Our work extends the Weyl metamaterials to elastic waves and paves a topological way to robust manipulation of elastic waves in 3D space.
Submitted 12 June, 2025;
originally announced June 2025.
-
Minimizing False Positives in Static Bug Detection via LLM-Enhanced Path Feasibility Analysis
Authors:
Xueying Du,
Kai Yu,
Chong Wang,
Yi Zou,
Wentai Deng,
Zuoyu Ou,
Xin Peng,
Lingming Zhang,
Yiling Lou
Abstract:
Static bug analyzers play a crucial role in ensuring software quality. However, existing analyzers for bug detection in large codebases often suffer from high false positive rates. This is primarily due to the limited capabilities of analyzers in path feasibility validation with multiple conditional branches and complex data dependencies. While current LLM-based approaches attempt to address this issue, their effectiveness remains limited due to insufficient constraint cascade analysis and scalability challenges in large projects. To address this challenge, we propose an iterative path feasibility analysis framework, LLM4PFA. By leveraging LLM agent based targeted constraint reasoning and key context-aware analysis driven by agent planning, LLM4PFA effectively enhances complex inter-procedural path feasibility analysis for minimizing false positives in static bug detection. Evaluation results show that LLM4PFA precisely filters out 72% to 96% of the false positives reported during static bug detection, significantly outperforming all the baselines by 41.1% - 105.7%; meanwhile, LLM4PFA misses only 3 real bugs out of 45 true positives.
Submitted 11 June, 2025;
originally announced June 2025.
-
Search for sub-GeV invisible particles in inclusive decays of $J/ψ$ to $φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (704 additional authors not shown)
Abstract:
A search for an invisible particle, $X$, with a mass between 0 and 0.96 $\textrm{GeV}/\textit{c}^{2}$, is performed in the process $J/ψ\rightarrowφ+ X$ using $(8774.0\pm39.4)\times10^{6}$ $J/ψ$ events collected with the BESIII detector from 2017 to 2019. The $φ$ meson is fully reconstructed, an efficient veto of photons and of neutral and charged hadrons up to twice the $K_L^0$ mass is applied to the rest of the event, and the recoil mass against the $φ$ is obtained precisely from the kinematic constraint in the event. No significant signal is observed in the investigated region, and the upper limit on the inclusive branching fraction of $J/ψ\rightarrowφ+ X$ is determined to be $7.5\times10^{-8}$ at the 90% confidence level. Upper limits at the 90% confidence level are also given for this branching fraction as a function of the invisible particle mass, varying from $9\times10^{-9}$ to $4\times10^{-8}$ over the investigated mass range. Additionally, a 90% confidence level upper limit on the branching fraction of $η\rightarrow \rm{invisible}$ is determined to be $2.6\times10^{-5}$, which improves on the previous best result by more than a factor of four. The analysis technique in this work offers a clean window to search for sub-GeV invisible particles; it can be adapted for other $J/ψ$ decays and direct $e^+e^-$ annihilation experiments in future studies, and can improve the sensitivity by orders of magnitude.
Submitted 11 June, 2025;
originally announced June 2025.
-
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
Authors:
Yantai Yang,
Yuhao Wang,
Zichen Wen,
Luo Zhongwei,
Chang Zou,
Zhipeng Zhang,
Chuan Wen,
Linfeng Zhang
Abstract:
Vision-Language-Action (VLA) models, particularly diffusion-based architectures, demonstrate transformative potential for embodied intelligence but are severely hampered by high computational and memory demands stemming from extensive inherent and inference-time redundancies. While existing acceleration efforts often target isolated inefficiencies, such piecemeal solutions typically fail to holistically address the varied computational and memory bottlenecks across the entire VLA pipeline, thereby limiting practical deployability. We introduce EfficientVLA, a structured and training-free inference acceleration framework that systematically eliminates these barriers by cohesively exploiting multifaceted redundancies. EfficientVLA integrates three targeted strategies: (1) pruning of functionally inconsequential layers from the language module, guided by an analysis of inter-layer redundancies; (2) optimizing the visual processing pathway through a task-aware strategy that selects a compact, diverse set of visual tokens, balancing task-criticality with informational coverage; and (3) alleviating temporal computational redundancy within the iterative diffusion-based action head by strategically caching and reusing key intermediate features. We apply our method to a standard VLA model, CogACT, yielding a 1.93x inference speedup and reducing FLOPs to 28.9%, with only a 0.6% success rate drop on the SIMPLER benchmark.
Submitted 11 June, 2025;
originally announced June 2025.
-
Urban1960SatSeg: Unsupervised Semantic Segmentation of Mid-20$^{th}$ century Urban Landscapes with Satellite Imageries
Authors:
Tianxiang Hao,
Lixian Zhang,
Yingjia Zhang,
Mengxuan Chen,
Jinxiao Zhang,
Haohuan Fu
Abstract:
Historical satellite imagery, such as mid-20$^{th}$ century Keyhole data, offers rare insights into early urban development and long-term transformation. However, severe quality degradation (e.g., distortion, misalignment, and spectral scarcity) and the absence of annotations have long hindered semantic segmentation on such historical RS imagery. To bridge this gap and enhance understanding of urban development, we introduce $\textbf{Urban1960SatBench}$, an annotated segmentation dataset based on historical satellite imagery with the earliest observation time among all existing segmentation datasets, along with a benchmark framework for unsupervised segmentation tasks, $\textbf{Urban1960SatUSM}$. First, $\textbf{Urban1960SatBench}$ serves as a novel, expertly annotated semantic segmentation dataset built on mid-20$^{th}$ century Keyhole imagery, covering 1,240 km$^2$ and key urban classes (buildings, roads, farmland, water). As the earliest segmentation dataset of its kind, it provides a pioneering benchmark for historical urban understanding. Second, $\textbf{Urban1960SatUSM}$ (Unsupervised Segmentation Model) is a novel unsupervised semantic segmentation framework for historical RS imagery. It employs a confidence-aware alignment mechanism and a focal-confidence loss based on a self-supervised learning architecture, which generates robust pseudo-labels and adaptively prioritizes prediction difficulty and label reliability to improve unsupervised segmentation on noisy historical data without manual supervision. Experiments show that Urban1960SatUSM significantly outperforms existing unsupervised segmentation methods on Urban1960SatSeg for segmenting historical urban scenes, helping to pave the way for quantitative studies of long-term urban change using modern computer vision. Our benchmark and supplementary material are available at https://github.com/Tianxiang-Hao/Urban1960SatSeg.
Submitted 12 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
Search for the charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (705 additional authors not shown)
Abstract:
Based on $(10087\pm44)\times 10^6$ $J/ψ$ events recorded with the BESIII detector, we search for the rare charmonium weak decays $J/ψ\to D_{s}^{-}ρ^{+}+c.c.$ and $J/ψ\to D_{s}^{-}π^{+}+c.c.$ No signal is observed, and upper limits on the branching fractions at the $90\%$ confidence level are set as $\mathcal{B}(J/ψ\to D_{s}^{-}ρ^{+}+c.c.)<8.0\times10^{-7}$ and $\mathcal{B}(J/ψ\to D_{s}^{-}π^{+}+c.c.)<4.1\times10^{-7}$. Our results provide the most stringent experimental constraints on these decays.
Submitted 11 June, 2025;
originally announced June 2025.
-
Putative excitonic insulating state in narrow-gap semiconductor La$_3$Cd$_2$As$_6$
Authors:
Caitlin S. Kengle,
Noah Schnitzer,
Elizabeth A. Peterson,
Chunyu Guo,
Ling Zhang,
Matthew S. Cook,
Jian-Xin Zhu,
Sean M. Thomas,
Philip J. W. Moll,
Filip Ronning,
Priscila F. S. Rosa
Abstract:
Excitonic insulators are electronically-driven phases of matter characterized by the spontaneous condensation of electron-hole pairs. Here we show that La$_3$Cd$_2$As$_6$ undergoes a transition at $T_{0}=278$ K to a highly insulating state with no accompanying structural transition. We observe quasi-two-dimensional electrical transport and charge fluctuations consistent with an electronic transition enabled by enhanced Coulomb interactions. Density functional theory calculations are unable to replicate the insulating ground state. Our results support the opening of a gap by excitonic effects at $T_{0}$, placing La$_3$Cd$_2$As$_6$ as a rare example of a bulk excitonic insulator.
Submitted 10 June, 2025;
originally announced June 2025.
-
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner
Authors:
Lei Zhang,
Jiaxi Yang,
Min Yang,
Jian Yang,
Mouxiang Chen,
Jiajun Zhang,
Zeyu Cui,
Binyuan Hui,
Junyang Lin
Abstract:
We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). Unlike existing software engineering data that rely on human-submitted issues, **SWE-Flow** automatically infers incremental development steps directly from unit tests, which inherently encapsulate high-level requirements. The core of **SWE-Flow** is the construction of a Runtime Dependency Graph (RDG), which precisely captures function interactions, enabling the generation of a structured, step-by-step *development schedule*. At each step, **SWE-Flow** produces a partial codebase, the corresponding unit tests, and the necessary code modifications, resulting in fully verifiable TDD tasks. With this approach, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, creating the **SWE-Flow-Eval** benchmark. Our experiments show that fine-tuning an open model on this dataset significantly improves performance in TDD-based coding. To facilitate further research, we release all code, datasets, models, and Docker images on [GitHub](https://github.com/Hambaobao/SWE-Flow).
Submitted 10 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Rethinking Range-View LiDAR Segmentation in Adverse Weather
Authors:
Longyu Yang,
Ping Hu,
Lu Zhang,
Jun Liu,
Yap-Peng Tan,
Heng Tao Shen,
Xiaofeng Zhu
Abstract:
LiDAR segmentation has emerged as an important task to enrich multimedia experiences and analysis. Range-view-based methods have gained popularity due to their high computational efficiency and compatibility with real-time deployment. However, their generalized performance under adverse weather conditions remains underexplored, limiting their reliability in real-world environments. In this work, we identify and analyze the unique challenges that affect the generalization of range-view LiDAR segmentation in severe weather. To address these challenges, we propose a modular and lightweight framework that enhances robustness without altering the core architecture of existing models. Our method reformulates the initial stem block of standard range-view networks into two branches to process geometric attributes and reflectance intensity separately. Specifically, a Geometric Abnormality Suppression (GAS) module reduces the influence of weather-induced spatial noise, and a Reflectance Distortion Calibration (RDC) module corrects reflectance distortions through memory-guided adaptive instance normalization. The processed features are then fused and passed to the original segmentation pipeline. Extensive experiments on different benchmarks and baseline models demonstrate that our approach significantly improves generalization to adverse weather with minimal inference overhead, offering a practical and effective solution for real-world LiDAR segmentation.
Submitted 10 June, 2025;
originally announced June 2025.
-
SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping
Authors:
Jiajun Li,
Yue Ma,
Xinyu Zhang,
Qingyan Wei,
Songhua Liu,
Linfeng Zhang
Abstract:
Recent studies on Visual Autoregressive (VAR) models have highlighted that high-frequency components, or later steps, in the generation process contribute disproportionately to inference latency. However, the underlying computational redundancy involved in these steps has yet to be thoroughly investigated. In this paper, we conduct an in-depth analysis of the VAR inference process and identify two primary sources of inefficiency: step redundancy and unconditional branch redundancy. To address step redundancy, we propose an automatic step-skipping strategy that selectively omits unnecessary generation steps to improve efficiency. For unconditional branch redundancy, we observe that the information gap between the conditional and unconditional branches is minimal. Leveraging this insight, we introduce unconditional branch replacement, a technique that bypasses the unconditional branch to reduce computational cost. Notably, we observe that the effectiveness of acceleration strategies varies significantly across different samples. Motivated by this, we propose SkipVAR, a sample-adaptive framework that leverages frequency information to dynamically select the most suitable acceleration strategy for each instance. To evaluate the role of high-frequency information, we introduce high-variation benchmark datasets that test model sensitivity to fine details. Extensive experiments show SkipVAR achieves over 0.88 average SSIM with up to 1.81x overall acceleration and 2.62x speedup on the GenEval benchmark, maintaining model quality. These results confirm the effectiveness of frequency-aware, training-free adaptive acceleration for scalable autoregressive image generation. Our code is available at https://github.com/fakerone-li/SkipVAR and has been publicly released.
Submitted 10 July, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Identifying vortex lattice in type-II superconductors via the dynamic magnetostrictive effect
Authors:
Peipei Lu,
Mengju Yuan,
Jing Zhang,
Qiang Gao,
Shuang Liu,
Yugang Zhang,
Shipeng Shen,
Long Zhang,
Jun Lu,
Xiaoyuan Zhou,
Mingquan He,
Aifeng Wang,
Yang Li,
Wenshan Hong,
Shiliang Li,
Huiqian Luo,
Xingjiang Zhou,
Xianhui Chen,
Young Sun,
Yisheng Chai
Abstract:
In type-I superconductors, zero electrical resistivity and perfect diamagnetism define two fundamental criteria for superconducting behavior. In contrast, type-II superconductors exhibit more complex mixed-state physics, where magnetic flux penetrates the material above the lower critical field Hc1 in the form of quantized vortices, each carrying a single flux quantum. These vortices form a two-dimensional lattice which persists up to the irreversibility field (Hirr) and then melts into a dissipative liquid phase. Although the vortex lattice is fundamental to the magnetic and electrical properties of type-II superconductors, a third definitive criterion, beyond resistivity and magnetization, for identifying this phase has remained elusive. Here, we report the discovery of a dynamic magnetostrictive effect, wherein the geometry of the superconductor oscillates only under an applied alternating magnetic field due to the disturbance of the vortex lattice. This effect is detected by a thin piezoelectric transducer, which converts the excited geometric deformation into an in-phase ac voltage. Notably, we find a direct and nearly linear relationship between the signal amplitude and the vortex density in the lattice across several representative type-II superconductors. In the vortex liquid phase above Hirr, the signal amplitude rapidly decays to zero near the upper critical field (Hc2), accompanied by a pronounced out-of-phase component due to enhanced dissipation. This dynamic magnetostrictive effect not only reveals an unexplored magnetoelastic property of the vortex lattice but also establishes a fundamental criterion for identifying type-II superconductors.
Submitted 10 June, 2025;
originally announced June 2025.
-
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
Authors:
Shuyi Zhang,
Xiaoshuai Hao,
Yingbo Tang,
Lingfeng Zhang,
Pengwei Wang,
Zhongyuan Wang,
Hongxuan Ma,
Shanghang Zhang
Abstract:
Video content comprehension is essential for various applications, ranging from video analysis to interactive systems. Despite advancements in large-scale vision-language models (VLMs), these models often struggle to capture the nuanced spatiotemporal details essential for thorough video analysis. To address this gap, we introduce Video-CoT, a dataset designed to enhance spatiotemporal understanding using Chain-of-Thought (CoT) methodologies. Video-CoT contains 192,000 fine-grained spatiotemporal question-answer pairs and 23,000 high-quality CoT-annotated samples, providing a solid foundation for evaluating spatiotemporal understanding in video comprehension. Additionally, we provide a comprehensive benchmark for assessing these tasks, with each task featuring 750 images and tailored evaluation metrics. Our extensive experiments reveal that current VLMs face significant challenges in achieving satisfactory performance, highlighting the difficulties of effective spatiotemporal understanding. Overall, the Video-CoT dataset and benchmark open new avenues for research in multimedia understanding and support future innovations in intelligent systems requiring advanced video analysis capabilities. By making these resources publicly available, we aim to encourage further exploration in this critical area. Project website: https://video-cot.github.io/.
Submitted 12 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Improved LLM Agents for Financial Document Question Answering
Authors:
Nelvin Tan,
Zian Seng,
Liang Zhang,
Yu-Ching Shih,
Dong Yang,
Amol Salunkhe
Abstract:
Large language models (LLMs) have shown impressive capabilities on numerous natural language processing tasks. However, LLMs still struggle with numerical question answering for financial documents that include tabular and textual data. Recent works have shown the effectiveness of critic agents (i.e., self-correction) for this task given oracle labels. Building upon this framework, this paper examines the effectiveness of the traditional critic agent when oracle labels are not available, and shows, through experiments, that this critic agent's performance deteriorates in this scenario. With this in mind, we present an improved critic agent, along with a calculator agent, which together outperform the previous state-of-the-art approach (program-of-thought) and are safer. Furthermore, we investigate how our agents interact with each other, and how this interaction affects their performance.
Submitted 10 June, 2025;
originally announced June 2025.
-
Measurement of $ψ(2S)$ to $J/ψ$ cross-section ratio as function of multiplicity in $p$Pb collisions at $\sqrt{s_{NN}} = 8.16$ TeV
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An
, et al. (1137 additional authors not shown)
Abstract:
The production ratio of $ψ(2S)$ to $J/ψ$ charmonium states is presented as a function of multiplicity in proton-lead collisions at a centre-of-mass energy of $\sqrt{s_{NN}}=8.16$ TeV, for both prompt and nonprompt sources. The total luminosity recorded by the LHCb experiment corresponds to 13.6 $pb^{-1}$ for $p$Pb collisions and 20.8 $pb^{-1}$ for Pb$p$ collisions, where the first particle indicates the forward direction of the detector. Measurements are performed in the dimuon final state at forward (backward) centre-of-mass rapidity $1.5<y^*<4.0$ ($-5.0<y^*<-2.5$) for $p$Pb (Pb$p$) collisions. A multiplicity dependence of the prompt production ratio is observed in $p$Pb collisions, whereas no dependence is found in nonprompt production, nor in either prompt or nonprompt production in Pb$p$ collisions. These results suggest that in the Pb-going direction additional suppression mechanisms beyond comover effects may be present, possibly related to the formation of quark-gluon plasma. This highlights a transition from small to large collision systems and provides important insight into the suppression of charmonia in proton-nucleus collisions.
Submitted 12 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Measurement of the $η$ transition form factor through $η' \rightarrow π^+π^-η$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Based on a sample of $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events collected at BESIII, the transition form factor of the $η$ meson is extracted by analyzing $J/ψ\toγη',~η'\toπ^+π^-η,~η\toγl^+l^-$ ($l$=$e$, $μ$) events. The measured slope of the transition form factor is $Λ^{-2}=1.645\pm0.093_{\rm stat.}\pm {0.024_{\rm sys.}}$ (GeV/$c^2$)$^{-2}$ for the di-electron channel and $Λ^{-2}=1.645\pm0.343_{\rm stat.}\pm0.017_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ for the di-muon channel. The branching fractions for $η\rightarrowγe^+e^-$ and $η\rightarrowγμ^+μ^-$ are measured to be $\mathcal{B}(η\toγe^+e^-)=(6.79\pm0.04_{\rm stat.}\pm0.36_{\rm sys.})\times 10^{-3}$ and $\mathcal{B}(η\toγμ^+μ^-)=(2.97\pm0.11_{\rm stat.}\pm0.07_{\rm sys.})\times 10^{-4}$. By combining with the results based on the $J/ψ\toγη,~η\toγe^+e^-$ events from the previous BESIII measurement, we determine $Λ^{-2}=1.707\pm0.076_{\rm stat.}\pm0.029_{\rm sys.}$ (GeV/$c^2$)$^{-2}$ and $\mathcal{B}(η\toγe^+e^-)=(6.93\pm0.28_{\rm tot.})\times 10^{-3}$. In addition, we search for the dark photon ($A'$) using the combined events. No significant signal is observed, and the upper limits on $\mathcal{B}(η\toγA',~A'\to e^+e^-)$ are set at 90\% confidence level for different $A'$ mass hypotheses.
Submitted 10 June, 2025;
originally announced June 2025.
-
The Invariant Zonotopic Set-Membership Filter for State Estimation on Groups
Authors:
Tao Li,
Yi Li,
Lulin Zhang,
Jiuxiang Dong
Abstract:
Invariant filtering theory, which is based on group theory, has been successful in statistical filtering methods. However, there exists a class of state estimation problems in which the statistical properties of the noise disturbances are unknown, and it is worth examining whether the invariant observer still has performance advantages there. In this paper, considering the problem of state estimation with unknown but bounded noise disturbances, an Invariant Zonotopic Set-Membership Filter (InZSMF) method on groups is proposed, which extends invariant filtering theory to the field of non-statistical filtering represented by set-membership filtering. Firstly, the InZSMF method transforms the state space from the traditional Euclidean vector space to the Lie group space to construct group affine discrete systems with unknown but bounded noise uncertainty defined by the zonotope on groups. Secondly, the nonlinear observer on the group is defined and the corresponding linearized estimation error is derived. Then, two observer gain tuning algorithms under the InZSMF method are proposed, namely the pole configuration method and the F-radius optimization method. Finally, simulation experiments show that the InZSMF state estimation method is generally superior to the traditional Zonotopic Set-Membership Filter (ZSMF) state estimation method. In particular, when the initial estimates are imprecise, the convergence speed of state estimation, the accuracy of set-membership center estimation, and the average interval area of zonotopic estimation of the InZSMF method are significantly better than those of the ZSMF method.
Submitted 10 June, 2025;
originally announced June 2025.
-
TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration
Authors:
Weiya Li,
Junjie Chen,
Bei Li,
Boyang Liu,
Zichen Wen,
Nuanqiao Shan,
Xiaoqian Liu,
Anping Liu,
Huajie Liu,
Hu Song,
Linfeng Zhang
Abstract:
Machine translation has long been a central task in natural language processing. With the rapid advancement of large language models (LLMs), there has been remarkable progress in translation quality. However, fully realizing the translation potential of LLMs remains an open challenge. Recent studies have explored multi-agent systems to decompose complex translation tasks into collaborative subtasks, showing initial promise in enhancing translation quality through agent cooperation and specialization. Nevertheless, existing multi-agent translation frameworks largely neglect foundational insights from cognitive translation studies. These insights emphasize how human translators employ different cognitive strategies, such as balancing literal and free translation, refining expressions based on context, and iteratively evaluating outputs. To address this limitation, we propose a cognitively informed multi-agent framework called TACTIC, which stands for Translation Agents with Cognitive-Theoretic Interactive Collaboration. The framework comprises six functionally distinct agents that mirror key cognitive processes observed in human translation behavior. These include agents for drafting, refinement, evaluation, scoring, context reasoning, and external knowledge gathering. By simulating an interactive and theory-grounded translation workflow, TACTIC effectively leverages the full capacity of LLMs for high-quality translation. Experimental results on diverse language pairs from the FLORES-200 and WMT24 benchmarks show that our method consistently achieves state-of-the-art performance. Using DeepSeek-V3 as the base model, TACTIC surpasses GPT-4.1 by an average of +0.6 XCOMET and +1.18 COMETKIWI-23. Compared to DeepSeek-R1, it further improves by +0.84 XCOMET and +2.99 COMETKIWI-23. Code is available at https://github.com/weiyali126/TACTIC.
Submitted 11 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Observatory Science with eXTP
Authors:
Ping Zhou,
Jirong Mao,
Liang Zhang,
Alessandro Patruno,
Enrico Bozzo,
Yanjun Xu,
Andrea Santangelo,
Silvia Zane,
Shuang-Nan Zhang,
Hua Feng,
Yuri Cavecchi,
Barbara De Marco,
Junhui Fan,
Xian Hou,
Pengfei Jiang,
Patrizia Romano,
Gloria Sala,
Lian Tao,
Alexandra Veledina,
Jacco Vink,
Song Wang,
Junxian Wang,
Yidi Wang,
Shanshan Weng,
Qingwen Wu
, et al. (75 additional authors not shown)
Abstract:
Scheduled for launch in 2030, the enhanced X-ray Timing and Polarimetry (eXTP) telescope is a Chinese space-based mission aimed at studying extreme conditions and phenomena in astrophysics. eXTP will feature three main payloads: Spectroscopy Focusing Arrays (SFAs), Polarimetry Focusing Arrays (PFAs), and a Wide-field Camera (W2C). This white paper outlines observatory science, incorporating key scientific advances and instrumental changes since the publication of the previous white paper [1]. We will discuss perspectives of eXTP on the research domains of flare stars, supernova remnants, pulsar wind nebulae, cataclysmic variables, X-ray binaries, ultraluminous X-ray sources, AGN, and pulsar-based positioning and timekeeping.
Submitted 9 June, 2025;
originally announced June 2025.
-
Dense Matter in Neutron Stars with eXTP
Authors:
Ang Li,
Anna L. Watts,
Guobao Zhang,
Sebastien Guillot,
Yanjun Xu,
Andrea Santangelo,
Silvia Zane,
Hua Feng,
Shuang-Nan Zhang,
Mingyu Ge,
Liqiang Qi,
Tuomo Salmi,
Bas Dorsman,
Zhiqiang Miao,
Zhonghao Tu,
Yuri Cavecchi,
Xia Zhou,
Xiaoping Zheng,
Weihua Wang,
Quan Cheng,
Xuezhi Liu,
Yining Wei,
Wei Wang,
Yujing Xu,
Shanshan Weng
, et al. (58 additional authors not shown)
Abstract:
In this White Paper, we present the potential of the enhanced X-ray Timing and Polarimetry (eXTP) mission to constrain the equation of state of dense matter in neutron stars, exploring regimes not directly accessible to terrestrial experiments. By observing a diverse population of neutron stars - including isolated objects, X-ray bursters, and accreting systems - eXTP's unique combination of timing, spectroscopy, and polarimetry enables high-precision measurements of compactness, spin, surface temperature, polarimetric signals, and timing irregularity. These multifaceted observations, combined with advances in theoretical modeling, pave the way toward a comprehensive description of the properties and phases of dense matter from the crust to the core of neutron stars. Under development by an international Consortium led by the Institute of High Energy Physics of the Chinese Academy of Sciences, the eXTP mission is planned to be launched in early 2030.
Submitted 9 June, 2025;
originally announced June 2025.
-
The enhanced X-ray Timing and Polarimetry mission -- eXTP for launch in 2030
Authors:
Shuang-Nan Zhang,
Andrea Santangelo,
Yupeng Xu,
Hua Feng,
Fangjun Lu,
Yong Chen,
Mingyu Ge,
Kirpal Nandra,
Xin Wu,
Marco Feroci,
Margarita Hernanz,
Congzhan Liu,
Huilin He,
Yusa Wang,
Weichun Jiang,
Weiwei Cui,
Yanji Yang,
Juan Wang,
Wei Li,
Xiaohua Liu,
Bin Meng,
Xiangyang Wen,
Aimei Zhang,
Jia Ma,
Maoshun Li
, et al. (136 additional authors not shown)
Abstract:
In this paper we present the current status of the enhanced X-ray Timing and Polarimetry mission, which has been fully approved for launch in 2030. eXTP is a space science mission designed to study fundamental physics under extreme conditions of matter density, gravity, and magnetism. The mission aims at determining the equation of state of matter at supra-nuclear density, measuring effects of QED, and understanding the dynamics of matter in strong-field gravity. In addition to investigating fundamental physics, the eXTP mission is poised to become a leading observatory for time-domain and multi-messenger astronomy in the 2030s, as well as providing observations of unprecedented quality on a variety of galactic and extragalactic objects. After briefly introducing the history and a summary of the scientific objectives of the eXTP mission, this paper presents a comprehensive overview of: 1) the cutting-edge technology, technical specifications, and anticipated performance of the mission's scientific instruments; 2) the full mission profile, encompassing spacecraft design, operational capabilities, and ground segment infrastructure.
Submitted 9 June, 2025;
originally announced June 2025.