-
Structural Inhomogeneities and Suppressed Magneto-Structural Coupling in Mn-Substituted GeCo2O4
Authors:
Shivani Sharma,
Pooja Jain,
Benny Schundelmier,
Chin-Wei Wang,
Poonam Yadav,
Adrienn Maria Szucs,
Kaya Wei,
N. P. Lalla,
Theo Siegrist
Abstract:
A comprehensive study of the Ge1-xMnxCo2O4 (GMCO) system was conducted using neutron powder diffraction (NPD), x-ray diffraction (XRD), scanning electron microscopy (SEM), magnetometry, and heat capacity measurements. Comparative analysis with GeCo2O4 (GCO) highlights the influence of Mn substitution on the crystal and magnetic structure at low temperature. Surprisingly, phase separation is observed in GMCO with a targeted nominal composition of Ge0.5Mn0.5Co2O4. SEM/EDX analysis reveals that the sample predominantly consists of a Mn-rich primary phase with approximate stoichiometry Mn0.74Ge0.18Co2O4, along with a minor Ge-rich secondary phase of composition Ge0.91Mn0.19Co2O4. Although both GCO and GMCO crystallize in cubic symmetry at room temperature, a substantial difference in low-temperature structural properties has been observed. Magnetic and heat capacity data indicate ferrimagnetic ordering in the Mn-rich phase near TC = 108 K, while the Ge-rich phase exhibits antiferromagnetic order at TN = 22 K in GMCO. Analysis of heat capacity data reveals that the estimated magnetic entropy amounts to only 63% of the theoretical value expected in GMCO. A collinear ferrimagnetic arrangement is observed in the Mn-rich phase below the magnetic ordering temperature, characterized by antiparallel spins of the Mn at the A site and Co at the B site along the c-direction. At 5 K, the refined magnetic moments are 2.31(3) μB for MnA and 1.82(3) μB for CoB in the Mn-rich ferrimagnetic phase. The magnetic structure at 5 K in the Ge-rich secondary phase is identical to the antiferromagnetic structure of the parent compound GeCo2O4. The refined value of the CoB moment in this phase at 5 K is 2.53(3) μB.
Submitted 16 June, 2025;
originally announced June 2025.
-
Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
Authors:
Kevin L. Wei,
Patricia Paskov,
Sunishchal Dev,
Michael J. Byun,
Anka Reuel,
Xavier Roberts-Gaal,
Rachel Calcott,
Evie Coxon,
Chinmay Deshpande
Abstract:
In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist towards this end. Human performance baselines are vital for the machine learning community, downstream users, and policymakers to interpret AI evaluations. Models are often claimed to achieve "super-human" performance, but existing baselining methods are neither sufficiently rigorous nor sufficiently well-documented to robustly measure and assess performance differences. Based on a meta-review of the measurement theory and AI evaluation literatures, we derive a framework with recommendations for designing, executing, and reporting human baselines. We synthesize our recommendations into a checklist that we use to systematically review 115 human baselines (studies) in foundation model evaluations and thus identify shortcomings in existing baselining methods; our checklist can also assist researchers in conducting human baselines and reporting results. We hope our work can advance more rigorous AI evaluation practices that can better serve both the research community and policymakers. Data is available at: https://github.com/kevinlwei/human-baselines
Submitted 9 June, 2025;
originally announced June 2025.
-
TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning
Authors:
Songze Li,
Mingxuan Zhang,
Kang Wei,
Shouling Ji
Abstract:
Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actions when specific trigger patterns are present in the state observations. However, most existing backdoor attacks rely primarily on simplistic and heuristic trigger configurations, overlooking the potential efficacy of trigger optimization. To address this gap, we introduce TooBadRL (Trigger Optimization to Boost Effectiveness of Backdoor Attacks on DRL), the first framework to systematically optimize DRL backdoor triggers along three critical axes, i.e., temporal, spatial, and magnitude. Specifically, we first introduce a performance-aware adaptive freezing mechanism for injection timing. Then, we formulate dimension selection as a cooperative game, utilizing Shapley value analysis to identify the most influential state variable for the injection dimension. Furthermore, we propose a gradient-based adversarial procedure to optimize the injection magnitude under environment constraints. Evaluations on three mainstream DRL algorithms and nine benchmark tasks show that TooBadRL significantly improves attack success rates, while ensuring minimal degradation of normal task performance. These results highlight the previously underappreciated importance of principled trigger optimization in DRL backdoor attacks. The source code of TooBadRL can be found at https://github.com/S3IC-Lab/TooBadRL.
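The abstract describes choosing the injection dimension by treating state variables as players in a cooperative game and ranking them by Shapley value. The following is a minimal sketch of an exact Shapley computation on a toy game, not the TooBadRL implementation: the state-variable names, the weights, and the `toy_value` function are hypothetical stand-ins for whatever influence measure the authors actually use.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical per-variable weights plus one pairwise interaction.
WEIGHTS = {"position": 0.5, "velocity": 0.2, "angle": 0.1}

def toy_value(coalition):
    """Assumed influence of a set of state variables (illustrative only)."""
    total = sum(WEIGHTS[p] for p in coalition)
    if {"position", "velocity"} <= coalition:
        total += 0.2  # synergy shared equally by the Shapley value
    return total

scores = shapley_values(list(WEIGHTS), toy_value)
most_influential = max(scores, key=scores.get)
print(scores, most_influential)
```

Exact enumeration scales factorially with the number of players, so for high-dimensional observations a sampling-based approximation would be needed in practice.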
Submitted 12 June, 2025; v1 submitted 11 June, 2025;
originally announced June 2025.
-
Collaborative On-Sensor Array Cameras
Authors:
Jipeng Sun,
Kaixuan Wei,
Thomas Eboli,
Congli Wang,
Cheng Zheng,
Zhihao Zhou,
Arka Majumdar,
Wolfgang Heidrich,
Felix Heide
Abstract:
Modern nanofabrication techniques have enabled us to manipulate the wavefront of light with sub-wavelength-scale structures, offering the potential to replace bulky refractive surfaces in conventional optics with ultrathin metasurfaces. In theory, arrays of nanoposts provide unprecedented control over manipulating the wavefront in terms of phase, polarization, and amplitude at the nanometer resolution. A line of recent work successfully investigates flat computational cameras that replace compound lenses with a single metalens or an array of metasurfaces a few millimeters from the sensor. However, due to the inherent wavelength dependence of metalenses, in practice, these cameras do not match their refractive counterparts in image quality for broadband imaging, and may even suffer from hallucinations when relying on generative reconstruction methods.
In this work, we investigate a collaborative array of metasurface elements that are jointly learned to perform broadband imaging. To this end, we learn a nanophotonics array with 100-million nanoposts that is end-to-end jointly optimized over the full visible spectrum--a design task that existing inverse design methods or learning approaches cannot support due to memory and compute limitations. We introduce a distributed meta-optics learning method to tackle this challenge. This allows us to optimize a large parameter array along with a learned meta-atom proxy and a non-generative reconstruction method that is parallax-aware and noise-aware. The proposed camera performs favorably in simulation and in all experimental tests irrespective of the scene illumination spectrum.
Submitted 4 June, 2025;
originally announced June 2025.
-
Bridging the Artificial Intelligence Governance Gap: The United States' and China's Divergent Approaches to Governing General-Purpose Artificial Intelligence
Authors:
Oliver Guest,
Kevin Wei
Abstract:
The United States and China are among the world's top players in the development of advanced artificial intelligence (AI) systems, and both are keen to lead in global AI governance and development. A look at U.S. and Chinese policy landscapes reveals differences in how the two countries approach the governance of general-purpose artificial intelligence (GPAI) systems. Three areas of divergence are notable for policymakers: the focus of domestic AI regulation, key principles of domestic AI regulation, and approaches to implementing international AI governance. As AI development continues, global conversation around AI has warned of global safety and security challenges posed by GPAI systems. Cooperation between the United States and China might be needed to address these risks, and understanding the implications of these differences might help address the broader challenges for international cooperation between the United States and China on AI safety and security.
Submitted 3 June, 2025;
originally announced June 2025.
-
Large-Area Fabrication-aware Computational Diffractive Optics
Authors:
Kaixuan Wei,
Hector A. Jimenez-Romero,
Hadi Amata,
Jipeng Sun,
Qiang Fu,
Felix Heide,
Wolfgang Heidrich
Abstract:
Differentiable optics, as an emerging paradigm that jointly optimizes optics and (optional) image processing algorithms, has made innovative optical designs possible across a broad range of applications. Many of these systems utilize diffractive optical components (DOEs) for holography, PSF engineering, or wavefront shaping. Existing approaches have, however, mostly remained limited to laboratory prototypes, owing to a large quality gap between simulation and manufactured devices. We aim at lifting the fundamental technical barriers to the practical use of learned diffractive optical systems. To this end, we propose a fabrication-aware design pipeline for diffractive optics fabricated by direct-write grayscale lithography followed by nano-imprinting replication, which is directly suited for inexpensive mass production of large-area designs. We propose a super-resolved neural lithography model that can accurately predict the 3D geometry generated by the fabrication process. This model can be seamlessly integrated into existing differentiable optics frameworks, enabling fabrication-aware, end-to-end optimization of computational optical systems. To tackle the computational challenges, we also devise a tensor-parallel compute framework centered on distributing large-scale FFT computation across many GPUs. As such, we demonstrate large-scale diffractive optics designs up to 32.16 mm $\times$ 21.44 mm, simulated on grids of up to 128,640 by 85,760 feature points. We find adequate agreement between simulation and fabricated prototypes for applications such as holography and PSF engineering. We also achieve high image quality from an imaging system comprised only of a single DOE, with images processed only by a Wiener filter utilizing the simulation PSF. We believe our findings lift the fabrication limitations for real-world applications of diffractive optics and differentiable optical design.
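The abstract mentions reconstructing images with a Wiener filter built from the simulated PSF. As a rough illustration of that general technique (not the authors' pipeline), the sketch below applies a frequency-domain Wiener filter to a synthetically blurred image. The Gaussian PSF, the constant noise-to-signal ratio `nsr`, and the helper names are assumptions made for this example.

```python
import numpy as np

def psf_to_otf(psf, shape):
    """Zero-pad the PSF to `shape` and move its centre to index (0, 0)."""
    padded = np.zeros(shape, dtype=float)
    h, w = psf.shape
    padded[:h, :w] = psf
    padded = np.roll(padded, shift=(-(h // 2), -(w // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Wiener filter conj(H) / (|H|^2 + nsr), with an assumed constant nsr."""
    otf = psf_to_otf(psf, blurred.shape)
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + nsr)
    return np.real(np.fft.ifft2(wiener * np.fft.fft2(blurred)))

# Toy demo: blur a random image with a small Gaussian PSF (circular
# convolution), then restore it with the same PSF.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
axis = np.arange(-2, 3)
gauss = np.exp(-axis ** 2 / 2.0)
psf = np.outer(gauss, gauss)
psf /= psf.sum()
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * psf_to_otf(psf, sharp.shape)))
restored = wiener_deconvolve(blurred, psf)

mse_blurred = np.mean((blurred - sharp) ** 2)
mse_restored = np.mean((restored - sharp) ** 2)
print(mse_blurred, mse_restored)
```

A larger `nsr` suppresses noise amplification at frequencies where the PSF response is weak, at the cost of a softer reconstruction.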
Submitted 28 May, 2025;
originally announced May 2025.
-
Strong Molecule-Light Entanglement with Molecular Cavity Optomechanics
Authors:
Hong-Yun Yu,
Ya-Feng Jiao,
Jie Wang,
Feng Li,
Bin Yin,
Tian Jiang,
Qi-Rui Liu,
Hui Jing,
Ke Wei
Abstract:
We propose a molecular optomechanical platform to generate robust entanglement among bosonic modes (photons, phonons, and plasmons) under ambient conditions. The system integrates an ultrahigh-Q whispering-gallery-mode (WGM) optical resonator with a plasmonic nanocavity formed by a metallic nanoparticle and a single molecule. This hybrid architecture offers two critical advantages over standalone plasmonic systems: (i) Efficient redirection of Stokes photons from the lossy plasmonic mode into the long-lived WGM resonator, and (ii) Suppression of molecular absorption and approaching vibrational ground states via plasmon-WGM interactions. These features enable entanglement to transfer from the fragile plasmon-phonon subsystem to a photon-phonon bipartition in the blue-detuned regime, yielding robust stationary entanglement resilient to environmental noise. Remarkably, the achieved entanglement surpasses the theoretical bound for conventional two-mode squeezing in certain parameter regimes. Our scheme establishes a universal approach to safeguard entanglement in open quantum systems and opens avenues for noise-resilient quantum information technologies.
Submitted 27 May, 2025;
originally announced May 2025.
-
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Authors:
Yaoning Yu,
Ye Yu,
Kai Wei,
Haojing Luo,
Haohan Wang
Abstract:
Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning that integrates synthetic data generation into the optimization process. SIPDO couples a synthetic data generator with a prompt optimizer, where the generator produces new examples that reveal current prompt weaknesses and the optimizer incrementally refines the prompt in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks. Experiments across question answering and reasoning benchmarks show that SIPDO outperforms standard prompt tuning methods, highlighting the value of integrating data synthesis into prompt learning workflows.
Submitted 26 May, 2025;
originally announced May 2025.
-
Dynamically Polarized SERF Atomic Comagnetometer
Authors:
Xiaofei Huang,
Kai Wei,
Yang Rui,
Dinghui Gong,
Saixin Zhou,
Jie Zheng,
Wei Quan
Abstract:
Atomic spin sensors are essential for beyond-the-standard-model exploration, biomagnetic measurement, and quantum navigation. While the traditional DC-mode spin-exchange relaxation-free (SERF) comagnetometer achieves ultrahigh sensitivity, further improvements require suppressing technical noise and surpassing the standard quantum limit. In this work, we develop a K-Rb-$^{21}$Ne SERF atomic comagnetometer that dynamically polarizes the electron and nuclear spins, shielding signals from direct interference by pump light. We establish a three-phase evolutionary model for hybrid spin ensemble dynamics, yielding a complete analytical solution, and analyze the responses to various spin perturbations. Additionally, we achieve an average 38.5% suppression of the polarization noise and identify the key factors that limit sensitivity improvements. The dynamically polarized comagnetometer exhibits effective suppression of technical noise and holds the potential to overcome the quantum noise limit, while offering promising applications in exploring new physics and precise magnetic field measurements.
Submitted 23 May, 2025;
originally announced May 2025.
-
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
Authors:
Kangda Wei,
Hasnat Md Abdullah,
Ruihong Huang
Abstract:
Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects across different contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios, then elicits and compares their moral judgments. When inconsistencies arise, the model is guided to produce balanced, gender-neutral judgments. These story-judgment pairs are used to fine-tune or optimize the models via Direct Preference Optimization (DPO). Experimental results show that our method significantly reduces gender bias while preserving or even enhancing general model capabilities. We will release the code and generated data.
Submitted 22 May, 2025;
originally announced May 2025.
-
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports
Authors:
Kevin Wu,
Eric Wu,
Rahul Thapa,
Kevin Wei,
Angela Zhang,
Arvind Suresh,
Jacqueline J. Tao,
Min Woo Sun,
Alejandro Lozano,
James Zou
Abstract:
Doctors and patients alike increasingly use Large Language Models (LLMs) to diagnose clinical cases. However, unlike domains such as math or coding, where correctness can be objectively defined by the final answer, medical diagnosis requires both the outcome and the reasoning process to be accurate. Currently, widely used medical benchmarks like MedQA and MMLU assess only accuracy in the final answer, overlooking the quality and faithfulness of the clinical reasoning process. To address this limitation, we introduce MedCaseReasoning, the first open-access dataset for evaluating LLMs on their ability to align with clinician-authored diagnostic reasoning. The dataset includes 14,489 diagnostic question-and-answer cases, each paired with detailed reasoning statements derived from open-access medical case reports. We evaluate state-of-the-art reasoning LLMs on MedCaseReasoning and find significant shortcomings in their diagnoses and reasoning: for instance, the top-performing open-source model, DeepSeek-R1, achieves only 48% 10-shot diagnostic accuracy and mentions only 64% of the clinician reasoning statements (recall). However, we demonstrate that fine-tuning LLMs on the reasoning traces derived from MedCaseReasoning significantly improves diagnostic accuracy and clinical reasoning recall by an average relative gain of 29% and 41%, respectively. The open-source dataset, code, and models are available at https://github.com/kevinwu23/Stanford-MedCaseReasoning.
Submitted 20 May, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
Third-party compliance reviews for frontier AI safety frameworks
Authors:
Aidan Homewood,
Sophie Williams,
Noemi Dreksler,
John Lidiard,
Malcolm Murray,
Lennart Heim,
Marta Ziosi,
Seán Ó hÉigeartaigh,
Michael Chen,
Kevin Wei,
Christoph Winter,
Miles Brundage,
Ben Garfinkel,
Jonas Schuett
Abstract:
Safety frameworks have emerged as a best practice for managing risks from frontier artificial intelligence (AI) systems. However, it may be difficult for stakeholders to know if companies are adhering to their frameworks. This paper explores a potential solution: third-party compliance reviews. During a third-party compliance review, an independent external party assesses whether a frontier AI company is complying with its safety framework. First, we discuss the main benefits and challenges of such reviews. On the one hand, they can increase compliance with safety frameworks and provide assurance to internal and external stakeholders. On the other hand, they can create information security risks, impose additional cost burdens, and cause reputational damage, but these challenges can be partially mitigated by drawing on best practices from other industries. Next, we answer practical questions about third-party compliance reviews, namely: (1) Who could conduct the review? (2) What information sources could the reviewer consider? (3) How could compliance with the safety framework be assessed? (4) What information about the review could be disclosed externally? (5) How could the findings guide development and deployment actions? (6) When could the reviews be conducted? For each question, we evaluate a set of plausible options. Finally, we suggest "minimalist", "more ambitious", and "comprehensive" approaches for each question that a frontier AI company could adopt.
Submitted 2 May, 2025;
originally announced May 2025.
-
Search for a parity-violating long-range spin-dependent interaction
Authors:
Xing Heng,
Zitong Xu,
Xiaofei Huang,
Dinghui Gong,
Guoqing Tian,
Wei Ji,
Jiancheng Fang,
Dmitry Budker,
Kai Wei
Abstract:
High-sensitivity quantum sensors are a promising tool for experimental searches for beyond-Standard-Model interactions. Here, we demonstrate an atomic comagnetometer operating under a resonantly-coupled hybrid spin-resonance (HSR) regime to probe P-odd, T-even interactions. The HSR regime enables robust nuclear-electron spin coupling, enhancing measurement bandwidth and stability without compromising the high sensitivity of spin-exchange relaxation-free magnetometers. To minimize vibration noise from velocity-modulated sources, we implement a multistage vibration isolation system, achieving a vibration noise reduction exceeding 700-fold. We establish new constraints on vector-boson-mediated parity-violating interactions, improving experimental sensitivity by three orders of magnitude compared to previous limits. The new constraints complement existing astrophysical and laboratory studies of potential extensions to the Standard Model.
Submitted 1 May, 2025;
originally announced May 2025.
-
Scalable twin-field quantum key distribution network enabled by adaptable architecture
Authors:
Chunfeng Huang,
Rui Guan,
Xin Liu,
Wenjie He,
Shizhuo Li,
Hao Liang,
Ziyang Luo,
Zhenrong Zhang,
Wei Li,
Kejin Wei
Abstract:
Quantum key distribution (QKD) is a key application in quantum communication, enabling secure key exchange between parties using quantum states. Twin-field (TF) QKD offers a promising solution that surpasses the repeaterless limits, and its measurement-device-independent nature makes it suitable for star-type network architectures. In this work, we propose a scalable TF-QKD network with adaptable architecture, where users prepare quantum signals and send them to network nodes. These nodes use an optical switch to route the signals to multi-user measurement units, enabling secure key distribution among arbitrary users and adapting to complex connection demands of the network. A proof-of-principle demonstration with three users successfully achieved secure key sharing over simulated link losses of up to 30 dB, with an average rate of 19.57 bit/s. Additionally, simulations show that the proposed architecture can achieve a total secure key rate of $4.84 \times 10^{4}$ bit/s at 100 km in a symmetric 32-user network. This approach represents a significant advancement in the topology of untrusted-node QKD networks and holds promise for practical, large-scale applications in secure communication.
Submitted 27 May, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction
Authors:
Mengying Yuan,
Wenhao Wang,
Zixuan Wang,
Yujie Huang,
Kangli Wei,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Natural Language Inference (NLI) is a fundamental task in natural language processing. While NLI has developed many sub-directions such as sentence-level NLI, document-level NLI and cross-lingual NLI, Cross-Document Cross-Lingual NLI (CDCL-NLI) remains largely unexplored. In this paper, we propose a novel paradigm: CDCL-NLI, which extends traditional NLI capabilities to multi-document, multilingual scenarios. To support this task, we construct a high-quality CDCL-NLI dataset including 25,410 instances and spanning 26 languages. To address the limitations of previous methods on the CDCL-NLI task, we further propose an innovative method that integrates RST-enhanced graph fusion with interpretability-aware prediction. Our approach leverages RST (Rhetorical Structure Theory) within heterogeneous graph neural networks for cross-document context modeling, and employs a structure-aware semantic alignment based on lexical chains for cross-lingual understanding. For NLI interpretability, we develop an EDU (Elementary Discourse Unit)-level attribution framework that produces extractive explanations. Extensive experiments demonstrate our approach's superior performance, achieving significant improvements over both conventional NLI models and large language models. Our work sheds light on the study of NLI and will bring research interest to cross-document cross-lingual context understanding, hallucination elimination and interpretability inference. Our code and datasets are available at https://anonymous.4open.science/r/CDCL-NLI-637E/ for peer review.
Submitted 20 May, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide
Authors:
Zhijie Duan,
Kai Wei,
Zhaoqian Xue,
Jiayan Zhou,
Shu Yang,
Siyuan Ma,
Jin Jin,
Lingyao Li
Abstract:
Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database, providing important patient-centered insights into semaglutide's side effects that complement its safety profile and current knowledge base of semaglutide for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.
Submitted 7 April, 2025; v1 submitted 5 April, 2025;
originally announced April 2025.
-
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)
Authors:
Abhilekh Borah,
Hasnat Md Abdullah,
Kangda Wei,
Ruihong Huang
Abstract:
The rise of Large Language Models (LLMs) has raised questions about their ability to understand climate-related contexts. Though climate change dominates social media, analyzing its multimodal expressions is understudied, and current tools have failed to determine whether LLMs amplify credible solutions or spread unsubstantiated claims. To address this, we introduce CliME (Climate Change Multimodal Evaluation), a first-of-its-kind multimodal dataset, comprising 2579 Twitter and Reddit posts. The benchmark features a diverse collection of humorous memes and skeptical posts, capturing how these formats distill complex issues into viral narratives that shape public opinion and policy discussions. To systematically evaluate LLM performance, we present the Climate Alignment Quotient (CAQ), a novel metric comprising five distinct dimensions: Articulation, Evidence, Resonance, Transition, and Specificity. Additionally, we propose three analytical lenses: Actionability, Criticality, and Justice, to guide the assessment of LLM-generated climate discourse using CAQ. Our findings, based on the CAQ metric, indicate that while most evaluated LLMs perform relatively well in Criticality and Justice, they consistently underperform on the Actionability axis. Among the models evaluated, Claude 3.7 Sonnet achieves the highest overall performance. We publicly release our CliME dataset and code to foster further research in this domain.
Submitted 4 April, 2025;
originally announced April 2025.
-
Quantum-Secured DSP-Lite Data Transmission Architectures for AI-Driven Data Centres
Authors:
Xitao Ji,
Wenjie He,
Junda Chen,
Mingming Zhang,
Yuqi Li,
Ziwen Zhou,
Zhuoxuan Song,
Hao Wu,
Siqi Yan,
Kejin Wei,
Zhenrong Zhang,
Shuang Wang,
Ming Tang
Abstract:
Artificial intelligence-driven (AI-driven) data centres, which require high-performance, scalable, energy-efficient, and secure infrastructure, have led to unprecedented data traffic demands. These demands involve low latency, high bandwidth connections, low power consumption, and data confidentiality. However, conventional optical interconnect solutions, such as intensity-modulated direct detection and traditional coherent systems, cannot address these requirements simultaneously. In particular, conventional encryption protocols that rely on complex algorithms are increasingly vulnerable to the rapid advancement of quantum computing. Here, we propose and demonstrate a quantum-secured digital signal processing-lite (DSP-Lite) data transmission architecture that meets all the stringent requirements for AI-driven data centre optical interconnects (AI-DCIs) scenarios. By integrating a self-homodyne coherent (SHC) system and quantum key distribution (QKD) through the multicore-fibre-based space division multiplexing (SDM) technology, our scheme enables secure, high-capacity, and energy-efficient data transmission while ensuring resilience against quantum computing threats. In our demonstration, we achieved an expandable transmission capacity of 2 Tbit per second (Tb/s) and a quantum secret key rate (SKR) of 229.2 kb/s, with a quantum bit error rate (QBER) of approximately 1.27% and with ultralow power consumption. Our work paves the way for constructing secure, scalable, and cost-efficient data transmission frameworks, thus enabling the next generation of intelligent, leak-proof optical interconnects for data centres.
Submitted 12 March, 2025;
originally announced March 2025.
-
SCOPE-DTI: Semi-Inductive Dataset Construction and Framework Optimization for Practical Usability Enhancement in Deep Learning-Based Drug Target Interaction Prediction
Authors:
Yigang Chen,
Xiang Ji,
Ziyue Zhang,
Yuming Zhou,
Yang-Chi-Dung Lin,
Hsi-Yuan Huang,
Tao Zhang,
Yi Lai,
Ke Chen,
Chang Su,
Xingqiao Lin,
Zihao Zhu,
Yanggyi Zhang,
Kangping Wei,
Jiehui Fu,
Yixian Huang,
Shidong Cui,
Shih-Chung Yen,
Ariel Warshel,
Hsien-Da Huang
Abstract:
Deep learning-based drug-target interaction (DTI) prediction methods have demonstrated strong performance; however, real-world applicability remains constrained by limited data diversity and modeling complexity. To address these challenges, we propose SCOPE-DTI, a unified framework combining a large-scale, balanced semi-inductive human DTI dataset with advanced deep learning modeling. Constructed from 13 public repositories, the SCOPE dataset expands data volume by up to 100-fold compared to common benchmarks such as the Human dataset. The SCOPE model integrates three-dimensional protein and compound representations, graph neural networks, and bilinear attention mechanisms to effectively capture cross domain interaction patterns, significantly outperforming state-of-the-art methods across various DTI prediction tasks. Additionally, SCOPE-DTI provides a user-friendly interface and database. We further validate its effectiveness by experimentally identifying anticancer targets of Ginsenoside Rh1. By offering comprehensive data, advanced modeling, and accessible tools, SCOPE-DTI accelerates drug discovery research.
Submitted 12 March, 2025;
originally announced March 2025.
-
A novel layered reconstruction framework for longitudinal segmented electromagnetic calorimeter
Authors:
J. Fei,
A. Yuan,
K. Wei,
L. Sun,
J. Wang
Abstract:
In future high-energy physics experiments, the electromagnetic calorimeter (ECAL) will operate at exceptionally high luminosity. An ECAL featuring layered readout in the longitudinal direction and precise time-stamped information offers a multi-dimensional view, enriching our comprehension of the showering process of electromagnetic particles in high-luminosity environments. It has been taken as the baseline design for several new experiments, including the planned upgrades of currently running experiments. Reconstructing and matching the multi-dimensional information across different layers poses new challenges in utilizing layered data effectively. This work introduces a novel layered reconstruction framework for the ECAL with a layered readout information structure and develops the layered clustering algorithm. It expands the concept of clusters from planes to multiple layers. Additionally, this work presents the corresponding layered cluster correction methods, investigates the transverse shower profile, which is utilized for splitting overlapping clusters, and develops the layered merged $π^0$ reconstruction algorithm based on this framework. By incorporating energy and time information in three dimensions, this framework provides a suitable software platform for the preliminary research of longitudinally segmented ECALs and new perspectives in physics analysis.
Submitted 7 May, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos
Authors:
Kangda Wei,
Zhengyu Zhou,
Bingqing Wang,
Jun Araki,
Lukas Lange,
Ruihong Huang,
Zhe Feng
Abstract:
In recent years, online lecture videos have become an increasingly popular resource for acquiring new knowledge. Systems capable of effectively understanding/indexing lecture videos are thus highly desirable, enabling downstream tasks like question answering to help users efficiently locate specific information within videos. This work proposes PreMind, a novel multi-agent multimodal framework that leverages various large models for advanced understanding/indexing of presentation-style videos. PreMind first segments videos into slide-presentation segments using a Vision-Language Model (VLM) to enhance modern shot-detection techniques. Each segment is then analyzed to generate multimodal indexes through three key steps: (1) extracting slide visual content, (2) transcribing speech narratives, and (3) consolidating these visual and speech contents into an integrated understanding. Three innovative mechanisms are also proposed to improve performance: leveraging prior lecture knowledge to refine visual understanding, detecting/correcting speech transcription errors using a VLM, and utilizing a critic agent for dynamic iterative self-reflection in vision analysis. Compared to traditional video indexing methods, PreMind captures rich, reliable multimodal information, allowing users to search for details like abbreviations shown only on slides. Systematic evaluations on the public LPM dataset and an internal enterprise dataset are conducted to validate PreMind's effectiveness, supported by detailed analyses.
Submitted 28 February, 2025;
originally announced March 2025.
-
Will the Technological Singularity Come Soon? Modeling the Dynamics of Artificial Intelligence Development via Multi-Logistic Growth Process
Authors:
Guangyin Jin,
Xiaohan Ni,
Kun Wei,
Jie Zhao,
Haoming Zhang,
Leiming Jia
Abstract:
We are currently in an era of escalating technological complexity and profound societal transformations, where artificial intelligence (AI) technologies exemplified by large language models (LLMs) have reignited discussions on the 'Technological Singularity'. 'Technological Singularity' is a philosophical concept referring to an irreversible and profound transformation that occurs when AI capabilities surpass those of humans comprehensively. However, quantitative modeling and analysis of the historical evolution and future trends of AI technologies remain scarce, failing to substantiate the singularity hypothesis adequately. This paper hypothesizes that the development of AI technologies could be characterized by the superposition of multiple logistic growth processes. To explore this hypothesis, we propose a multi-logistic growth process model and validate it using two real-world datasets: AI Historical Statistics and Arxiv AI Papers. Our analysis of the AI Historical Statistics dataset assesses the effectiveness of the multi-logistic model and evaluates the current and future trends in AI technology development. Additionally, cross-validation experiments on the Arxiv AI Papers, GPU Transistor and Internet User datasets enhance the robustness of our conclusions derived from the AI Historical Statistics dataset. The experimental results reveal that around 2024 marks the fastest point of the current AI wave, and the deep learning-based AI technologies are projected to decline around 2035-2040 if no fundamental technological innovation emerges. Consequently, the technological singularity appears unlikely to arrive in the foreseeable future.
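As a rough illustration of what "superposition of multiple logistic growth processes" can mean, the sketch below sums several logistic curves. It is only a minimal sketch: the abstract does not give the paper's parameterization, so the function names, the (capacity, rate, midpoint) parameter triple, and the example values are all assumptions made for illustration.

```python
import math


def logistic(t, capacity, rate, midpoint):
    """One logistic wave: saturates at `capacity`, grows fastest at `midpoint`."""
    x = rate * (t - midpoint)
    # Numerically stable evaluation that avoids overflow in math.exp.
    if x >= 0:
        return capacity / (1.0 + math.exp(-x))
    e = math.exp(x)
    return capacity * e / (1.0 + e)


def multi_logistic(t, waves):
    """Superposition (sum) of several logistic waves.

    `waves` is a list of (capacity, rate, midpoint) tuples; this layout is a
    hypothetical choice, not the paper's notation.
    """
    return sum(logistic(t, k, r, t0) for (k, r, t0) in waves)


# Hypothetical parameters for two waves (not fitted to any dataset).
example_waves = [(1.0, 0.3, 1990.0), (2.0, 0.4, 2024.0)]
```

Each wave contributes half of its capacity at its own midpoint, and the sum approaches the total capacity once every wave has saturated.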
Submitted 10 February, 2025;
originally announced February 2025.
-
FLEKE: Federated Locate-then-Edit Knowledge Editing
Authors:
Zongkai Zhao,
Guozeng Xu,
Xiuhua Li,
Kaiwen Wei,
Jiang Zhong
Abstract:
Locate-then-Edit Knowledge Editing (LEKE) is a key technique for updating large language models (LLMs) without full retraining. However, existing methods assume a single-user setting and become inefficient in real-world multi-client scenarios, where decentralized organizations (e.g., hospitals, financial institutions) independently update overlapping knowledge, leading to redundant mediator knowledge vector (MKV) computations and privacy concerns. To address these challenges, we introduce Federated Locate-then-Edit Knowledge Editing (FLEKE), a novel task that enables multiple clients to collaboratively perform LEKE while preserving privacy and reducing computational overhead. To achieve this, we propose FedEdit, a two-stage framework that optimizes MKV selection and reuse. In the first stage, clients locally apply LEKE and upload the computed MKVs. In the second stage, rather than relying solely on server-based MKV sharing, FLEKE allows clients to retrieve relevant MKVs based on cosine similarity, enabling knowledge re-edit and minimizing redundant computations. Experimental results on two benchmark datasets demonstrate that FedEdit retains over 96% of the performance of non-federated LEKE while significantly outperforming a FedAvg-based baseline by approximately twofold. Besides, we find that MEMIT performs more consistently than PMET in the FLEKE task with our FedEdit framework. Our code is available at https://github.com/zongkaiz/FLEKE.
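The cosine-similarity retrieval step mentioned in the abstract can be pictured with a small sketch. This is not the FedEdit implementation: the store layout, the `retrieve_vector` helper, and the 0.9 threshold are hypothetical, chosen only to show how a client might pick the most similar stored vector and fall back to recomputation when nothing is close enough.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_vector(query, store, threshold=0.9):
    """Return (key, score) of the most similar stored vector, or None.

    `store` maps a hypothetical edit key to a previously uploaded vector.
    The threshold value is an assumption for illustration only.
    """
    best = None
    for key, vector in store.items():
        score = cosine_similarity(query, vector)
        if score >= threshold and (best is None or score > best[1]):
            best = (key, score)
    return best


# Toy store with two orthogonal vectors (illustrative data only).
toy_store = {"edit_a": [1.0, 0.0], "edit_b": [0.0, 1.0]}
```

A return value of None signals that no stored vector is similar enough, in which case the client would compute a fresh vector locally.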
Submitted 21 February, 2025;
originally announced February 2025.
-
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework
Authors:
Yuming Yang,
Jiang Zhong,
Li Jin,
Jingwang Huang,
Jingpeng Gao,
Qing Liu,
Yang Bai,
Jingyuan Zhang,
Rui Jiang,
Kaiwen Wei
Abstract:
Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three critical limitations in current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art MLLMs achieve only 58.19% Correctness and 73.87% Coverage scores, and (3) MLLMs demonstrate consistent text-over-visual modality bias during Chart-based MRAG reasoning. The CHARGE and Chart-MRAG Bench are released at https://github.com/Nomothings/CHARGE.git.
Submitted 20 February, 2025;
originally announced February 2025.
-
Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition
Authors:
Jingwang Huang,
Jiang Zhong,
Qin Lei,
Jinpeng Gao,
Yuming Yang,
Sirui Wang,
Peiguang Li,
Kaiwen Wei
Abstract:
Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of aleatoric uncertainty, which is the inherent noise in the multimodal data and hinders the effectiveness of modality fusion by introducing ambiguity into feature representations. To address this issue and effectively model aleatoric uncertainty, this paper proposes the Latent emotional Distribution Decomposition with Uncertainty perception (LDDU) framework from a novel perspective of latent emotional space probabilistic modeling. Specifically, we introduce a contrastive disentangled distribution mechanism within the emotion space to model the multimodal data, allowing for the extraction of semantic features and uncertainty. Furthermore, we design an uncertainty-aware multimodal fusion method that accounts for the dispersed distribution of uncertainty and integrates distribution information. Experimental results show that LDDU achieves state-of-the-art performance on the CMU-MOSEI and M$^3$ED datasets, highlighting the importance of uncertainty modeling in MMER. Code is available at https://github.com/201983290498/lddu_mmer.git.
Submitted 19 February, 2025;
originally announced February 2025.
-
LegalCore: A Dataset for Event Coreference Resolution in Legal Documents
Authors:
Kangda Wei,
Xi Shi,
Jonathan Tong,
Sai Ramana Reddy,
Anandhavelu Natarajan,
Rajiv Jain,
Aparna Garimella,
Ruihong Huang
Abstract:
Recognizing events and their coreferential mentions in a document is essential for understanding semantic meanings of text. The existing research on event coreference resolution is mostly limited to news articles. In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference information. The legal contract documents we annotated in this dataset are several times longer than news articles, with an average length of around 25k tokens per document. The annotations show that legal documents have dense event mentions and feature both short-distance and super long-distance coreference links between event mentions. We further benchmark mainstream Large Language Models (LLMs) on this dataset for both event detection and event coreference resolution tasks, and find that this dataset poses significant challenges for state-of-the-art open-source and proprietary LLMs, which perform significantly worse than a supervised baseline. We will publish the dataset as well as the code.
Submitted 20 March, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Toward Equitable Access: Leveraging Crowdsourced Reviews to Investigate Public Perceptions of Health Resource Accessibility
Authors:
Zhaoqian Xue,
Guanhong Liu,
Kai Wei,
Chong Zhang,
Qingcheng Zeng,
Songhua Hu,
Wenyue Hua,
Lizhou Fan,
Yongfeng Zhang,
Lingyao Li
Abstract:
Access to health resources is a critical determinant of public well-being and societal resilience, particularly during public health crises when demand for medical services and preventive care surges. However, disparities in accessibility persist across demographic and geographic groups, raising concerns about equity. Traditional survey methods often fall short due to limitations in coverage, cost, and timeliness. This study leverages crowdsourced data from Google Maps reviews, applying advanced natural language processing techniques, specifically ModernBERT, to extract insights on public perceptions of health resource accessibility in the United States during the COVID-19 pandemic. Additionally, we employ Partial Least Squares regression to examine the relationship between accessibility perceptions and key socioeconomic and demographic factors including political affiliation, racial composition, and educational attainment. Our findings reveal that public perceptions of health resource accessibility varied significantly across the U.S., with disparities peaking during the pandemic and slightly easing post-crisis. Political affiliation, racial demographics, and education levels emerged as key factors shaping these perceptions. These findings underscore the need for targeted interventions and policy measures to address inequities, fostering a more inclusive healthcare infrastructure that can better withstand future public health challenges.
Submitted 14 February, 2025;
originally announced February 2025.
-
Molecular optomechanically-induced transparency
Authors:
Bin Yin,
Jie Wang,
Mei-Yu Peng,
Qian Zhang,
Deng Wang,
Tian-Xiang Lu,
Ke Wei,
Hui Jing
Abstract:
Molecular cavity optomechanics (COM), characterized by remarkably efficient optomechanical coupling enabled by a highly localized light field and ultra-small effective mode volume, holds significant promise for advancing applications in quantum science and technology. Here, we study optomechanically induced transparency and the associated group delay in a hybrid molecular COM system. We find that even with an extremely low optical quality factor, an obvious transparency window can appear, which is otherwise unattainable in a conventional COM system. Furthermore, by varying the ports of the probe light, the optomechanically induced transparency or absorption can be achieved, along with corresponding slowing or advancing of optical signals. These results indicate that our scheme provides a new method for adjusting the storage and retrieval of optical signals in such a molecular COM device.
Submitted 7 February, 2025;
originally announced February 2025.
-
Improved Quantum Computation using Operator Backpropagation
Authors:
Bryce Fuller,
Minh C. Tran,
Danylo Lykov,
Caleb Johnson,
Max Rossmannek,
Ken Xuan Wei,
Andre He,
Youngseok Kim,
DinhDuy Vu,
Kunal Sharma,
Yuri Alexeev,
Abhinav Kandala,
Antonio Mezzacapo
Abstract:
Decoherence of quantum hardware is currently limiting its practical applications. At the same time, classical algorithms for simulating quantum circuits have progressed substantially. Here, we demonstrate a hybrid framework that integrates classical simulations with quantum hardware to improve the computation of an observable's expectation value by reducing the quantum circuit depth. In this framework, a quantum circuit is partitioned into two subcircuits: one that describes the backpropagated Heisenberg evolution of an observable, executed on a classical computer, while the other is a Schrödinger evolution run on quantum processors. The overall effect is to reduce the depths of the circuits executed on quantum devices, trading this with classical overhead and an increased number of circuit executions. We demonstrate the effectiveness of this method on a Hamiltonian simulation problem, achieving more accurate expectation value estimates compared to using quantum hardware alone.
Submitted 3 February, 2025;
originally announced February 2025.
-
The AI Agent Index
Authors:
Stephen Casper,
Luke Bailey,
Rosco Hunter,
Carson Ezell,
Emma Cabalé,
Michael Gerovitch,
Stewart Slocum,
Kevin Wei,
Nikola Jurkovic,
Ariba Khan,
Phillip J. K. Christoffersen,
A. Pinar Ozisik,
Rakshit Trivedi,
Dylan Hadfield-Menell,
Noam Kolt
Abstract:
Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic AI systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning implementation, tool use), application domains (e.g., computer use, software engineering), and risk management practices (e.g., evaluation results, guardrails), based on publicly available information and correspondence with developers. We find that while developers generally provide ample information regarding the capabilities and applications of agentic systems, they currently provide limited information regarding safety and risk management practices. The AI Agent Index is available online at https://aiagentindex.mit.edu/
Submitted 3 February, 2025;
originally announced February 2025.
-
SimulataR: Rapid Assisted Reality Prototyping using Design-Blended Videos
Authors:
Ashwin Ram,
Yue Gu,
Bowen Wang,
Sneha Jaikumar,
Youqi Wu,
Benjamin Tan Kuan Wei,
Qingyang Xu,
Haiming Liu,
Shengdong Zhao
Abstract:
Assisted Reality (aR) is a subfield of Augmented Reality (AR) that overlays information onto a user's immediate view via see-through head-mounted displays (OST-HMDs). This technology has proven to be effective and energy-efficient in supporting the interaction between users and information in everyday wearable intelligent systems. The aR viewing experience, however, is affected by varying real-world backgrounds, lighting, and user movements, which makes designing for aR challenging. Designers have to test their designs in-situ across multiple real-world settings, which can be time-consuming and labor-intensive. We propose SimulataR, a cost-effective desktop-based approach for rapid aR prototyping using first-person-view context videos blended with design prototypes to simulate an aR experience. A field study involving 12 AR users comparing SimulataR to real OST-HMDs found that SimulataR can approximate the aR experience, particularly indoors and in low-to-moderately lit outdoor environments. Case studies with two designers who used SimulataR in their design process demonstrate the potential of design-blended videos for rapid aR prototyping.
Submitted 9 February, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition
Authors:
Qijie Shao,
Linhao Dong,
Kun Wei,
Sining Sun,
Lei Xie
Abstract:
Data2vec is a self-supervised learning (SSL) approach that employs a teacher-student architecture for contextual representation learning via masked prediction, demonstrating remarkable performance in monolingual ASR. Previous studies have revealed that data2vec's shallow layers capture speaker and language information, middle layers encode phoneme and word features, while deep layers are responsible for reconstruction. Language and phoneme features are crucial for multilingual ASR. However, data2vec's masked representation generation relies on multi-layer averaging, inevitably coupling these features. To address this limitation, we propose a decoupling quantization based data2vec (DQ-Data2vec) for multilingual ASR, which includes a data2vec backbone and two improved online K-means quantizers. Our core idea is using the K-means quantizer with specified cluster numbers to decouple language and phoneme information for masked prediction. Specifically, in the language quantization, considering that the number of languages is significantly different from other irrelevant features (e.g., speakers), we assign the cluster number to match the number of languages, explicitly decoupling shallow layers' language-related information from irrelevant features. This strategy is also applied to decoupling middle layers' phoneme and word features. In a self-supervised scenario, experiments on the CommonVoice dataset demonstrate that DQ-Data2vec achieves a relative reduction of 9.51% in phoneme error rate (PER) and 11.58% in word error rate (WER) compared to data2vec and UniData2vec. Moreover, in a weakly-supervised scenario incorporating language labels and high-resource language text labels, the relative reduction is 18.09% and 1.55%, respectively.
Submitted 23 January, 2025;
originally announced January 2025.
-
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Authors:
Xuelong Geng,
Kun Wei,
Qijie Shao,
Shuiyun Liu,
Zhennan Lin,
Zhixian Zhao,
Guojian Li,
Wenjie Tian,
Peikun Chen,
Yangze Li,
Pengcheng Guo,
Mingchen Shao,
Shuiyuan Wang,
Yuang Cao,
Chengyou Wang,
Tianyi Xu,
Yuhang Dai,
Xinfa Zhu,
Yue Li,
Li Zhang,
Lei Xie
Abstract:
Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions. However, most advanced SULMs are developed by industry, leveraging large-scale datasets and computational resources that are not readily available to the academic community. Moreover, the lack of transparency in training details creates additional barriers to further innovation. In this study, we present OSUM, an Open Speech Understanding Model designed to explore the potential of training SULMs under constrained academic resources. The OSUM model combines a Whisper encoder with a Qwen2 LLM and supports a wide range of speech tasks, including speech recognition (ASR), speech recognition with timestamps (SRWT), vocal event detection (VED), speech emotion recognition (SER), speaking style recognition (SSR), speaker gender classification (SGC), speaker age prediction (SAP), and speech-to-text chat (STTC). By employing an ASR+X training strategy, OSUM achieves efficient and stable multi-task training by simultaneously optimizing ASR alongside target tasks. Beyond delivering strong performance, OSUM emphasizes transparency by providing openly available data preparation and training methodologies, offering valuable insights and practical guidance for the academic community. By doing so, we aim to accelerate research and innovation in advanced SULM technologies.
Submitted 16 February, 2025; v1 submitted 22 January, 2025;
originally announced January 2025.
-
Infrastructure for AI Agents
Authors:
Alan Chan,
Kevin Wei,
Sihao Huang,
Nitarshan Rajkumar,
Elija Perrier,
Seth Lazar,
Gillian K. Hadfield,
Markus Anderljung
Abstract:
AI agents plan and execute interactions in open-ended environments. For example, OpenAI's Operator can use a web browser to do product comparisons and buy online goods. To facilitate beneficial interactions and mitigate harmful ones, much research focuses on directly modifying agent behaviour. For example, developers can train agents to follow user instructions. This focus on direct modifications is useful, but insufficient. We will also need external protocols and systems that shape how agents interact with institutions and other actors. For instance, agents will need more efficient protocols to communicate with each other and form agreements. In addition, attributing an agent's actions to a particular human or other legal entity can help to establish trust, and also disincentivize misuse. Given this motivation, we propose the concept of agent infrastructure: technical systems and shared protocols external to agents that are designed to mediate and influence their interactions with and impacts on their environments. Just as the Internet relies on protocols like HTTPS, our work argues that agent infrastructure will be similarly indispensable to ecosystems of agents. We identify three functions for agent infrastructure: 1) attributing actions, properties, and other information to specific agents, their users, or other actors; 2) shaping agents' interactions; and 3) detecting and remedying harmful actions from agents. We provide an incomplete catalog of research directions for such functions. For each direction, we include analysis of use cases, infrastructure adoption, relationships to existing (internet) infrastructure, limitations, and open questions. Making progress on agent infrastructure can prepare society for the adoption of more advanced agents.
Submitted 16 May, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
Local US officials' views on the impacts and governance of AI: Evidence from 2022 and 2023 survey waves
Authors:
Sophia Hatz,
Noemi Dreksler,
Kevin Wei,
Baobao Zhang
Abstract:
This paper presents a survey of local US policymakers' views on the future impact and regulation of AI. Our survey provides insight into US policymakers' expectations regarding the effects of AI on local communities and the nation, as well as their attitudes towards specific regulatory policies. Conducted in two waves (2022 and 2023), the survey captures changes in attitudes following the release of ChatGPT and the subsequent surge in public awareness of AI. Local policymakers express a mix of concern, optimism, and uncertainty about AI's impacts, anticipating significant societal risks such as increased surveillance, misinformation, and political polarization, alongside potential benefits in innovation and infrastructure. Many also report feeling underprepared and inadequately informed to make AI-related decisions. On regulation, a majority of policymakers support government oversight and favor specific policies addressing issues such as data privacy, AI-related unemployment, and AI safety and fairness. Democrats show stronger and more consistent support for regulation than Republicans, but the latter experienced a notable shift towards majority support between 2022 and 2023. Our study contributes to understanding the perspectives of local policymakers, who are key players in shaping state and federal AI legislation, by capturing evolving attitudes, partisan dynamics, and their implications for policy formation. The findings highlight the need for capacity-building initiatives and bipartisan coordination to mitigate policy fragmentation and build a cohesive framework for AI governance in the US.
Submitted 16 January, 2025;
originally announced January 2025.
-
New Constraints on Axion Mediated Dipole-Dipole Interactions
Authors:
Zitong Xu,
Xing Heng,
Guoqing Tian,
Di Gong,
Lei Cong,
Wei Ji,
Dmitry Budker,
Kai Wei
Abstract:
The search for axions sits at the intersection of solving critical problems in fundamental physics, including the strong CP problem in QCD, uncovering the nature of dark matter, and understanding the origin of the universe's matter-antimatter asymmetry. The measurement of axion-mediated spin-dependent interactions offers a powerful approach for axion detection. However, it has long been restricted to regions outside the 'axion window' due to a significant trade-off: the need to effectively suppress the magnetic leakage from highly polarized spin sources while simultaneously detecting sub-femtotesla level exotic physics signals at sub-decimeter-scale distances. In this work, we report new experimental results on axion-mediated exotic spin-spin interactions using an iron-shielded SmCo$_5$ spin source in combination with a specially designed self-compensation comagnetometer. Employing a composite shielding structure, we achieved a suppression of the magnetic field by a factor of up to $10^{11}$. This enabled us to establish new constraints on the coupling between electrons and neutrons, surpassing previous experimental limits by more than 10000 times within the axion window. Furthermore, we also set the strongest constraints on the coupling between electrons and protons. The proposed method holds substantial potential not only for advancing the search for new physics beyond the Standard Model but also for enabling transformative applications in biological and chemical research.
Submitted 14 January, 2025;
originally announced January 2025.
-
MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation
Authors:
Xiaoxian Yang,
Qi Wang,
Kaiqi Zhang,
Ke Wei,
Jun Lyu,
Lingchao Chen
Abstract:
Ultrasound imaging frequently encounters challenges, such as those related to elevated noise levels, diminished spatiotemporal resolution, and the complexity of anatomical structures. These factors significantly hinder the model's ability to accurately capture and analyze structural relationships and dynamic patterns across various regions of the heart. Mamba, an emerging model, is one of the most cutting-edge approaches that is widely applied to diverse vision and language tasks. To this end, this paper introduces a U-shaped deep learning model incorporating a large-window Mamba scale (LMS) module and a hierarchical feature fusion approach for echocardiographic segmentation. First, a cascaded residual block serves as an encoder and is employed to incrementally extract multiscale detailed features. Second, a large-window multiscale mamba module is integrated into the decoder to capture global dependencies across regions and enhance the segmentation capability for complex anatomical structures. Furthermore, our model introduces auxiliary losses at each decoder layer and employs a dual attention mechanism to fuse multilayer features both spatially and across channels. This approach enhances segmentation performance and accuracy in delineating complex anatomical structures. Finally, the experimental results using the EchoNet-Dynamic and CAMUS datasets demonstrate that the model outperforms other methods in terms of both accuracy and robustness. For the segmentation of the left ventricular endocardium (${LV}_{endo}$), the model achieved optimal values of 95.01 and 93.36, respectively, while for the left ventricular epicardium (${LV}_{epi}$), values of 87.35 and 87.80, respectively, were achieved. This represents an improvement ranging between 0.54 and 1.11 compared with the best-performing model.
Submitted 13 January, 2025;
originally announced January 2025.
-
Experimental secure entanglement-free quantum remote sensing over 50 km of optical fiber
Authors:
Wenjie He,
Chunfeng Huang,
Rui Guan,
Ye Chen,
Zhenrong Zhang,
Kejin Wei
Abstract:
Secure quantum remote sensing (SQRS) uses quantum states to gather information about distant objects or environments while ensuring secure data transmission against eavesdropping. It has potential applications in various fields, including environmental monitoring, military surveillance, and disaster response, where both data accuracy and transmission security are critical. Recent experiments have demonstrated the feasibility of SQRS using entangled states. Here, we experimentally demonstrate an SQRS that can estimate a phase without requiring entanglement, offering the practical advantage that single-qubit states are easier to prepare. We successfully estimate the preset phase information at a remote site over a fiber distance of 50 km, which serves as a key step toward long-distance applications.
Submitted 25 December, 2024;
originally announced December 2024.
-
Probing the soft rescattering parameters in $B$ decays involving a scalar meson with QCD factorization
Authors:
Jing-Juan Qi,
Zhen-Yang Wang,
Zhen-Hua Zhang,
Ke-Wei Wei,
Xin-Heng Guo
Abstract:
In this work, the soft rescattering parameters in the $B^\pm\rightarrow π^\pmπ^+π^-$ and $B^\pm\rightarrow K^\pmπ^+π^-$ decays with the light scalar meson $f_0(500)$ as the intermediate resonance are studied within QCD factorization. Considering the interference effect between $ρ(770)^0$ and $f_0(500)$, we utilize the experimentally more direct event yields for fitting and obtain the soft rescattering parameters $|ρ_k^{SP}|=3.29\pm1.01$ and $|ρ_k^{PS}|=2.33\pm0.73$ in $B\rightarrow PS$ and $B\rightarrow SP$ decays, respectively, where $P$ and $S$ denote pseudoscalar and scalar mesons. We also study the branching ratios and $CP$ asymmetries in the decay modes involving other scalar mesons, including $f_0(980)$, $a_0(980)$, $a_0(1450)$ and $K_0^*(1430)$, to test the rationality of the values of $|ρ_k^{SP}|$ and $|ρ_k^{PS}|$. Meanwhile, the wealth of experimental data facilitates the examination of the forward-backward asymmetry induced $CP$ asymmetries (FB-CPAs), and the localized $CP$ asymmetries (LACPs). We investigate these asymmetries resulting from the interference between $ρ(770)^0$ and $f_0(500)$ for $B^\pm\rightarrow π^\pmπ^+π^-$ and $B^\pm\rightarrow K^\pmπ^+π^-$ decays when the invariant mass of $π^+π^-$ lies in the low-energy region $0.445\,\mathrm{GeV}<m_{ππ}<0.795\,\mathrm{GeV}$. Our theoretical results of FB-CPAs and LACPs align with the experimental findings. We propose that the interference between $ρ(770)^0$ and $f_0(500)$ can be extended to other beauty and charmed meson decays.
Submitted 19 December, 2024;
originally announced December 2024.
-
Dimensionality reduction for closed-loop quantum gate calibration
Authors:
Emma Berger,
Vivek Maurya,
Z. M. McIntyre,
Ken Xuan Wei,
Holger Haas,
Daniel Puzzuoli
Abstract:
Numerical gate design typically makes use of high-dimensional parameterizations enabling sophisticated, highly expressive control pulses. Developing efficient experimental calibration methods for such gates is a long-standing challenge in quantum control, as on-device calibration requires the optimization of noisy experimental data over high-dimensional parameter spaces. To improve the efficiency of calibrations, we present a systematic method for reducing the dimensionality of the parameter space traversed in gate calibration, starting from an arbitrary high-dimensional pulse representation. We use this approach to design and calibrate an $X_{π/2}$ gate robust against amplitude and detuning errors, as well as an $X_{π/2}$ gate robust against coherent errors due to a spectator qubit.
Submitted 6 December, 2024;
originally announced December 2024.
-
Dual-species Optical tweezer for Rb and K atoms
Authors:
Yangbo Wei,
Kedi Wei,
Shangjin Li,
Bo Yan
Abstract:
The optical tweezer experiment with neutral atoms is a focal topic in cold atom physics due to its significant potential in quantum computing and simulation. Here, we present the realization of a dual-species optical tweezer for both Rb and K atoms, marking the first step towards creating a polar molecule optical tweezer array. Initially, Rb and K atoms are collected using a dual magneto-optical trap (MOT) and further cooled to 7 $μ$K for Rb and 10 $μ$K for K. By employing 850 nm tweezer beams, we demonstrate the ability to capture individual Rb or K atoms. The filling ratios of Rb and K can be finely adjusted by controlling the atomic densities of both species. Utilizing the post-selection technique, we can create a deterministic array of two-species atoms, paving the way for future polar molecule array formation.
Submitted 28 October, 2024;
originally announced October 2024.
-
How Do AI Companies "Fine-Tune" Policy? Examining Regulatory Capture in AI Governance
Authors:
Kevin Wei,
Carson Ezell,
Nick Gabrieli,
Chinmay Deshpande
Abstract:
Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation is an important part of the policy process, it can also cause regulatory capture, whereby industry co-opts regulatory regimes to prioritize private over public welfare. Capture of AI policy by AI developers and deployers could hinder such regulatory goals as ensuring the safety, fairness, beneficence, transparency, or innovation of general-purpose AI systems. In this paper, we first introduce different models of regulatory capture from the social science literature. We then present results from interviews with 17 AI policy experts on what policy outcomes could compose regulatory capture in US AI policy, which AI industry actors are influencing the policy process, and whether and how AI industry actors attempt to achieve outcomes of regulatory capture. Experts were primarily concerned with capture leading to a lack of AI regulation, weak regulation, or regulation that over-emphasizes certain policy goals over others. Experts most commonly identified agenda-setting (15 of 17 interviews), advocacy (13), academic capture (10), information management (9), cultural capture through status (7), and media capture (7) as channels for industry influence. To mitigate these particular forms of industry influence, we recommend systemic changes in developing technical expertise in government and civil society, independent funding streams for the AI ecosystem, increased transparency and ethics requirements, greater civil society access to policy, and various procedural safeguards.
Submitted 16 October, 2024;
originally announced October 2024.
-
Constraining the Fifth Force Using the Earth as a Spin and Mass Source from the Chinese Space Station
Authors:
Zheng-Ting Lai,
Jun-Xu Lu,
Li-Sheng Geng,
Kai Wei,
Wei Ji
Abstract:
We explore the potential of conducting an experiment on the Chinese Space Station (CSS) to constrain beyond-the-standard-model (BSM) long-range spin- and velocity-dependent interactions, which are mediated by the exchange of an ultralight $\left(m_{z^{\prime}}<10^{-10}\text{eV}\right)$ or massless intermediate vector boson. We demonstrate that the proposed experiment on the CSS offers several advantages compared to ground-based experiments. The high speed can enhance the sensitivity to velocity-dependent interactions. The periodicity allows efficient extraction of signals from background noises, thereby strengthening the experiment's accuracy. Combining these advantages, one can improve the existing bounds on such interactions by up to five orders of magnitude. With advancements in sensor technology, we anticipate a further enhancement of four orders of magnitude, resulting in a total potential improvement of up to nine orders of magnitude.
Submitted 16 October, 2024;
originally announced October 2024.
-
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark
Authors:
Hasnat Md Abdullah,
Tian Liu,
Kangda Wei,
Shu Kong,
Ruihong Huang
Abstract:
Localizing unusual activities, such as human errors or surveillance incidents, in videos holds practical significance. However, current video understanding models struggle with localizing these unusual events likely because of their insufficient representation in models' pretraining datasets. To explore foundation models' capability in localizing unusual activity, we introduce UAL-Bench, a comprehensive benchmark for unusual activity localization, featuring three video datasets: UAG-OOPS, UAG-SSBD, UAG-FunQA, and an instruction-tune dataset: OOPS-UAG-Instruct, to improve model capabilities. UAL-Bench evaluates three approaches: Video-Language Models (Vid-LLMs), instruction-tuned Vid-LLMs, and a novel integration of Vision-Language Models and Large Language Models (VLM-LLM). Our results show the VLM-LLM approach excels in localizing short-span unusual events and predicting their onset (start time) more accurately than Vid-LLMs. We also propose a new metric, R@1, TD <= p, to address limitations in existing evaluation methods. Our findings highlight the challenges posed by long-duration videos, particularly in autism diagnosis scenarios, and the need for further advancements in localization techniques. Our work not only provides a benchmark for unusual activity localization but also outlines the key challenges for existing foundation models, suggesting future research directions on this important task.
Submitted 1 October, 2024;
originally announced October 2024.
-
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Authors:
Bingshen Mu,
Kun Wei,
Qijie Shao,
Yong Xu,
Lei Xie
Abstract:
Recent advancements in integrating Large Language Models (LLMs) with automatic speech recognition (ASR) have performed remarkably in general domains. While supervised fine-tuning (SFT) of all model parameters is often employed to adapt pre-trained LLM-based ASR models to specific domains, it imposes high computational costs and notably reduces their performance in general domains. In this paper, we propose \textit{HDMoLE}, a novel parameter-efficient multi-domain fine-tuning method for adapting pre-trained LLM-based ASR models to multi-accent domains without catastrophic forgetting. HDMoLE leverages hierarchical routing and dynamic thresholds, combines low-rank adaptation (LoRA) with the mixture of experts (MoE), and can be generalized to any linear layer. Hierarchical routing establishes a clear correspondence between LoRA experts and accent domains, improving cross-domain collaboration among the LoRA experts. Unlike the static Top-K strategy for activating LoRA experts, dynamic thresholds can adaptively activate varying numbers of LoRA experts at each MoE layer. Experiments on the multi-accent and standard Mandarin datasets demonstrate the efficacy of HDMoLE. Applying HDMoLE to the projector module of an LLM-based ASR model achieves similar performance to full fine-tuning in the target multi-accent domains while using only 9.6% of the trainable parameters required for full fine-tuning and incurring minimal degradation in the source general domain.
Submitted 3 January, 2025; v1 submitted 29 September, 2024;
originally announced September 2024.
-
Dynamically Optimized Nonadiabatic Holonomic Quantum Computation
Authors:
Hai Xu,
Wanchun Li,
Tao Chen,
Kejin Wei,
Chengxian Zhang
Abstract:
Nonadiabatic holonomic quantum computation (NHQC) is one of the promising approaches to realizing fault-tolerant quantum computation. However, due to imperfect control in experimental environments, the holonomic gate still needs to be further improved. Here, we propose a dynamically optimized NHQC (OPNHQC) scheme based on the dynamically corrected gate technique. The scheme is implemented by carefully designing a sequence of elementary pulses to fulfill cyclic evolution, while the dynamical phase is not accumulated. In this way, the constructed holonomic gate is immune to the error. It is found that our scheme can correct the $X$ error up to fourth order. In addition, combined with the DFS encoding, our scheme can be immune to both the $X$ and $Z$ errors. Therefore, our proposed scheme offers a prospective way to realize scalable fault-tolerant holonomic quantum computation.
Submitted 23 September, 2024;
originally announced September 2024.
-
DNN-based Enhanced DOA Sensing via Massive MIMO Receiver with Switches-based Hybrid Architecture
Authors:
Yifan Li,
Kang Wei,
Linqiong Jia,
Jun Zou,
Feng Shu,
Yaoliang Song,
Jiangzhou Wang
Abstract:
Switches-based hybrid architecture has attracted much attention, especially in direction-of-arrival (DOA) sensing, due to its ability to significantly reduce the hardware cost by compressing massive multiple-input multiple-output (MIMO) arrays with switching networks. However, this structure will lead to a degradation in the degrees of freedom (DOF) and accuracy of DOA estimation. To address these two issues, we first propose a switches-based sparse hybrid array (SW-SHA). In this method, we design a dynamic switching network to form a synthesized sparse array, i.e., SW-SHA, that can enlarge the virtual aperture obtained by the difference co-array, thereby significantly enhancing the DOF. Second, in order to improve the DOA estimation accuracy of switches-based hybrid arrays, a deep neural network (DNN)-based method called ASN-DNN is proposed. It includes an antenna selection network (ASN) for optimizing the switch connections based on the criterion of minimizing the Cramer-Rao lower bound (CRLB) under the peak sidelobe level (PSL) constraint and a DNN for DOA estimation. Then by integrating ASN and DNN into an iterative process, the ASN-DNN is obtained. Furthermore, the closed-form expression of CRLB for DOA estimation is derived to evaluate the performance lower bound of switches-based hybrid arrays and provide a benchmark for ASN-DNN. The simulation results show that the proposed ASN-DNN can achieve better performance than traditional methods, especially in the low signal-to-noise ratio (SNR) regions.
Submitted 13 January, 2025; v1 submitted 21 September, 2024;
originally announced September 2024.
-
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Authors:
Hongfei Xue,
Wei Ren,
Xuelong Geng,
Kun Wei,
Longhao Li,
Qijie Shao,
Linju Yang,
Kai Diao,
Lei Xie
Abstract:
Integrating audio encoders with LLMs through connectors has enabled these models to process and comprehend audio modalities, significantly enhancing speech-to-text tasks, including automatic speech recognition (ASR) and automatic speech translation (AST). However, these methods often overlook the critical aspect of language adaptation in multilingual settings, relying instead on multilingual data without adequately addressing language differences. To address this gap, we propose the Ideal-LLM model, which employs dual multilingual encoders to enrich language feature information and utilizes a language-adapted connector to target the adaptation of each language specifically. By leveraging the complementary strengths of Whisper and MMS encoders, our approach ensures richer multilingual representations. Additionally, the language-adapted connector enhances modal transformation via a language weight selector tailored for each language. Experimental results demonstrate that Ideal-LLM significantly improves ASR performance, achieving a 32.6% relative reduction in average word error rates compared to the standard speech encoder integrated with LLMs, and yields an average BLEU score of 36.78 for the AST task.
Submitted 17 September, 2024;
originally announced September 2024.
-
Towards Single-Lens Controllable Depth-of-Field Imaging via Depth-Aware Point Spread Functions
Authors:
Xiaolong Qian,
Qi Jiang,
Yao Gao,
Shaohua Gao,
Zhonghua Yi,
Lei Sun,
Kai Wei,
Haifeng Li,
Kailun Yang,
Kaiwei Wang,
Jian Bai
Abstract:
Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, given the increasing demand for mobile scenarios, it is desirable to achieve a lightweight solution with Minimalist Optical Systems (MOS). This work centers on two major limitations of MOS, i.e., severe optical aberrations and uncontrollable DoF, to achieve single-lens controllable DoF imaging via computational methods. A Depth-aware Controllable DoF Imaging (DCDI) framework is proposed, equipped with All-in-Focus (AiF) aberration correction and monocular depth estimation, where the recovered image and corresponding depth map are used to produce imaging results under the diverse DoFs of any high-end lens via patch-wise convolution. To address the depth-varying optical degradation, we introduce a Depth-aware Degradation-adaptive Training (DA2T) scheme. At the dataset level, a Depth-aware Aberration MOS (DAMOS) dataset is established based on the simulation of Point Spread Functions (PSFs) under different object distances. Additionally, we design two plug-and-play depth-aware mechanisms that embed depth information into the aberration image recovery to better tackle depth-aware degradation. Furthermore, we propose a storage-efficient Omni-Lens-Field model to represent the 4D PSF library of various lenses. With the predicted depth map, the recovered image, and the depth-aware PSF map inferred by Omni-Lens-Field, single-lens controllable DoF imaging is achieved. Comprehensive experimental results demonstrate that the proposed framework enhances the recovery performance and attains impressive single-lens controllable DoF imaging results, providing a seminal baseline for this field. The source code and the established dataset will be publicly available at https://github.com/XiaolongQian/DCDI.
Submitted 11 February, 2025; v1 submitted 15 September, 2024;
originally announced September 2024.
-
Geometric two-qubit gates in silicon-based double quantum dots
Authors:
Yong-Yang Lu,
Kejin Wei,
Chengxian Zhang
Abstract:
Achieving high-fidelity two-qubit gates is crucial for spin qubits in silicon double quantum dots. However, two-qubit gates in experiments are highly susceptible to charge noise, which remains a key challenge. Geometric gates, which implement gate operations using a pure geometric phase, are believed to be a powerful way to realize robust control. In this work, we theoretically propose a feasible strategy to implement geometric two-qubit gates for silicon-based spin qubits under experimental control conditions. By working in the regime where the local magnetic field gradient is much larger than the exchange interaction, we are able to implement entangling and non-entangling geometric gates via analytical and numerical methods. We find that the implemented geometric gates can reach fidelities surpassing 99% at experimentally relevant noise levels. They can also outperform the dynamical operations. Our work paves the way to implementing high-fidelity geometric gates for spin qubits in silicon.
Submitted 31 August, 2024;
originally announced September 2024.