-
Leveraging erasure errors in logical qubits with metastable $^{171}$Yb atoms
Authors:
Bichen Zhang,
Genyue Liu,
Guillaume Bornet,
Sebastian P. Horvath,
Pai Peng,
Shuo Ma,
Shilin Huang,
Shruti Puri,
Jeff D. Thompson
Abstract:
Implementing large-scale quantum algorithms with practical advantage will require fault-tolerance achieved through quantum error correction, but the associated overhead is a significant cost. The overhead can be reduced by engineering physical qubits with fewer errors, and by shaping the residual errors to be more easily correctable. In this work, we demonstrate quantum error correcting codes and logical qubit circuits in a metastable ${}^{171}$Yb qubit with a noise bias towards erasure errors, that is, errors whose location can be detected separate from any syndrome information. We show that dephasing errors on the nuclear spin qubit during coherent transport can be strongly suppressed, and implement robust entangling gates that maintain a high fidelity in the presence of gate beam inhomogeneity or pointing error. We demonstrate logical qubit encoding in the $[[4,2,2]]$ code, with error correction during decoding based on mid-circuit erasure measurements despite the fact that the code is too small to correct any Pauli errors. Finally, we demonstrate logical qubit teleportation between multiple code blocks with conditionally selected ancillas based on mid-circuit erasure checks, which is a key ingredient for leakage-robust error correction with neutral atoms.
Submitted 16 June, 2025;
originally announced June 2025.
-
Mapping Human-Agent Co-Learning and Co-Adaptation: A Scoping Review
Authors:
Shruti Kumar,
Xiaoyu Chen,
Xiaomei Wang
Abstract:
Several papers have delved into the challenges of human-AI-robot co-learning and co-adaptation. It has been noted that the terminology used to describe this collaborative relationship in existing studies needs to be more consistent. For example, the prefix "co" is used interchangeably to represent both "collaborative" and "mutual," and the terms "co-learning" and "co-adaptation" are sometimes used interchangeably. However, they can reflect subtle differences in the focus of the studies. The current scoping review's primary research question (RQ1) aims to gather existing papers discussing this collaboration pattern and examine the terms researchers use to describe this human-agent relationship. Given the relative newness of this area of study, we are also keen on exploring the specific types of intelligent agents and task domains that have been considered in existing research (RQ2). This exploration is significant as it can shed light on the diversity of human-agent interactions, from one-time to continuous learning/adaptation scenarios. It can also help us understand the dynamics of human-agent interactions in different task domains, guiding our expectations towards research situated in dynamic, complex domains. Our third objective (RQ3) is to investigate the cognitive theories and frameworks that have been utilized in existing studies to measure human-agent co-learning and co-adaptation. This investigation is crucial as it can help us understand the theoretical underpinnings of human-agent collaboration and adaptation, and it can also guide us in identifying any new frameworks proposed specifically for this type of relationship.
Submitted 29 May, 2025;
originally announced June 2025.
-
A Unitary Encoder for Surface Codes
Authors:
Pei-Kai Tsai,
Shruti Puri
Abstract:
The surface code is a promising candidate for fault-tolerant quantum computation and has been implemented in many quantum hardware platforms. In this work, we propose a new non-local unitary circuit to encode a surface code state based on a code conversion between rotated and regular surface codes, which halves the gate count of the fastest encoder known previously. While the unitary encoders can be used to increase the code distance, the fault-distance remains fixed. Nonetheless, they can be used for space-time efficient realization of eigenstates of the surface code operators that can't be easily accessed transversally such as the Pauli Y-eigenstate and Clifford eigenstates. It may be expected that error propagation in the non-local circuit will make decoding more challenging compared to local unitary encoding circuits. However, we find this not to be the case and that conventional matching decoders can be effectively used. Furthermore, we perform numerical simulations to benchmark the performance of our encoder against a previous local unitary encoder and the conventional stabilizer-measurement based encoder for preparing the Pauli Y-eigenstate and find that our encoder can outperform these in experimentally relevant noise regimes. Therefore, our encoder provides practical advantage in platforms where non-local interactions are available such as neutral atoms and trapped ions.
Submitted 4 June, 2025;
originally announced June 2025.
-
Overcoming Challenges of Partial Client Participation in Federated Learning: A Comprehensive Review
Authors:
Mrinmay Sen,
Shruti Aparna,
Rohit Agarwal,
Chalavadi Krishna Mohan
Abstract:
Federated Learning (FL) is a learning mechanism that falls under the distributed training umbrella, which collaboratively trains a shared global model without disclosing the raw data from different clients. This paper presents an extensive survey on the impact of partial client participation in federated learning. While much of the existing research focuses on addressing issues such as generalization, robustness, and fairness caused by data heterogeneity under the assumption of full client participation, limited attention has been given to the practical and theoretical challenges arising from partial client participation, which is common in real-world scenarios. This survey provides an in-depth review of existing FL methods designed to cope with partial client participation. We offer a comprehensive analysis supported by theoretical insights and empirical findings, along with a structured categorization of these methods, highlighting their respective advantages and disadvantages.
Submitted 6 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
Securing AI Agents with Information-Flow Control
Authors:
Manuel Costa,
Boris Köpf,
Aashish Kolluri,
Andrew Paverd,
Mark Russinovich,
Ahmed Salem,
Shruti Tople,
Lukas Wutschitz,
Santiago Zanella-Béguelin
Abstract:
As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach broadens the range of tasks that can be securely accomplished. A tutorial to walk readers through the concepts introduced in the paper can be found at https://github.com/microsoft/fides
Submitted 29 May, 2025;
originally announced May 2025.
-
Can Modern NLP Systems Reliably Annotate Chest Radiography Exams? A Pre-Purchase Evaluation and Comparative Study of Solutions from AWS, Google, Azure, John Snow Labs, and Open-Source Models on an Independent Pediatric Dataset
Authors:
Shruti Hegde,
Mabon Manoj Ninan,
Jonathan R. Dillman,
Shireen Hayatghaibi,
Lynn Babcock,
Elanchezhian Somasundaram
Abstract:
General-purpose clinical natural language processing (NLP) tools are increasingly used for the automatic labeling of clinical reports. However, independent evaluations for specific tasks, such as pediatric chest radiograph (CXR) report labeling, are limited. This study compares four commercial clinical NLP systems - Amazon Comprehend Medical (AWS), Google Healthcare NLP (GC), Azure Clinical NLP (AZ), and SparkNLP (SP) - for entity extraction and assertion detection in pediatric CXR reports. Additionally, CheXpert and CheXbert, two dedicated chest radiograph report labelers, were evaluated on the same task using CheXpert-defined labels. We analyzed 95,008 pediatric CXR reports from a large academic pediatric hospital. Entities and assertion statuses (positive, negative, uncertain) from the findings and impression sections were extracted by the NLP systems, with impression section entities mapped to 12 disease categories and a No Findings category. CheXpert and CheXbert extracted the same 13 categories. Outputs were compared using Fleiss Kappa and accuracy against a consensus pseudo-ground truth. Significant differences were found in the number of extracted entities and assertion distributions across NLP systems. SP extracted 49,688 unique entities, GC 16,477, AZ 31,543, and AWS 27,216. Assertion accuracy across models averaged around 62%, with SP highest (76%) and AWS lowest (50%). CheXpert and CheXbert achieved 56% accuracy. Considerable variability in performance highlights the need for careful validation and review before deploying NLP tools for clinical report labeling.
Submitted 28 May, 2025;
originally announced May 2025.
-
A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer
Authors:
Yumeng Zhang,
Zohaib Salahuddin,
Danial Khan,
Shruti Atul Mali,
Henry C. Woodruff,
Sina Amirrajab,
Eduardo Ibor-Crespo,
Ana Jimenez-Pastor,
Luis Marti-Bonmati,
Philippe Lambin
Abstract:
Background: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is pivotal for risk-stratified management of rectal cancer, yet visual assessment is subjective and vulnerable to inter-institutional variability. Purpose: To develop and externally evaluate a multicenter, foundation-model-driven framework that automatically classifies EVI and MFI on axial and sagittal T2-weighted MRI. Methods: This retrospective study used 331 pre-treatment rectal cancer MRI examinations from three European hospitals. After TotalSegmentator-guided rectal patch extraction, a self-supervised frequency-domain harmonization pipeline was trained to minimize scanner-related contrast shifts. Four classifiers were compared: ResNet50, SeResNet, the universal biomedical pretrained transformer (UMedPT) with a lightweight MLP head, and a logistic-regression variant using frozen UMedPT features (UMedPT_LR). Results: UMedPT_LR achieved the best EVI detection when axial and sagittal features were fused (AUC = 0.82; sensitivity = 0.75; F1 score = 0.73), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.74). The highest MFI performance was attained by UMedPT on axial harmonized images (AUC = 0.77), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.75). Frequency-domain harmonization improved MFI classification but variably affected EVI performance. Conventional CNNs (ResNet50, SeResNet) underperformed, especially in F1 score and balanced accuracy. Conclusion: These findings demonstrate that combining foundation model features, harmonization, and multi-view fusion significantly enhances diagnostic performance in rectal MRI.
Submitted 23 May, 2025;
originally announced May 2025.
-
Explainable Anatomy-Guided AI for Prostate MRI: Foundation Models and In Silico Clinical Trials for Virtual Biopsy-based Risk Assessment
Authors:
Danial Khan,
Zohaib Salahuddin,
Yumeng Zhang,
Sheng Kuang,
Shruti Atul Mali,
Henry C. Woodruff,
Sina Amirrajab,
Rachel Cavill,
Eduardo Ibor-Crespo,
Ana Jimenez-Pastor,
Adrian Galiana-Bordera,
Paula Jimenez Gomez,
Luis Marti-Bonmati,
Philippe Lambin
Abstract:
We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the UMedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatomical priors and clinical data; and a VAE-GAN framework for generating counterfactual heatmaps that localize decision-driving image regions. The system was developed using 1,500 PI-CAI cases for segmentation and 617 biparametric MRIs with metadata from the CHAIMELEON challenge for classification (split into 70% training, 10% validation, and 20% testing). Segmentation achieved mean Dice scores of 0.95 (gland), 0.94 (peripheral zone), and 0.92 (transition zone). Incorporating gland priors improved AUC from 0.69 to 0.72, with a three-scale ensemble achieving top performance (AUC = 0.79, composite score = 0.76), outperforming the 2024 CHAIMELEON challenge winners. Counterfactual heatmaps reliably highlighted lesions within segmented regions, enhancing model interpretability. In a prospective multi-center in-silico trial with 20 clinicians, AI assistance increased diagnostic accuracy from 0.72 to 0.77 and Cohen's kappa from 0.43 to 0.53, while reducing review time per case by 40%. These results demonstrate that anatomy-aware foundation models with counterfactual explainability can enable accurate, interpretable, and efficient PCa risk assessment, supporting their potential use as virtual biopsies in clinical practice.
Submitted 23 May, 2025;
originally announced May 2025.
-
Pixels to Prognosis: Harmonized Multi-Region CT-Radiomics and Foundation-Model Signatures Across Multicentre NSCLC Data
Authors:
Shruti Atul Mali,
Zohaib Salahuddin,
Danial Khan,
Yumeng Zhang,
Henry C. Woodruff,
Eduardo Ibor-Crespo,
Ana Jimenez-Pastor,
Luis Marti-Bonmati,
Philippe Lambin
Abstract:
Purpose: To evaluate the impact of harmonization and multi-region CT image feature integration on survival prediction in non-small cell lung cancer (NSCLC) patients, using handcrafted radiomics, pretrained foundation model (FM) features, and clinical data from a multicenter dataset.
Methods: We analyzed CT scans and clinical data from 876 NSCLC patients (604 training, 272 test) across five centers. Features were extracted from the whole lung, tumor, mediastinal nodes, coronary arteries, and coronary artery calcium (CAC). Handcrafted radiomics and FM deep features were harmonized using ComBat, reconstruction kernel normalization (RKN), and RKN+ComBat. Regularized Cox models predicted overall survival; performance was assessed using the concordance index (C-index), 5-year time-dependent area under the curve (t-AUC), and hazard ratio (HR). SHapley Additive exPlanations (SHAP) values explained feature contributions. A consensus model used agreement across top region of interest (ROI) models to stratify patient risk.
Results: TNM staging showed prognostic utility (C-index = 0.67; HR = 2.70; t-AUC = 0.85). The clinical + tumor radiomics model with ComBat achieved a C-index of 0.7552 and t-AUC of 0.8820. FM features (50-voxel cubes) combined with clinical data yielded the highest performance (C-index = 0.7616; t-AUC = 0.8866). An ensemble of all ROIs and FM features reached a C-index of 0.7142 and t-AUC of 0.7885. The consensus model, covering 78% of valid test cases, achieved a t-AUC of 0.92, sensitivity of 97.6%, and specificity of 66.7%.
Conclusion: Harmonization and multi-region feature integration improve survival prediction in multicenter NSCLC data. Combining interpretable radiomics, FM features, and consensus modeling enables robust risk stratification across imaging centers.
Submitted 23 May, 2025;
originally announced May 2025.
-
Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)
Authors:
Clayton Cohn,
Surya Rayala,
Caitlin Snyder,
Joyce Fonteles,
Shruti Jain,
Naveeduddin Mohammed,
Umesh Timalsina,
Sarah K. Burriss,
Ashwin T S,
Namrata Srivastava,
Menton Deweese,
Angela Eeds,
Gautam Biswas
Abstract:
Collaborative dialogue offers rich insights into students' learning and critical thinking. This is essential for adapting pedagogical agents to students' learning and problem-solving skills in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, potential hallucinations can undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated knowledge, but its effectiveness depends on clear semantic links between user input and a knowledge base, which are often weak in student dialogue. We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by incorporating environment logs to contextualize collaborative discourse. Our findings show that LC-RAG improves retrieval over a discourse-only baseline and allows our collaborative peer agent, Copa, to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in a collaborative computational modeling environment, XYZ.
Submitted 22 May, 2025;
originally announced May 2025.
-
PLanet: Formalizing Experimental Design
Authors:
London Bielicke,
Anna Zhang,
Shruti Tyagi,
Emery Berger,
Adam Chlipala,
Eunice Jun
Abstract:
Carefully constructed experimental designs are essential for drawing valid, generalizable conclusions from scientific studies. Unfortunately, experimental design plans can be difficult to specify, communicate clearly, and relate to alternatives. In response, we introduce a grammar of experimental design that provides composable operators for constructing assignment procedures (e.g., Latin square). We implement this grammar in PLanet, a domain-specific language (DSL) that constructs assignment plans in three stages: experimental unit specification, trial-order construction, and order-to-unit mapping. We evaluate PLanet's expressivity by taking a purposive sample of recent CHI and UIST publications, representing their experiments as programs in PLanet, and identifying ambiguities and alternatives. In our evaluation, PLanet could express 11 out of 12 experiments found in sampled papers. Additionally, we found that PLanet constructs helped make complex design choices explicit when the researchers omit technical language describing their study designs.
Submitted 13 May, 2025;
originally announced May 2025.
-
Smooth Splitting and Zeros from On-Shell Recursion
Authors:
Callum R. T. Jones,
Shruti Paranjape
Abstract:
We describe a new approach to understanding the origins of recently discovered "hidden zeros" and "smooth splitting" of tree-level amplitudes in $\text{Tr}φ^3$, Non-Linear Sigma Model (NLSM), Yang-Mills-Scalar (YMS) and the special Galileon. Introducing a new type of linear shift in kinematic space we demonstrate that the mysterious splitting formulae follow from a simple contour integration argument in the style of on-shell recursion. The argument makes use of only standard notions of tree-level factorization on propagators, but assumes improved UV behavior in the form of the absence of a residue at infinity. In the case of $\text{Tr}φ^3$ and NLSM this is proven by identifying our shift as a special case of a more general construction called a $g$-vector shift; in the case of YMS it remains an unproven conjecture. This recursive perspective leads to numerous new results: we derive generalizations of the splitting formulae on more relaxed near-zero kinematics, including interesting new kinematic limits in which the amplitude splits into a triple-product; we also demonstrate that the uncolored special Galileon model has improved UV scaling and hence also splits. We also investigate the possible realization of hidden zeros in four dimensions. The conditions under which the dimensionality constraints are compatible with zero kinematics are investigated in detail for $\text{Tr}φ^3$ and YMS; for the latter we find they can be realized only with certain restrictions on external helicity states. The realizable 4d zeros are proven by a similar recursive argument based on BCFW and are found to generalize to a new class of intrinsically 4d "helicity zeros" present in all sectors of YM and also gravity.
Submitted 5 May, 2025;
originally announced May 2025.
-
Optomechanical resource for fault-tolerant quantum computing
Authors:
Margaret Pavlovich,
Peter Rakich,
Shruti Puri
Abstract:
Fusion-based quantum computing with dual-rail qubits is a leading candidate for scalable quantum computing using linear optics. This paradigm requires single photons which are entangled into small resource states before being fed into a fusion network. The most common sources for single optical photons and for small entangled states are probabilistic and heralded. The realization of a single reliable deterministic source requires many redundant probabilistic sources and a complex optical network for rerouting and retiming probabilistic outputs. In this work, we show how optomechanics enables reliable production of resources for photonic quantum computing without the redundancy of the all-optical approach. This is achieved by using acoustic modes as caches of quantum resources, ranging from single-particle states to small entangled states, with on-demand read-out. The advantages of acoustic modes as optical quantum memories, compared to other technologies, include their intrinsically long lifetimes and that they are solid state, highly tailorable, and insensitive to electromagnetic noise. We show how the resource states can be prepared directly in the acoustic modes using optical controls. This is still probabilistic and heralded, as in the all-optical approach, but the acoustic modes act as a quantum memory which is integrated into the production of the states. The quantum states may be deterministically transferred from acoustic modes to optical modes, on demand, with another optical drive.
Submitted 1 May, 2025;
originally announced May 2025.
-
Understanding the evolution of the magnetic ground state in Ba$_4$NaRu$_3$O$_{12}$
Authors:
Shruti Chakravarty,
Pascal Manuel,
Antonio Cervellino,
Sunil Nair
Abstract:
We report a comprehensive investigation of the quadruple perovskite Ba$_4$NaRu$_3$O$_{12}$, in which we discover a robust spin-lattice coupled ground state characterized by a long-range antiferromagnetic ordering at $T_N \sim$ 257 K. The system's unique structural motif of three symmetrically distinct magnetic ions, including Ru dimers separated by non-magnetic layers, is intimately correlated with its magnetic behavior, as evidenced by temperature-dependent diffraction measurements and specific heat data. The powder neutron diffraction patterns at 13 K showed that the spins within the dimers are antiparallel, leading to a net zero moment contribution and a staggered arrangement of the triangular layers formed by the Ru moments within the corner-shared octahedra along the $c$-axis. The low-temperature specific heat revealed an extra boson peak contribution from optical modes with a maximum vibrational energy of $\sim$55 cm$^{-1}$. The charge transport exhibited variable-range hopping (VRH) behaviour below $T_N$, with a stronger energy-dependence than expected from the Efros-Shklovskii model, suggesting the presence of multiparticle correlation effects.
Submitted 16 April, 2025;
originally announced April 2025.
-
Large deformations of Tr($Φ^3$) and the world at infinity
Authors:
Shruti Paranjape,
Marcos Skowronek,
Marcus Spradlin,
Anastasia Volovich
Abstract:
The amplitudes of the non-linear sigma model can be obtained from those of Tr($Φ^3$) theory by sending the kinematic (Mandelstam) variables to infinity in a certain direction. In this paper we characterize the behavior of Tr($Φ^3$) amplitudes under a general class of large kinematic shifts called $g$-vector shifts. The objects that live in this world at infinity retain certain key amplitude-like properties, most notably factorization, and admit descriptions in terms of polytopes, but they are not generally amplitudes of any cognizable theory. We identify particular $g$-vector shifts that lead at infinity to mixed amplitudes involving two pions and any number of scalars, allowing us to provide polytopal descriptions of these amplitudes.
Submitted 29 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Towards Green AI-Native Networks: Evaluation of Neural Circuit Policy for Estimating Energy Consumption of Base Stations
Authors:
Selim Ickin,
Shruti Bothe,
Aman Raparia,
Nitin Khanna,
Erik Sanders
Abstract:
Optimization of radio hardware and AI-based network management software yields significant energy savings in radio access networks. The execution of underlying Machine Learning (ML) models, which enable energy savings through recommended actions, may require additional compute and energy, highlighting the opportunity to explore and adopt accurate and energy-efficient ML technologies. This work evaluates the novel use of sparsely structured Neural Circuit Policies (NCPs) in a use case to estimate the energy consumption of base stations. Sparsity in ML models yields reduced memory, computation and energy demand, hence facilitating a low-cost and scalable solution. We also evaluate the generalization capability of NCPs in comparison to traditional and widely used ML models such as Long Short Term Memory (LSTM), via quantifying their sensitivity to varying model hyper-parameters (HPs). NCPs demonstrated a clear reduction in computational overhead and energy consumption. Moreover, results indicated that the NCPs are robust to varying HPs such as number of epochs and neurons in each layer, making them a suitable option to ease model management and to reduce energy consumption in Machine Learning Operations (MLOps) in telecommunications.
△ Less
Submitted 3 April, 2025;
originally announced April 2025.
-
The CosmoVerse White Paper: Addressing observational tensions in cosmology with systematics and fundamental physics
Authors:
Eleonora Di Valentino,
Jackson Levi Said,
Adam Riess,
Agnieszka Pollo,
Vivian Poulin,
Adrià Gómez-Valent,
Amanda Weltman,
Antonella Palmese,
Caroline D. Huang,
Carsten van de Bruck,
Chandra Shekhar Saraf,
Cheng-Yu Kuo,
Cora Uhlemann,
Daniela Grandón,
Dante Paz,
Dominique Eckert,
Elsa M. Teixeira,
Emmanuel N. Saridakis,
Eoin Ó Colgáin,
Florian Beutler,
Florian Niedermann,
Francesco Bajardi,
Gabriela Barenboim,
Giulia Gubitosi,
Ilaria Musella
, et al. (513 additional authors not shown)
Abstract:
The standard model of cosmology has provided a good phenomenological description of a wide range of observations both at astrophysical and cosmological scales for several decades. This concordance model is constructed by a universal cosmological constant and supported by a matter sector described by the standard model of particle physics and a cold dark matter contribution, as well as very early-time inflationary physics, and underpinned by gravitation through general relativity. There have always been open questions about the soundness of the foundations of the standard model. However, recent years have shown that there may also be questions from the observational sector with the emergence of differences between certain cosmological probes. In this White Paper, we identify the key objectives that need to be addressed over the coming decade together with the core science projects that aim to meet these challenges. These discordances primarily rest on the divergence in the measurement of core cosmological parameters with varying levels of statistical confidence. These possible statistical tensions may be partially accounted for by systematics in various measurements or cosmological probes but there is also a growing indication of potential new physics beyond the standard model. After reviewing the principal probes used in the measurement of cosmological parameters, as well as potential systematics, we discuss the most promising array of potential new physics that may be observable in upcoming surveys. We also discuss the growing set of novel data analysis approaches that go beyond traditional methods to test physical models. [Abridged]
Submitted 15 May, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
-
Information Retrieval for Climate Impact
Authors:
Maarten de Rijke,
Bart van den Hurk,
Flora Salim,
Alaa Al Khourdajie,
Nan Bai,
Renato Calzone,
Declan Curran,
Getnet Demil,
Lesley Frew,
Noah Gießing,
Mukesh Kumar Gupta,
Maria Heuss,
Sanaa Hobeichi,
David Huard,
Jingwei Kang,
Ana Lucic,
Tanwi Mallick,
Shruti Nath,
Andrew Okem,
Barbara Pernici,
Thilina Rajapakse,
Hira Saleem,
Harry Scells,
Nicole Schneider,
Damiano Spina
, et al. (6 additional authors not shown)
Abstract:
The purpose of the MANILA24 Workshop on information retrieval for climate impact was to bring together researchers from academia, industry, governments, and NGOs to identify and discuss core research problems in information retrieval to assess climate change impacts. The workshop aimed to foster collaboration by bringing communities together that have so far not been very well connected -- information retrieval, natural language processing, systematic reviews, impact assessments, and climate science. The workshop brought together a diverse set of researchers and practitioners interested in contributing to the development of a technical research agenda for information retrieval to assess climate change impacts.
Submitted 1 April, 2025;
originally announced April 2025.
-
Hidden Zeros of the Cosmological Wavefunction
Authors:
Shounak De,
Shruti Paranjape,
Andrzej Pokraka,
Marcus Spradlin,
Anastasia Volovich
Abstract:
Motivated by the recent discovery of hidden zeros in particle and string amplitudes, we characterize zeros of individual graph contributions to the cosmological wavefunction of a scalar field theory. We demonstrate that these contributions split near these zeros for all tree graphs and provide evidence that this extends to loop graphs as well. We explicitly construct polytopal realizations of the relevant graph associahedra and show that the cosmological zeros have natural geometric and physical interpretations. As a byproduct, we establish an equivalence between the wavefunction coefficients of chain graphs and flat-space Tr$(φ^3)$ amplitudes, enabling us to leverage the cosmological zeros to uncover the recently discovered hidden zeros of colored amplitudes.
Submitted 30 March, 2025;
originally announced March 2025.
-
Gemma 3 Technical Report
Authors:
Gemma Team,
Aishwarya Kamath,
Johan Ferret,
Shreya Pathak,
Nino Vieillard,
Ramona Merhej,
Sarah Perrin,
Tatiana Matejovicova,
Alexandre Ramé,
Morgane Rivière,
Louis Rouillard,
Thomas Mesnard,
Geoffrey Cideron,
Jean-bastien Grill,
Sabela Ramos,
Edouard Yvinec,
Michelle Casbon,
Etienne Pot,
Ivo Penchev,
Gaël Liu,
Francesco Visin,
Kathleen Kenealy,
Lucas Beyer,
Xiaohai Zhai,
Anton Tsitsulin
, et al. (191 additional authors not shown)
Abstract:
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Submitted 25 March, 2025;
originally announced March 2025.
-
Climatic Phase Transitions Unravel the Onset and Withdrawal of Indian Monsoon
Authors:
Yogenraj Patil,
Gaurav Chopra,
Shruti Tandon,
B. N. Goswami,
R. I. Sujith
Abstract:
The livelihood and food security of more than a billion people depend on the Indian monsoon (IM). Yet, a universal definition of the large-scale season and progress of IM is missing. Even though IM is a planetary-scale convectively coupled system arising largely from seasonal migration of the Intertropical Convergence Zone (ITCZ), the definitions of its onset and progression are based on local weather observations, making them practically inutile due to the detection of bogus onsets. Using climate networks, we show that small-scale clusters of locally defined rainfall onsets coalesce through two abrupt climatic phase transitions defining large-scale monsoon onsets over Northeast India and the Indian peninsula, respectively. These abrupt transitions are interspersed with continuous growth of clusters. Breaking the conventional wisdom that IM starts from southern peninsula and expands northward and westward, we unveil that IM starts from Northeast India and expands westward and northward, covering the entire country. We show that the large-scale monsoon onset over the Indian peninsula is critically dependent on the characteristics of monsoon onset over Northeast India. Unlike existing definitions, a rapid and consistent northward propagation of rainfall establishing the ITCZ manifests after our network-based onset dates. Thus, our definition captures the IM onset better than the existing definitions.
Submitted 20 March, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
Your Text Encoder Can Be An Object-Level Watermarking Controller
Authors:
Naresh Kumar Devulapally,
Mingzhen Huang,
Vishal Asnani,
Shruti Agarwal,
Siwei Lyu,
Vishnu Suresh Lokhande
Abstract:
Invisible watermarking of AI-generated images can help with copyright protection, enabling detection and identification of AI-generated media. In this work, we present a novel approach to watermark images of T2I Latent Diffusion Models (LDMs). By only fine-tuning text token embeddings $W_*$, we enable watermarking in selected objects or parts of the image, offering greater flexibility compared to traditional full-image watermarking. Our method leverages the text encoder's compatibility across various LDMs, allowing plug-and-play integration for different LDMs. Moreover, introducing the watermark early in the encoding stage improves robustness to adversarial perturbations in later stages of the pipeline. Our approach achieves $99\%$ bit accuracy ($48$ bits) with a $10^5 \times$ reduction in model parameters, enabling efficient watermarking.
Submitted 14 March, 2025;
originally announced March 2025.
-
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
Authors:
Omer Goldman,
Uri Shaham,
Dan Malkin,
Sivan Eiger,
Avinatan Hassidim,
Yossi Matias,
Joshua Maynez,
Adi Mayrav Gilady,
Jason Riesa,
Shruti Rijhwani,
Laura Rimell,
Idan Szpektor,
Reut Tsarfaty,
Matan Eyal
Abstract:
To achieve equitable performance across languages, multilingual large language models (LLMs) must be able to abstract knowledge beyond the language in which it was acquired. However, the current literature lacks reliable ways to measure LLMs' capability of cross-lingual knowledge transfer. To that end, we present ECLeKTic, a multilingual closed-book QA (CBQA) dataset that Evaluates Cross-Lingual Knowledge Transfer in a simple, black-box manner. We detected information with uneven coverage across languages by controlling for presence and absence of Wikipedia articles in 12 languages. We generated knowledge-seeking questions in a source language, for which the answer appears in a relevant Wikipedia article and translated them to all other 11 languages, for which the respective Wikipedias lack equivalent articles. Assuming that Wikipedia reflects the prominent knowledge in the LLM's training data, to solve ECLeKTic's CBQA task the model is required to transfer knowledge between languages. Experimenting with 8 LLMs, we show that SOTA models struggle to effectively share knowledge across languages, even if they can predict the answer well for queries in the same language the knowledge was acquired in.
Submitted 3 March, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Bosonisation and BTZ Black Hole Microstates
Authors:
Suvankar Dutta,
Shruti Menon,
Aayush Srivastav
Abstract:
When the boundary dynamics of \(AdS_3\) gravity is governed by the collective field theory Hamiltonian proposed by Jevicki and Sakita, its asymptotic symmetry algebra becomes the centerless \(U(1)\) Kac-Moody algebra. We quantize this system using the quantum bosonization of relativistic free fermions and relate these to the dynamical fields of \(AdS_3\) gravity. This leads to a correspondence where different bulk configurations correspond to distinct states (particle-hole pair excitations) in the fermionic Hilbert space. This mapping allows us to construct BTZ black hole microstates, represented by Young diagrams of irreducible \(U(\infty)\) representations. Notably, the logarithm of the microstate degeneracy exactly reproduces the classical entropy of the BTZ black hole.
Submitted 20 March, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text
Authors:
Matthieu Meeus,
Lukas Wutschitz,
Santiago Zanella-Béguelin,
Shruti Tople,
Reza Shokri
Abstract:
How much information about training samples can be leaked through synthetic data generated by Large Language Models (LLMs)? Overlooking the subtleties of information flow in synthetic data generation pipelines can lead to a false sense of privacy. In this paper, we assume an adversary has access to some synthetic data generated by an LLM. We design membership inference attacks (MIAs) that target the training data used to fine-tune the LLM that is then used to synthesize data. The significant performance of our MIA shows that synthetic data leak information about the training data. Further, we find that canaries crafted for model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released. Such out-of-distribution canaries have limited influence on the model's output when prompted to generate useful, in-distribution synthetic data, which drastically reduces their effectiveness. To tackle this problem, we leverage the mechanics of auto-regressive models to design canaries with an in-distribution prefix and a high-perplexity suffix that leave detectable traces in synthetic data. This enhances the power of data-based MIAs and provides a better assessment of the privacy risks of releasing synthetic data generated by LLMs.
Submitted 6 June, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
Authors:
Daniel Deutsch,
Eleftheria Briakou,
Isaac Caswell,
Mara Finkelstein,
Rebecca Galor,
Juraj Juraska,
Geza Kovacs,
Alison Lui,
Ricardo Rei,
Jason Riesa,
Shruti Rijhwani,
Parker Riley,
Elizabeth Salesky,
Firas Trabelsi,
Stephanie Winkler,
Biao Zhang,
Markus Freitag
Abstract:
As large language models (LLMs) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, including on tasks like machine translation (MT). In this work, we extend the WMT24 dataset to cover 55 languages by collecting new human-written references and post-edits for 46 new languages and dialects in addition to post-edits of the references in 8 out of 9 languages in the original WMT24 dataset. The dataset covers four domains: literary, news, social, and speech. We benchmark a variety of MT providers and LLMs on the collected dataset using automatic metrics and find that LLMs are the best-performing MT systems in all 55 languages. These results should be confirmed using a human-based evaluation, which we leave for future work.
Submitted 17 February, 2025;
originally announced February 2025.
-
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Authors:
Shruti Joshi,
Andrea Dittadi,
Sébastien Lachapelle,
Dhanya Sridhar
Abstract:
Steering methods manipulate the representations of large language models (LLMs) to induce responses that have desired properties, e.g., truthfulness, offering a promising approach for LLM alignment without the need for fine-tuning. Traditionally, steering has relied on supervision, such as from contrastive pairs of prompts that vary in a single target concept, which is costly to obtain and limits the speed of steering research. An appealing alternative is to use unsupervised approaches such as sparse autoencoders (SAEs) to map LLM embeddings to sparse representations that capture human-interpretable concepts. However, without further assumptions, SAEs may not be identifiable: they could learn latent dimensions that entangle multiple concepts, leading to unintentional steering of unrelated properties. We introduce Sparse Shift Autoencoders (SSAEs) that instead map the differences between embeddings to sparse representations. Crucially, we show that SSAEs are identifiable from paired observations that vary in \textit{multiple unknown concepts}, leading to accurate steering of single concepts without the need for supervision. We empirically demonstrate accurate steering across semi-synthetic and real-world language datasets using Llama-3.1 embeddings.
Submitted 14 February, 2025;
originally announced February 2025.
-
Machine Learning-Driven Volumetric Cloud Rendering: Procedural Shader Optimization and Dynamic Lighting in Unreal Engine for Realistic Atmospheric Simulation
Authors:
Shruti Singh,
Shantanu Kumar
Abstract:
This study advances real-time volumetric cloud rendering in Computer Graphics (CG) by developing a specialized shader in Unreal Engine (UE), focusing on realistic cloud modeling and lighting. By leveraging ray-casting-based lighting algorithms, this work demonstrates the practical application of a dual-layered procedural noise model, eliminating the need for conventional two-dimensional (2D) weather textures. The shader allows for procedurally configured skies with a defined parameter set, offering flexibility for both artistic expression and realistic simulation. Empirical results reveal that the shader achieves an average rendering time of 35ms per frame while maintaining high visual accuracy and scene realism. Visual fidelity assessments indicate a 15% improvement in cloud realism over traditional 2D techniques, particularly in dynamic lighting scenarios. This research contributes to CG by bridging technical and aesthetic elements, enhancing real-time visual storytelling and immersion within digital media environments.
Submitted 11 February, 2025;
originally announced February 2025.
-
Efficient Distributed Training through Gradient Compression with Sparsification and Quantization Techniques
Authors:
Shruti Singh,
Shantanu Kumar
Abstract:
This study investigates the impact of gradient compression on distributed training performance, focusing on sparsification and quantization techniques, including top-k, DGC, and QSGD. In baseline experiments, random-k compression results in severe performance degradation, highlighting its inefficacy. In contrast, using top-k and DGC at 50 times compression yields performance improvements, reducing perplexity by up to 0.06 compared to baseline. Experiments across 1, 2, and 4 workers demonstrate that conservative sparsification can have a regularizing effect, especially for smaller models, while compression ratios above 5000 times impair performance, particularly for DGC. Communication times are reduced across all compression methods, with top-k and DGC decreasing communication to negligible levels at high compression ratios. However, increased computation times offset this efficiency for top-k due to sorting demands, making it less scalable than DGC or QSGD. In convergence tests, sparsification techniques show accelerated convergence, requiring fewer epochs than the baseline, which has implications for computational savings. Although precision trade-offs emerge, floating point errors are mitigated by compression. This study's findings underscore the need to tune hyperparameters specifically for each compression technique to achieve optimal model performance, especially in distributed training systems.
Submitted 7 December, 2024;
originally announced February 2025.
-
A Continuous Pump-Probe Experiment to Observe Rydberg Wave Packet Dynamics
Authors:
Kevin L. Romans,
Kyle Foster,
Shruti Majumdar,
Bishnu P. Acharya,
Onyx Russ,
A. H. N. C. De Silva,
Daniel Fischer
Abstract:
Rydberg atoms remain in the limelight due to their applications in quantum optics and information technologies. In this work, the dynamics of Rydberg atoms stored in a momentum spectrometer by an all-optical trap is studied by ionizing them in the field of a continuous wave optical dipole trap. While the addition of the optical dipole trap allows further cooling of the atoms, it comes at the expense of the time-of-flight information that is required to retrieve photoelectron momentum distributions. Here, we report on a method that extends the standard COLTRIMS (cold target recoil ion momentum spectroscopy) technique, including continuous wave lasers in pump-probe schemes, by utilizing coincidence measurements. In particular, the photoionization of atomic $^6$Li initially in a spin-polarized $2^{2}P_{3/2}$ state is explored. Multi-photon excitation from a tunable-mode femtosecond pulse is exploited to produce Rydberg atoms, which can then be ionized by the dipole field. The resulting ionization rate becomes explicitly time-dependent, and analyzing its structure unlocks the real-time atomic dynamics on nanosecond time scales.
Submitted 10 February, 2025;
originally announced February 2025.
-
LR0.FM: Low-Res Benchmark and Improving Robustness for Zero-Shot Classification in Foundation Models
Authors:
Priyank Pathak,
Shyam Marjit,
Shruti Vyas,
Yogesh S Rawat
Abstract:
Visual-language foundation models (FMs) exhibit remarkable zero-shot generalization across diverse tasks, largely attributed to extensive pre-training on large-scale datasets. However, their robustness on low-resolution/pixelated (LR) images, a common challenge in real-world scenarios, remains underexplored. We introduce LR0.FM, a comprehensive benchmark evaluating the impact of low resolution on the zero-shot classification performance of 10 FMs across 66 backbones and 15 datasets. We propose a novel metric, Weighted Aggregated Robustness, to address the limitations of existing metrics and better evaluate model performance across resolutions and datasets. Our key findings show that: (i) model size positively correlates with robustness to resolution degradation, (ii) pre-training dataset quality is more important than its size, and (iii) fine-tuned and higher resolution models are less robust against LR. Our analysis further reveals that the model makes semantically reasonable predictions at LR, and the lack of fine-grained details in input adversely impacts the model's initial layers more than the deeper layers. We use these insights and introduce a simple strategy, LR-TK0, to enhance the robustness of models without compromising their pre-trained weights. We demonstrate the effectiveness of LR-TK0 for robustness against low-resolution across several datasets and its generalization capability across backbones and other approaches. Code is available at https://github.com/shyammarjit/LR0.FM
Submitted 18 May, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor
Authors:
Fausto Mauricio Lagos Suarez,
Akshit Saradagi,
Vidya Sumathy,
Shruti Kotpaliwar,
George Nikolakopoulos
Abstract:
This article introduces a curriculum learning approach to develop a reinforcement learning-based robust stabilizing controller for a Quadrotor that meets predefined performance criteria. The learning objective is to achieve desired positions from random initial conditions while adhering to both transient and steady-state performance specifications. This objective is challenging for conventional one-stage end-to-end reinforcement learning, due to the strong coupling between position and orientation dynamics, the complexity in designing and tuning the reward function, and poor sample efficiency, which necessitates substantial computational resources and leads to extended convergence times. To address these challenges, this work decomposes the learning objective into a three-stage curriculum that incrementally increases task complexity. The curriculum begins with learning to achieve stable hovering from a fixed initial condition, followed by progressively introducing randomization in initial positions, orientations and velocities. A novel additive reward function is proposed, to incorporate transient and steady-state performance specifications. The results demonstrate that the Proximal Policy Optimization (PPO)-based curriculum learning approach, coupled with the proposed reward structure, achieves superior performance compared to a single-stage PPO-trained policy with the same reward function, while significantly reducing computational resource requirements and convergence time. The curriculum-trained policy's performance and robustness are thoroughly validated under random initial conditions and in the presence of disturbances.
Submitted 17 April, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
On the Coexistence and Ensembling of Watermarks
Authors:
Aleksandar Petrov,
Shruti Agarwal,
Philip H. S. Torr,
Adel Bibi,
John Collomosse
Abstract:
Watermarking, the practice of embedding imperceptible information into media such as images, videos, audio, and text, is essential for intellectual property protection, content provenance and attribution. The growing complexity of digital ecosystems necessitates watermarks for different uses to be embedded in the same media. However, to detect and decode all watermarks, they need to coexist well with one another. We perform the first study of coexistence of deep image watermarking methods and, contrary to intuition, we find that various open-source watermarks can coexist with only minor impacts on image quality and decoding robustness. The coexistence of watermarks also opens the avenue for ensembling watermarking methods. We show how ensembling can increase the overall message capacity and enable new trade-offs between capacity, accuracy, robustness and image quality, without needing to retrain the base models.
Submitted 28 January, 2025;
originally announced January 2025.
-
Characterization of Entanglement in Higher Dimensional Bipartite as well as Multipartite Quantum System and its Application
Authors:
Shruti Aggarwal
Abstract:
In recent years considerable progress has been made towards developing a general theory of quantum entanglement. In particular, criteria to decide whether a given quantum state is entangled are of high theoretical and practical interest. This problem is additionally complicated by the existence of bound entangled states, which are weakly entangled and hard to detect. In this thesis, we have worked on the characterization of bipartite and tripartite entanglement. We have established a few separability criteria that successfully detect Negative Partial Transpose (NPT) as well as Positive Partial Transpose (PPT) entangled states. Although the topic of detection of entanglement has been extensively studied in the literature through many approaches, the majority of these criteria are not physically realizable. This means that they are well accepted in the mathematical language but cannot be implemented in a laboratory setting. In this thesis, we propose some theoretical ideas to realize these entanglement detection criteria experimentally.
Submitted 13 January, 2025;
originally announced January 2025.
-
MERCURY: A fast and versatile multi-resolution based global emulator of compound climate hazards
Authors:
Shruti Nath,
Julie Carreau,
Kai Kornhuber,
Peter Pfleiderer,
Carl-Friedrich Schleussner,
Philippe Naveau
Abstract:
High-impact climate damages are often driven by compounding climate conditions. For example, elevated heat stress conditions can arise from a combination of high humidity and temperature. To explore future changes in compounding hazards under a range of climate scenarios and with large ensembles, climate emulators can provide light-weight, data-driven complements to Earth System Models. Yet, only a few existing emulators can jointly emulate multiple climate variables. In this study, we present the Multi-resolution EmulatoR for CompoUnd climate Risk analYsis: MERCURY. MERCURY extends multi-resolution analysis to a spatio-temporal framework for versatile emulation of multiple variables. MERCURY leverages data-driven, image compression techniques to generate emulations in a memory-efficient manner. MERCURY consists of a regional component that represents the monthly, regional response of a given variable to yearly Global Mean Temperature (GMT) using a probabilistic regression based additive model, resolving regional cross-correlations. It then adapts a reverse lifting-scheme operator to jointly spatially disaggregate regional, monthly values to grid-cell level. We demonstrate MERCURY's capabilities on representing the humid-heat metric, Wet Bulb Globe Temperature, as derived from temperature and relative humidity emulations. The emulated WBGT spatial correlations correspond well to those of ESMs and the 95% and 97.5% quantiles of WBGT distributions are well captured, with an average of 5% deviation. MERCURY's setup allows for region-specific emulations from which one can efficiently "zoom" into the grid-cell level across multiple variables by means of the reverse lifting-scheme operator. This circumvents the traditional problem of having to emulate complete, global-fields of climate data and resulting storage requirements.
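As a rough illustration of what a lifting scheme and its reverse do, the sketch below uses a one-dimensional Haar lifting step: the forward pass reduces fine values to pairwise means plus details, and the reverse pass disaggregates them back exactly. This is a generic textbook construction chosen for illustration, not MERCURY's operator; the function names and the sample values are assumptions.

```python
import numpy as np

def lifting_forward(x):
    """One Haar lifting step: fine values -> (coarse means, details).

    Assumes an even-length 1D input (illustrative restriction).
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    detail = odd - even            # predict step: difference within each pair
    coarse = even + detail / 2.0   # update step: mean of each pair
    return coarse, detail

def lifting_inverse(coarse, detail):
    """Reverse lifting: disaggregate coarse means back to fine values."""
    coarse = np.asarray(coarse, dtype=float)
    detail = np.asarray(detail, dtype=float)
    even = coarse - detail / 2.0   # undo the update step
    odd = detail + even            # undo the predict step
    fine = np.empty(coarse.size * 2)
    fine[0::2] = even
    fine[1::2] = odd
    return fine

# Hypothetical fine-scale values (e.g. four neighbouring grid cells).
fine = np.array([20.0, 22.0, 25.0, 21.0])
coarse, detail = lifting_forward(fine)
reconstructed = lifting_inverse(coarse, detail)
```

In an emulator of this kind, the coarse values would presumably come from the regional model and the details from a learned or sampled component; here they are simply taken from the forward pass to show that the reverse step is exact.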
Submitted 23 December, 2024;
originally announced January 2025.
-
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Authors:
Vriksha Srihari,
R. Bhavya,
Shruti Jayaraman,
V. Mary Anita Rajam
Abstract:
While generative models such as text-to-image, large language models and text-to-video have seen significant progress, the extension to text-to-virtual-reality remains largely unexplored, due to a deficit in training data and the complexity of achieving realistic depth and motion in virtual environments. This paper proposes an approach to coalesce existing generative systems to form a stereoscopic virtual reality video from text.
Carried out in three main stages, we start with a base text-to-image model that captures context from an input text. We then employ Stable Diffusion on the rudimentary image produced, to generate frames with enhanced realism and overall quality. These frames are processed with depth estimation algorithms to create left-eye and right-eye views, which are stitched side-by-side to create an immersive viewing experience. Such systems would be highly beneficial in virtual reality production, since filming and scene building often require extensive hours of work and post-production effort.
We utilize image evaluation techniques, specifically Fréchet Inception Distance and CLIP Score, to assess the visual quality of frames produced for the video. These quantitative measures establish the proficiency of the proposed method.
Our work highlights the exciting possibilities of using natural language-driven graphics in fields like virtual reality simulations.
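The depth-to-stereo step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a normalized depth map where 1 means nearest, uses simple per-pixel horizontal backward sampling with no hole filling, and the function name, `max_shift` parameter, and sign convention are all hypothetical.

```python
import numpy as np

def stereo_side_by_side(image, depth, max_shift=4):
    """Build a side-by-side stereo frame from one image and a depth map.

    image: (H, W, 3) array. depth: (H, W) array in [0, 1], 1 = nearest
    (assumed convention). Nearer pixels receive a larger horizontal disparity.
    """
    h, w = depth.shape
    shift = np.rint(depth * max_shift).astype(int)   # per-pixel disparity in pixels
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    # Backward sampling: each output pixel reads from a horizontally offset source column.
    left_src = np.clip(cols - shift, 0, w - 1)
    right_src = np.clip(cols + shift, 0, w - 1)
    left = image[rows, left_src]
    right = image[rows, right_src]
    # Stitch the left-eye and right-eye views next to each other.
    return np.concatenate([left, right], axis=1)

# Tiny synthetic frame with zero depth everywhere: both views equal the input.
frame = np.arange(2 * 4 * 3).reshape(2, 4, 3)
flat_depth = np.zeros((2, 4))
sbs = stereo_side_by_side(frame, flat_depth)
```

A production system would typically use a learned depth estimator and inpaint the disoccluded regions; the sketch only shows how disparity and side-by-side stitching fit together.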
Submitted 10 March, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
Unraveling the switching dynamics in a quantum double-well potential
Authors:
Qile Su,
Rodrigo G. Cortiñas,
Jayameenakshi Venkatraman,
Shruti Puri
Abstract:
The spontaneous switching of a quantum particle between the wells of a double-well potential is a phenomenon of general interest to physics and chemistry. It was broadly believed that the switching rate decreases steadily with the size of the energy barrier. This view was challenged by a recent experiment on a driven superconducting Kerr nonlinear oscillator (often called the Kerr-cat qubit or the Kerr parametric oscillator), whose energy barrier can be increased by ramping up the drive. Remarkably, as the drive amplitude increases, the switching rate exhibits a step-like decrease termed the "staircase". The view challenged by the experiment demands a deep review of our understanding of quantum effects in double wells. In this work, we derive a semi-analytical formula for the switching rate that resolves a continuous transition between tunneling- and dissipation-dominated dynamics. These two dynamics are observed respectively in the flat and the steep parts of each step in the staircase. Our formula exposes two distinct dissipative processes that limit tunneling: dephasing and decay. This allows us to predict the critical drive amplitudes where steps occur. In addition, we show that in the regime of a few states in the well and under moderate to low temperatures, highly excited states are populated predominantly via cascaded and direct thermal heating rather than quantum heating. At very low temperatures, however, the perturbation induced by the nonhermitian Hamiltonian becomes important and facilitates a new form of quantum heating. We numerically map the activation mechanism as a function of drive amplitude, damping rate, and temperature. Our theory deepens the understanding of switching dynamics between metastable quantum states, highlights the importance of a general interplay between tunneling and dissipation, and identifies a novel quantum regime in activated transitions.
Submitted 30 December, 2024;
originally announced January 2025.
-
WavePulse: Real-time Content Analytics of Radio Livestreams
Authors:
Govind Mittal,
Sarthak Gupta,
Shruti Wagle,
Chirag Chopra,
Anthony J DeMattee,
Nasir Memon,
Mustaque Ahamad,
Chinmay Hegde
Abstract:
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
Submitted 29 January, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Uniqueness of MHV Gravity Amplitudes
Authors:
Joris Koefler,
Umut Oktem,
Shruti Paranjape,
Jaroslav Trnka,
Bailee Zacovic
Abstract:
We investigate MHV tree-level gravity amplitudes as defined on the spinor-helicity variety. Unlike their gluon counterparts, the gravity amplitudes do not have logarithmic singularities and do not admit Amplituhedron-like construction. Importantly, they are not determined just by their singularities, but rather their numerators have interesting zeroes. We make a conjecture about the uniqueness of the numerator and explore this feature from a more mathematical perspective. This leads us to a new approach for examining adjoints. We outline steps of our proposed proof and provide computational evidence for its validity in specific cases.
Submitted 11 December, 2024;
originally announced December 2024.
-
Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond
Authors:
MD Raqib Khan,
Anshul Negi,
Ashutosh Kulkarni,
Shruti S. Phutke,
Santosh Kumar Vipparthi,
Subrahmanyam Murala
Abstract:
Quality degradation is observed in underwater images due to the effects of light refraction and absorption by water, leading to issues like color cast, haziness, and limited visibility. This degradation negatively affects the performance of autonomous underwater vehicles used in marine applications. To address these challenges, we propose a lightweight phase-based transformer network with 1.77M parameters for underwater image restoration (UIR). Our approach focuses on effectively extracting non-contaminated features using a phase-based self-attention mechanism. We also introduce an optimized phase attention block to restore structural information by propagating prominent attentive features from the input. We evaluate our method on both synthetic (UIEB, UFO-120) and real-world (UIEB, U45, UCCS, SQUID) underwater image datasets. Additionally, we demonstrate its effectiveness for low-light image enhancement using the LOL dataset. Through extensive ablation studies and comparative analysis, it is clear that the proposed approach outperforms existing state-of-the-art (SOTA) methods.
Submitted 2 December, 2024;
originally announced December 2024.
-
Planning Shorter Paths in Graphs of Convex Sets by Undistorting Parametrized Configuration Spaces
Authors:
Shruti Garg,
Thomas Cohn,
Russ Tedrake
Abstract:
Optimization based motion planning provides a useful modeling framework through various costs and constraints. Using Graph of Convex Sets (GCS) for trajectory optimization gives guarantees of feasibility and optimality by representing configuration space as the finite union of convex sets. Nonlinear parametrizations can be used to extend this technique to handle cases such as kinematic loops, but this distorts distances, such that solving with convex objectives will yield paths that are suboptimal in the original space. We present a method to extend GCS to nonconvex objectives, allowing us to "undistort" the optimization landscape while maintaining feasibility guarantees. We demonstrate our method's efficacy on three different robotic planning domains: a bimanual robot moving an object with both arms, the set of 3D rotations using Euler angles, and a rational parametrization of kinematics that enables certifying regions as collision free. Across the board, our method significantly improves path length and trajectory duration with only a minimal increase in runtime. Website: https://shrutigarg914.github.io/pgd-gcs-results/
Submitted 13 April, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
On Importance of Code-Mixed Embeddings for Hate Speech Identification
Authors:
Shruti Jagdale,
Omkar Khade,
Gauri Takalikar,
Mihir Inamdar,
Raviraj Joshi
Abstract:
Code-mixing is the practice of using two or more languages in a single sentence, which often occurs in multilingual communities such as India where people commonly speak multiple languages. Classic NLP tools, trained on monolingual data, face challenges when dealing with code-mixed data. Extracting meaningful information from sentences containing multiple languages becomes difficult, particularly in tasks like hate speech detection, due to linguistic variation, cultural nuances, and data sparsity. To address this, we aim to analyze the significance of code-mixed embeddings and evaluate the performance of BERT and HingBERT models (trained on a Hindi-English corpus) in hate speech detection. Our study demonstrates that HingBERT models, benefiting from training on the extensive Hindi-English dataset L3Cube-HingCorpus, outperform BERT models when tested on hate speech text datasets. We also found that code-mixed Hing-FastText performs better than standard English FastText and vanilla BERT models.
Submitted 27 November, 2024;
originally announced November 2024.
-
Challenges in Adapting Multilingual LLMs to Low-Resource Languages using LoRA PEFT Tuning
Authors:
Omkar Khade,
Shruti Jagdale,
Abhishek Phaltankar,
Gauri Takalikar,
Raviraj Joshi
Abstract:
Large Language Models (LLMs) have demonstrated remarkable multilingual capabilities, yet challenges persist in adapting these models for low-resource languages. In this study, we investigate the effects of Low-Rank Adaptation (LoRA) Parameter-Efficient Fine-Tuning (PEFT) on multilingual Gemma models for Marathi, a language with limited resources. Using a translated Alpaca dataset with 52,000 instruction-response pairs, our findings reveal that while evaluation metrics often show a performance decline post-fine-tuning, manual assessments frequently suggest that the fine-tuned models outperform their original counterparts. The observations indicate improvements in target language generation capabilities but a reduction in reasoning abilities following language adaptation. These results underscore the need for improved evaluation methodologies and the creation of high-quality native datasets to accurately assess language-specific model performance in low-resource settings.
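For readers unfamiliar with LoRA, the following minimal sketch shows the low-rank update on a single linear layer: the pretrained weight stays frozen and only two small matrices are trained. The abstract does not state the rank, scaling factor, or which layers were adapted, so the dimensions, `alpha`, and all names below are illustrative assumptions rather than the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 16, 2, 4.0   # illustrative sizes, not from the paper

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-initialized

def lora_forward(x, W, A, B, alpha, rank):
    """Compute x W^T + (alpha / rank) * x A^T B^T, i.e. a layer with weight W + (alpha / rank) B A."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.normal(size=(3, d_in))
y = lora_forward(x, W, A, B, alpha, rank)

trainable = A.size + B.size   # parameters updated during fine-tuning
full = W.size                 # parameters a full fine-tune of this layer would update
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly, and only `trainable` of the `full` parameters need gradients.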
Submitted 27 November, 2024;
originally announced November 2024.
-
Faulty towers: recovering a functioning quantum random access memory in the presence of defective routers
Authors:
D. K. Weiss,
Shifan Xu,
Shruti Puri,
Yongshan Ding,
S. M. Girvin
Abstract:
Proposals for quantum random access memory (QRAM) generally have a binary-tree structure, and thus require hardware that is exponential in the depth of the QRAM. For solid-state based devices, a fabrication yield that is less than $100\%$ implies that certain addresses at the bottom of the tree become inaccessible if a router in the unique path to that address is faulty. We discuss how to recover a functioning QRAM in the presence of faulty routers. We present the IterativeRepair algorithm, which constructs QRAMs layer by layer until the desired depth is reached. This algorithm utilizes ancilla flag qubits which reroute queries to faulty routers. We present a classical algorithm, FlagQubitMinimization, that attempts to minimize the required number of such ancillas. For a router failure rate of $1\%$ and a QRAM of depth $n=13$, we expect that on average 430 addresses need repair: we require only 1.5 ancilla flag qubits on average to perform this rerouting.
Submitted 23 November, 2024;
originally announced November 2024.
-
SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model
Authors:
Christopher Nguyen,
William Nguyen,
Atsushi Suzuki,
Daisuke Oku,
Hong An Phan,
Sang Dinh,
Zooey Nguyen,
Anh Ha,
Shruti Raghavan,
Huy Vo,
Thang Nguyen,
Lan Nguyen,
Yoshikuni Hirayama
Abstract:
Large Language Models (LLMs) have demonstrated the potential to address some issues within the semiconductor industry. However, they are often general-purpose models that lack the specialized knowledge needed to tackle the unique challenges of this sector, such as the intricate physics and chemistry of semiconductor devices and processes. SemiKong, the first industry-specific LLM for the semiconductor domain, provides a foundation that can be used to develop tailored proprietary models. With SemiKong 1.0, we aim to develop a foundational model capable of understanding etching problems at an expert level. Our key contributions include (a) curating a comprehensive corpus of semiconductor-related texts, (b) creating a foundational model with in-depth semiconductor knowledge, and (c) introducing a framework for integrating expert knowledge, thereby advancing the evaluation process of domain-specific AI models. Through fine-tuning a pre-trained LLM using our curated dataset, we have shown that SemiKong outperforms larger, general-purpose LLMs in various semiconductor manufacturing and design tasks. Our extensive experiments underscore the importance of developing domain-specific LLMs as a foundation for company- or tool-specific proprietary models, paving the way for further research and applications in the semiconductor domain. Code and dataset will be available at https://github.com/aitomatic/semikong
Submitted 21 November, 2024; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Automating Sonologists USG Commands with AI and Voice Interface
Authors:
Emad Mohamed,
Shruti Tiwari,
Sheena Christabel Pravin
Abstract:
This research presents an advanced AI-powered ultrasound imaging system that incorporates real-time image processing, organ tracking, and voice commands to enhance the efficiency and accuracy of diagnoses in clinical practice. Traditional ultrasound diagnostics often require significant time and introduce a degree of subjectivity due to user interaction. The goal of this innovative solution is to provide Sonologists with a more predictable and productive imaging procedure utilizing artificial intelligence, computer vision, and voice technology. The functionality of the system employs computer vision and deep learning algorithms, specifically adopting the Mask R-CNN model from Detectron2 for semantic segmentation of organs and key landmarks. This automation improves diagnostic accuracy by enabling the extraction of valuable information with minimal human input. Additionally, it includes a voice recognition feature that allows for hands-free operation, enabling users to control the system with commands such as freeze or liver, all while maintaining their focus on the patient. The architecture comprises video processing and real-time segmentation modules that prepare the system to perform essential imaging functions, such as freezing and zooming in on frames. The liver histopathology module, optimized for detecting fibrosis, achieved an impressive accuracy of 98.6%. Furthermore, the organ segmentation module produces output confidence levels between 50% and 95%, demonstrating its efficacy in organ detection.
Submitted 19 November, 2024;
originally announced November 2024.
-
Logarithmic corrections to entropy of 3D cosmological solutions from celestial dual
Authors:
Arindam Bhattacharjee,
Shruti Menon,
Muktajyoti Saha
Abstract:
Recently a one-dimensional Schwarzian-type theory was proposed as an effective dual theory of pure gravity in (2+1)-dimensional asymptotically flat spacetimes \cite{Bhattacharjee:2023sfd}. This codimension-two `celestial' dual captures the Bekenstein-Hawking entropy of bulk flat cosmologies in the semiclassical limit. In this paper, we extend this analysis beyond the semiclassical approximation and evaluate the one-loop exact partition function of this celestial dual theory. Our analysis results in novel nontrivial logarithmic corrections to the area term of the entropy, arising from the one-loop path integral.
Submitted 27 November, 2024; v1 submitted 8 November, 2024;
originally announced November 2024.
-
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers
Authors:
Shruti Singh,
Nandan Sarkar,
Arman Cohan
Abstract:
Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles, consisting of 2,937 QA pairs. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors, ensuring a thorough examination of the literature. We enhance the dataset's quality through a process that carefully filters out lower quality questions, decontextualizes the content, tracks the source document across different versions, and incorporates a bibliography for multi-document question-answering. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials, and require multi-document reasoning. We evaluate several open-source and proprietary LLMs across various configurations to explore their capabilities in generating relevant and factual responses. Our comprehensive evaluation, based on metrics for surface-level similarity and LLM judgements, highlights notable performance discrepancies. SciDQA represents a rigorously curated, naturally derived scientific QA dataset, designed to facilitate research on complex scientific text understanding.
Submitted 8 November, 2024;
originally announced November 2024.
-
Quantum optomechanical control of long-lived bulk acoustic phonons
Authors:
Hilel Hagai Diamandi,
Yizhi Luo,
David Mason,
Tevfik Bulent Kanmaz,
Sayan Ghosh,
Margaret Pavlovich,
Taekwan Yoon,
Ryan Behunin,
Shruti Puri,
Jack G. E. Harris,
Peter T. Rakich
Abstract:
High-fidelity quantum optomechanical control of a mechanical oscillator requires the ability to perform efficient, low-noise operations on long-lived phononic excitations. Microfabricated high-overtone bulk acoustic wave resonators ($\mu$HBARs) have been shown to support high-frequency (> 10 GHz) mechanical modes with exceptionally long coherence times (> 1.5 ms), making them a compelling resource for quantum optomechanical experiments. In this paper, we demonstrate a new optomechanical system that permits quantum optomechanical control of individual high-coherence phonon modes supported by such $\mu$HBARs for the first time. We use this system to perform laser cooling of such ultra-massive (7.5 $\mu$g) high frequency (12.6 GHz) phonon modes from an occupation of ${\sim}$22 to fewer than 0.4 phonons, corresponding to laser-based ground-state cooling of the most massive mechanical object to date. Through these laser cooling experiments, no absorption-induced heating is observed, demonstrating the resilience of the $\mu$HBAR against parasitic heating. The unique features of such $\mu$HBARs make them promising as the basis for a new class of quantum optomechanical systems that offer enhanced robustness to decoherence, necessary for efficient, low-noise photon-phonon conversion.
Submitted 23 October, 2024;
originally announced October 2024.
-
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following
Authors:
Yun He,
Di Jin,
Chaoqi Wang,
Chloe Bi,
Karishma Mandyam,
Hejia Zhang,
Chen Zhu,
Ning Li,
Tengyu Xu,
Hongjiang Lv,
Shruti Bhosale,
Chenguang Zhu,
Karthik Abinav Sankararaman,
Eryk Helenowski,
Melanie Kambadur,
Aditya Tayade,
Hao Ma,
Han Fang,
Sinong Wang
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including instruction following, which is crucial for aligning model outputs with user expectations. However, evaluating LLMs' ability to follow instructions remains challenging due to the complexity and subjectivity of human language. Current benchmarks primarily focus on single-turn, monolingual instructions, which do not adequately reflect the complexities of real-world applications that require handling multi-turn and multilingual interactions. To address this gap, we introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which utilizes a hybrid framework combining LLM and human annotators, expands upon the IFEval by incorporating multi-turn sequences and translating the English prompts into another 7 languages, resulting in a dataset of 4,501 multilingual conversations, where each has three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 at the first turn to 0.707 at the third turn in terms of average accuracy over all languages. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities. We release Multi-IF prompts and the evaluation code base to encourage further research in this critical area.
Submitted 12 November, 2024; v1 submitted 20 October, 2024;
originally announced October 2024.