Search | arXiv e-print repository

arXiv:2507.08211 [pdf]

Effect of Static vs. Conversational AI-Generated Messages on Colorectal Cancer Screening Intent: a Randomized Controlled Trial

Authors: Neil K. R. Sehgal, Manuel Tonneau, Andy Tan, Shivan J. Mehta, Alison Buttenheim, Lyle Ungar, Anish K. Agarwal, Sharath Chandra Guntuku

Abstract: Large language model (LLM) chatbots show increasing promise in persuasive communication. Yet their real-world utility remains uncertain, particularly in clinical settings where sustained conversations are difficult to scale. In a pre-registered randomized controlled trial, we enrolled 915 U.S. adults (ages 45-75) who had never completed colorectal cancer (CRC) screening. Participants were randomized to: (1) no message control, (2) expert-written patient materials, (3) single AI-generated message, or (4) a motivational interviewing chatbot. All participants were required to remain in their assigned condition for at least three minutes. Both AI arms tailored content using participants' self-reported demographics including age and gender. Both AI interventions significantly increased stool test intentions by over 12 points (12.9-13.8/100), compared to a 7.5 gain for expert materials (p<.001 for all comparisons). While the AI arms outperformed the no message control for colonoscopy intent, neither showed improvement over expert materials. Notably, for both outcomes, the chatbot did not outperform the single AI message in boosting intent despite participants spending ~3.5 minutes more on average engaging with it. These findings suggest concise, demographically tailored AI messages may offer a more scalable and clinically viable path to health behavior change than more complex conversational agents and generic, time-intensive expert-written materials.
Moreover, LLMs appear more persuasive for lesser-known and less-invasive screening approaches like stool testing, but may be less effective for entrenched preferences like colonoscopy. Future work should examine which facets of personalization drive behavior change, whether integrating structural supports can translate these modest intent gains into completed screenings, and which health behaviors are most responsive to AI-supported guidance.

Submitted 10 July, 2025; originally announced July 2025.

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3264 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

Submitted 11 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2507.05871 [pdf]

doi 10.1002/smll.202501426

Detecting Lifshitz Transitions Using Nonlinear Conductivity in Bilayer Graphene

Authors: Tanweer Ahmed, Harsh Varshney, Bao Q. Tu, Kenji Watanabe, Takashi Taniguchi, Marco Gobbi, Fèlix Casanova, Amit Agarwal, Luis E. Hueso

Abstract: The second-order nonlinear electrical response (NLER) is an intrinsic property of inversion symmetry-broken systems which can provide deep insights into the electronic band structures of atomically thin quantum materials. However, the impact of Fermi surface reconstructions, also known as Lifshitz transitions, on the NLER has remained elusive. We investigated NLER in bilayer graphene (BLG), where the low-energy bands undergo Lifshitz transitions. Here, NLER undergoes a sign change near the Lifshitz transitions even at elevated temperatures $T\gtrsim10~$K. At the band edge, NLER in BLG is modulated by both extrinsic scattering and interfacial-strain-induced intrinsic Berry curvature dipole, both of which can be finely tuned externally by varying doping and interlayer potential. Away from the band edge, BLG exhibits second-order conductivity exceeding $30~μ\mathrm{m}\,\mathrm{V}^{-1}\,Ω^{-1}$ at 3 K, higher than any previous report. Our work establishes NLER as a reliable tool to probe Lifshitz transitions in quantum materials.

Submitted 8 July, 2025; originally announced July 2025.

Comments: This is the pre-peer reviewed version of the following article: Ahmed, T. et al., Small 2025, 21 (21), 2501426, which has been published in final form at https://doi.org/10.1002/smll.202501426. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions

Journal ref: Small 2025, 21 (21), 2501426

arXiv:2507.04274 [pdf, ps, other]

Spin-split magnon bands induce pure spin current in insulating altermagnets

Authors: Sankar Sarkar, Amit Agarwal

Abstract: Altermagnets offer a promising platform for dissipationless spin transport by combining zero net magnetization with spontaneous non-relativistic spin splitting. However, their magnonic transport properties remain largely unexplored. Here, we develop a quantum-kinetic theory for thermally driven magnon currents that cleanly separates Berry-curvature-driven intrinsic contributions from Drude-like scattering-dependent terms. Applying this framework to a collinear honeycomb antiferromagnet with anisotropic next-nearest-neighbor exchange and Dzyaloshinskii-Moriya interaction, we reveal spin-split magnon bands that support both intrinsic and extrinsic spin Nernst and Seebeck currents. For realistic parameters, we predict a sizable magnon spin-splitting angle (about 3.3 degrees) and a pure transverse spin current capable of exerting a strong spin-splitter torque suitable for magnetization switching.

Submitted 6 July, 2025; originally announced July 2025.

Comments: 14 pages and 4 figures. We invite comments and feedback

arXiv:2507.02972 [pdf, ps, other]

Farm-Level, In-Season Crop Identification for India

Authors: Ishan Deshpande, Amandeep Kaur Reehal, Chandan Nath, Renu Singh, Aayush Patel, Aishwarya Jayagopal, Gaurav Singh, Gaurav Aggarwal, Amit Agarwal, Prathmesh Bele, Sridhar Reddy, Tanya Warrier, Kinjal Singh, Ashish Tendulkar, Luis Pazos Outon, Nikita Saxena, Agata Dondzik, Dinesh Tewari, Shruti Garg, Avneet Singh, Harsh Dhand, Vaibhav Rajan, Alok Talekar

Abstract: Accurate, timely, and farm-level crop type information is paramount for national food security, agricultural policy formulation, and economic planning, particularly in agriculturally significant nations like India. While remote sensing and machine learning have become vital tools for crop monitoring, existing approaches often grapple with challenges such as limited geographical scalability, restricted crop type coverage, the complexities of mixed-pixel and heterogeneous landscapes, and crucially, the robust in-season identification essential for proactive decision-making. We present a deep learning framework, designed to address critical data gaps for targeted, data-driven decision making, that generates farm-level, in-season, multi-crop identification at national scale (India). Our methodology leverages the strengths of Sentinel-1 and Sentinel-2 satellite imagery, integrated with national-scale farm boundary data. The model successfully identifies 12 major crops, which collectively account for nearly 90% of India's total cultivated area, and shows agreement with the 2023-24 national crop census of 94% in the winter season and 75% in the monsoon season. Our approach incorporates an automated season detection algorithm, which estimates crop sowing and harvest periods. This allows for reliable crop identification as early as two months into the growing season and facilitates rigorous in-season performance evaluation.
Furthermore, we have engineered a highly scalable inference pipeline, culminating in what is, to our knowledge, the first pan-India, in-season, farm-level crop type data product. The system's effectiveness and scalability are demonstrated through robust validation against national agricultural statistics, showcasing its potential to deliver actionable, data-driven insights for transformative agricultural monitoring and management across India.

Submitted 30 June, 2025; originally announced July 2025.

arXiv:2506.21644 [pdf]

Modelling the non-linear dynamics of the looping pendulum

Authors: Avighna Daruka, Gyaneshwaran Gomathinayagam, Aneesh Agarwal

Abstract: The looping pendulum phenomenon was first introduced in 2019 at the 32nd edition of the IYPT, wherein a lighter bob sweeps around a cylindrical rod to support the weight of a heavier bob. In this paper, the phenomenon was divided based on rotating and non-rotating forces, and differential equations were derived for each. To verify the theoretical derivation, an experimental analysis was done, varying the mass ratio with the vertical distance travelled by the heavier bob (tracked using Tracker). The experimental findings fit a logarithmic curve, following a trend similar to that of the simulation run in MATLAB to solve the derived differential equations. Furthermore, to verify the simulation, the trajectories of both the lighter and heavier masses were compared between the simulation and the experimental findings. The experimental findings match the simulation findings very closely, supporting the validity and accuracy of the derived theory.

Submitted 26 June, 2025; originally announced June 2025.

Comments: 40 pages

arXiv:2506.16678 [pdf, ps, other]

Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations

Authors: Ananth Agarwal, Jasper Jian, Christopher D. Manning, Shikhar Murty

Abstract: Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests internalized understanding of hierarchical syntax and dependency relations, the precise mechanism by which they represent syntactic structure is an open area within interpretability research. Probing provides one way to identify the mechanism of syntax being linearly encoded in activations; however, no comprehensive study has yet established whether a model's probing accuracy reliably predicts its downstream syntactic performance. Adopting a "mechanisms vs. outcomes" framework, we evaluate 32 open-weight transformer models and find that syntactic features extracted via probing fail to predict outcomes of targeted syntax evaluations across English linguistic phenomena. Our results highlight a substantial disconnect between latent syntactic representations found via probing and observable syntactic behaviors in downstream tasks.

Submitted 19 June, 2025; originally announced June 2025.

arXiv:2506.13493 [pdf, ps, other]

Nonlinear bulk photocurrent probe Z2 topological phase transition

Authors: Debasis Dutta, Raihan Ahammed, Yingdong Wei, Xiaokai Pan, Xiaoshuang Chen, Lin Wang, Amit Agarwal

Abstract: Detecting topological phase transitions in bulk is challenging due to the limitations of surface sensitive probes like ARPES. Here, we demonstrate that nonlinear bulk photocurrents, specifically shift and injection currents, serve as effective probes of Z_2 topological transitions. These photocurrents show a robust polarity reversal across the Z_2 phase transition, offering a direct optical signature that distinguishes strong topological phases from weak or trivial ones. This effect originates from a reorganization of key band geometric quantities, the Berry curvature and shift vector, on time-reversal-invariant momentum planes. Using a low energy Dirac model, we trace this behaviour to a band inversion in the time-reversal-invariant momentum plane that drives the topological transition. We validate these findings through a tight-binding model for Bi_2Te_3 and first-principles calculations for ZrTe_5 and BiTeI, where the topological phase can be tuned by pressure or temperature. Our results establish nonlinear photocurrent as a sensitive and broadly applicable probe of Z_2 topological phase transitions.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 9 pages, 6 figures, 61 references. We will appreciate any comments or suggestions on this work

arXiv:2506.10910 [pdf, ps, other]

Magistral

Authors: Mistral-AI, :, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, et al. (76 additional authors not shown)

Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0) which further includes cold-start data from Magistral Medium.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.08928 [pdf, ps, other]

Local MDI+: Local Feature Importances for Tree-Based Models

Authors: Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

Abstract: Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method which exploits an equivalence between decision trees and linear models on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample-specific setting. LMDI+ outperforms existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.

Submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.07949 [pdf, ps, other]

Cost-Optimal Active AI Model Evaluation

Authors: Anastasios N. Angelopoulos, Jacob Eisenstein, Jonathan Berant, Alekh Agarwal, Adam Fisch

Abstract: The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes it necessary to rely on synthetic annotation data because of the low cost, despite the potential for substantial bias. In this paper, we develop novel, cost-aware methods for actively balancing the use of a cheap, but often inaccurate, weak rater -- such as a model-based autorater that is designed to automatically assess the quality of generated content -- with a more expensive, but also more accurate, strong rater alternative such as a human. More specifically, the goal of our approach is to produce a low variance, unbiased estimate of the mean of the target "strong" rating, subject to some total annotation budget. Building on recent work in active and prediction-powered statistical inference, we derive a family of cost-optimal policies for allocating a given annotation budget between weak and strong raters so as to maximize statistical efficiency. Using synthetic and real-world data, we empirically characterize the conditions under which these policies yield improvements over prior methods. We find that, especially in tasks where there is high variability in the difficulty of examples, our policies can achieve the same estimation precision at a far lower total annotation budget than standard evaluation methods.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.05037 [pdf, ps, other]

Limits at infinity for Hajłasz-Sobolev functions in metric spaces

Authors: Angha Agarwal, Antti V. Vähäkangas

Abstract: We study limits at infinity for homogeneous Hajłasz-Sobolev functions defined on uniformly perfect metric spaces equipped with a doubling measure. We prove that a quasicontinuous representative of such a function has a pointwise limit at infinity outside an exceptional set, defined in terms of a variational relative capacity. Our framework refines earlier approaches that relied on Hausdorff content rather than relative capacity, and it extends previous results for homogeneous Newtonian and fractional Sobolev functions.

Submitted 5 June, 2025; originally announced June 2025.

MSC Class: 46E36; 31C15; 31B15; 31B25

arXiv:2506.04166 [pdf, ps, other]

N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

Authors: Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi

Abstract: Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications. This paper introduces N$^2$, a unified Python package and testbed that consolidates a broad class of NN-based methods through a modular, extensible interface. Built for both researchers and practitioners, N$^2$ supports rapid experimentation and benchmarking. Using this framework, we introduce a new NN variant that achieves state-of-the-art results in several settings. We also release a benchmark suite of real-world datasets, from healthcare and recommender systems to causal inference and LLM evaluation, designed to stress-test matrix completion methods beyond synthetic scenarios. Our experiments demonstrate that while classical methods excel on idealized data, NN-based techniques consistently outperform them in real-world settings.

Submitted 4 June, 2025; originally announced June 2025.

Comments: 21 pages, 6 figures

arXiv:2506.02097 [pdf, ps, other]

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda

Abstract: Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated domain-specific knowledge. This paper introduces a novel hybrid framework that integrates RAG with intent-based canned responses, leveraging predefined high-confidence responses for efficiency while dynamically routing complex or ambiguous queries to the RAG pipeline. Our framework employs a dialogue context manager to ensure coherence in multi-turn interactions and incorporates a feedback loop to refine intents, dynamically adjust confidence thresholds, and expand response coverage over time. Experimental results demonstrate that the proposed framework achieves a balance of high accuracy (95%) and low latency (180 ms), outperforming RAG and intent-based systems across diverse query types, positioning it as a scalable and adaptive solution for enterprise conversational AI applications.

Submitted 25 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

Comments: Proceedings of the 4th International Workshop on Knowledge Augmented Methods for Natural Language Processing in NAACL 2025, pages 215 to 229, Albuquerque, New Mexico, USA. Association for Computational Linguistics

Journal ref: Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing (KnowledgeNLP 2025), pp. 215 to 229, Association for Computational Linguistics, Albuquerque, New Mexico, May 2025

arXiv:2506.01315 [pdf, ps, other]

Regular genus of $\mathbb{S}^2 \times \mathbb{S}^1 \times \mathbb{S}^1$, $4$-torus, and small covers over $Δ^2 \times Δ^2$

Authors: Anshu Agarwal, Biplab Basak

Abstract: A crystallization of a PL manifold is an edge-colored graph encoding a contracted triangulation of the manifold. The concept of regular genus generalizes the notions of surface genus and Heegaard genus for 3-manifolds to higher-dimensional closed PL manifolds. The regular genus of a PL manifold is a PL invariant. Determining the regular genus of a closed PL $n$-manifold remains a fundamental challenge in combinatorial topology. In this article, we first resolve a conjecture by proving that the regular genus of $\mathbb{S}^2 \times \mathbb{S}^1 \times \mathbb{S}^1$ is 6. Additionally, we determine that the regular genus of $\mathbb{S}^1 \times \mathbb{S}^1 \times \mathbb{S}^1 \times \mathbb{S}^1$ is 16. We also present some observations related to the regular genus of the $n$-dimensional torus and conjecture that the regular genus of $\mathbb{S}^1 \times \mathbb{S}^1 \times \cdots \times \mathbb{S}^1$ ($n$ times) is $1+\frac{(n+1)! \ (n-3)}{8}$, for $n\ge 5$. Then, we investigate the regular genus of small covers. Small covers are closed $n$-manifolds admitting a locally standard $\mathbb{Z}_2^n$-action with orbit space homeomorphic to a simple convex polytope $P^n$. For the polytope $P = Δ^2 \times Δ^2$, we classify all the small covers up to Davis-Januszkiewicz (D-J) equivalence and show that there are exactly seven such covers. Among these, one is $\mathbb{RP}^2 \times \mathbb{RP}^2$, while the others are $\mathbb{RP}^2$-bundles over $\mathbb{RP}^2$. Remarkably, each of these seven small covers has regular genus 8.
Results in this article provide explicit regular genus values for several important 4-manifolds, offering new insights and tools for future work in combinatorial topology.

Submitted 2 June, 2025; originally announced June 2025.

Comments: 25 Pages, 15 figures

MSC Class: 57Q15

arXiv:2506.00482 [pdf, ps, other]

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation

Authors: Eunsu Kim, Haneul Yoo, Guijin Son, Hitesh Patel, Amit Agarwal, Alice Oh

Abstract: As large language models (LLMs) continue to advance, the need for up-to-date and well-organized benchmarks becomes increasingly critical. However, many existing datasets are scattered, difficult to manage, and make it challenging to perform evaluations tailored to specific needs or domains, despite the growing importance of domain-specific models in areas such as math or code. In this paper, we introduce BenchHub, a dynamic benchmark repository that empowers researchers and developers to evaluate LLMs more effectively. BenchHub aggregates and automatically classifies benchmark datasets from diverse domains, integrating 303K questions across 38 benchmarks. It is designed to support continuous updates and scalable data management, enabling flexible and customizable evaluation tailored to various domains or use cases. Through extensive experiments with various LLM families, we demonstrate that model performance varies significantly across domain-specific subsets, emphasizing the importance of domain-aware benchmarking. We believe BenchHub can encourage better dataset reuse, more transparent model comparisons, and easier identification of underrepresented areas in existing benchmarks, offering a critical infrastructure for advancing LLM evaluation research.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.23622 [pdf, ps, other]

Fast-tracking and disentangling of qubit noise fluctuations using minimal-data averaging and hierarchical discrete fluctuation auto-segmentation

Authors: Abhishek Agarwal, Lachlan P. Lindoy, Deep Lall, Sebastian E. de Graaf, Tobias Lindström, Ivan Rungger

Abstract: Qubit noise and fluctuations of the noise over time are key factors limiting the performance of quantum computers. Characterising them with high temporal resolution is challenging due to multiple overlapping stochastic processes such as discrete jumps and continuous drifts. Hence, experiments typically probe individual sources of fluctuations rather than concurrent fluctuations caused by multiple sources. To overcome this limitation we develop a framework comprising a noise characterisation method with minimal measurements allowing high temporal resolution, combined with a hierarchical discrete fluctuation auto-segmentation tool to disentangle the overlapping fluctuations without human intervention, enabling their characterisation and tracking over long times. We show that on transmon qubits the method can track and disentangle qubit frequency fluctuations with temporal resolution of a few tens of milliseconds over hours. This enables us to identify the origins of the fluctuations as overlapping charge parity and two-level-systems switching. Beyond insights into the fluctuation origins, our method also provides information that can be used to improve qubit calibration, error mitigation and error correction.

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.18366 [pdf, ps, other]

Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems

Authors: Hansa Meghwani, Amit Agarwal, Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Srikant Panda

Abstract: Enterprise search systems often struggle to retrieve accurate, domain-specific information due to semantic mismatches and overlapping terminologies. These issues can degrade the performance of downstream applications such as knowledge management, customer support, and retrieval-augmented generation agents. To address this challenge, we propose a scalable hard-negative mining framework tailored specifically for domain-specific enterprise data. Our approach dynamically selects semantically challenging but contextually irrelevant documents to enhance deployed re-ranking models. Our method integrates diverse embedding models, performs dimensionality reduction, and uniquely selects hard negatives, ensuring computational efficiency and semantic precision. Evaluation on our proprietary enterprise corpus (cloud services domain) demonstrates substantial improvements of 15% in MRR@3 and 19% in MRR@10 compared to state-of-the-art baselines and other negative sampling techniques. Further validation on public domain-specific datasets (FiQA, Climate Fever, TechQA) confirms our method's generalizability and readiness for real-world applications.

Submitted 23 May, 2025; originally announced May 2025.

Comments: Accepted to ACL 2025

ACM Class: H.3.3; I.2.6; I.2.7

arXiv:2505.18149 [pdf, ps, other]

First Finish Search: Efficient Test-Time Scaling in Large Language Models

Authors: Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty

Abstract: Test-time scaling (TTS), which involves dynamic allocation of compute during inference, offers a promising way to improve reasoning in large language models. While existing TTS methods work well, they often rely on long decoding paths or require a large number of samples to be generated, increasing the token usage and inference latency. We observe the surprising fact that for reasoning tasks, shorter traces are much more likely to be correct than longer ones. Motivated by this, we introduce First Finish Search (FFS), a training-free parallel decoding strategy that launches $n$ independent samples and returns as soon as any one completes. We evaluate FFS alongside simple decoding, beam search, majority voting, and budget forcing on four reasoning models (DeepSeek-R1, R1-Distill-Qwen-32B, QwQ-32B and Phi-4-Reasoning-Plus) and across four datasets (AIME24, AIME25-I, AIME25-II and GPQA Diamond). With DeepSeek-R1, FFS achieves $82.23\%$ accuracy on the AIME datasets, a $15\%$ improvement over DeepSeek-R1's standalone accuracy, nearly matching OpenAI's o4-mini performance. Our theoretical analysis explains why stopping at the shortest trace is likely to yield a correct answer and identifies the conditions under which early stopping may be suboptimal. The elegance and simplicity of FFS demonstrate that straightforward TTS strategies can perform remarkably well, revealing the untapped potential of simple approaches at inference time.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17495 [pdf, ps, other]

ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

Authors: Landon Butler, Abhineet Agarwal, Justin Singh Kang, Yigit Efe Erginbas, Bin Yu, Kannan Ramchandran

Abstract: Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach that uses interaction sparsity to scale to $n \approx 10^3$ features. SPEX greatly improves upon prior methods but requires tens of thousands of model inferences, which can be prohibitive for large models. In this paper, we observe that LLM feature interactions are often hierarchical -- higher-order interactions are accompanied by their lower-order subsets -- which enables more efficient discovery. To exploit this hierarchy, we propose ProxySPEX, an interaction attribution algorithm that first fits gradient boosted trees to masked LLM outputs and then extracts the important interactions. Experiments across four challenging high-dimensional datasets show that ProxySPEX more faithfully reconstructs LLM outputs by 20% over marginal attribution approaches while using $10\times$ fewer inferences than SPEX. By accounting for interactions, ProxySPEX identifies features that influence model output over 20% more than those selected by marginal approaches. Further, we apply ProxySPEX to two interpretability tasks: data attribution, where we identify interactions among CIFAR-10 training samples that influence test predictions, and mechanistic interpretability, where we uncover interactions between attention heads, both within and across layers, on a question-answering task. ProxySPEX identifies interactions that enable more aggressive pruning of heads than marginal approaches.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17332 [pdf, ps, other]

SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

Authors: Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae

Abstract: Enterprise customers are increasingly adopting Large Language Models (LLMs) for critical communication tasks, such as drafting emails, crafting sales pitches, and composing casual messages. Deploying such models across different regions requires them to understand diverse cultural and linguistic contexts and generate safe and respectful responses. For enterprise applications, it is crucial to mitigate reputational risks, maintain trust, and ensure compliance by effectively identifying and handling unsafe or offensive language. To address this, we introduce SweEval, a benchmark simulating real-world scenarios with variations in tone (positive or negative) and context (formal or informal). The prompts explicitly instruct the model to include specific swear words while completing the task. This benchmark evaluates whether LLMs comply with or resist such inappropriate instructions and assesses their alignment with ethical frameworks, cultural nuances, and language comprehension capabilities. In order to advance research in building ethically aligned AI systems for enterprise use and beyond, we release the dataset and code: https://github.com/amitbcp/multilingual_profanity.

Submitted 22 May, 2025; originally announced May 2025.

Comments: Published in the Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Industry Track, pages 558-582

ACM Class: I.2.7; I.2.6

arXiv:2505.17330 [pdf, ps, other]

FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding

Authors: Amit Agarwal, Srikant Panda, Kulbhushan Pachauri

Abstract: In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as handling OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with less than 90M parameters, making it well-suited for complex real-world applications for Information Extraction (IE) tasks where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag

Submitted 22 May, 2025; originally announced May 2025.

Comments: Published in the Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Industry Track, pages 100-114

ACM Class: I.2.7; I.5.4; I.7

arXiv:2505.15613 [pdf, ps, other]

Congestion and extreme events in urban street networks

Authors: Ajay Agarwal, M. S. Santhanam

Abstract: Congestion and extreme events in transportation networks are emergent phenomena with significant socio-economic implications. In this work, we study congestion and extreme event properties on real urban street (planar) networks drawn from four cities and compare them with those on a regular square grid. For dynamics, we employ three variants of random walk with additional realistic transport features. In all four urban street networks and the 2D square grid, and with all dynamical models, phase transitions are observed from a free-flow to a congested phase as a function of the birth rate of vehicles. These transitions can be modified by traffic-aware routing protocols, but congestion cannot be entirely mitigated. In organically evolved street networks, we observe a semi-congested regime which has both congested and free-flow components. In the free-flow regime, the extreme event occurrence probability is larger for small degree nodes than for hubs, a feature originally observed in non-planar scale-free networks. In general, with respect to congestion and extreme events, the urban street networks and regular square grid display similar properties.

Submitted 21 May, 2025; originally announced May 2025.

Comments: 8 pages, 9 figures

arXiv:2505.11976 [pdf]

Advanced Integration of Discrete Line Segments in Digitized P&ID for Continuous Instrument Connectivity

Authors: Soumya Swarup Prusty, Astha Agarwal, Srinivasan Iyenger

Abstract: Piping and Instrumentation Diagrams (P&IDs) constitute the foundational blueprint of a plant, depicting the interconnections among process equipment, instrumentation for process control, and the flow of fluids and control signals. In their existing setup, the manual mapping of information from P&ID sheets poses a significant challenge. This is a time-consuming process, taking around 3-6 months, and is susceptible to errors. It also depends on the expertise of the domain experts and often requires multiple rounds of review. The digitization of P&IDs entails merging detected line segments, which is essential for linking various detected instruments, thereby creating a comprehensive digitized P&ID. This paper explains how line segments detected by a computer vision model are merged and how connections are then built between equipment and the merged lines. The result is a digitized form of the information describing the interconnections between process equipment, instrumentation, and the flow of fluids and control signals. This information can be stored in a knowledge graph and, with the help of advanced algorithms, leveraged for tasks like finding optimal routes, detecting system cycles, computing transitive closures, and more.

Submitted 17 May, 2025; originally announced May 2025.

Comments: 6 pages, 13 figures

arXiv:2505.08784 [pdf, ps, other]

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

Authors: Abhineet Agarwal, Michael Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

Abstract: As machine learning (ML) models are increasingly deployed in high-stakes domains, trustworthy uncertainty quantification (UQ) is critical for ensuring the safety and reliability of these models. Traditional UQ methods rely on specifying a true generative model and are not robust to misspecification. On the other hand, conformal inference allows for arbitrary ML models but does not consider model selection, which leads to large interval sizes. We tackle these drawbacks by proposing a UQ method based on the predictability, computability, and stability (PCS) framework for veridical data science proposed by Yu and Kumbier. Specifically, PCS-UQ addresses model selection by using a prediction check to screen out unsuitable models. PCS-UQ then fits these screened algorithms across multiple bootstraps to assess inter-sample variability and algorithmic instability, enabling more reliable uncertainty estimates. Further, we propose a novel calibration scheme that improves local adaptivity of our prediction sets. Experiments across $17$ regression and $6$ classification datasets show that PCS-UQ achieves the desired coverage and reduces width over conformal approaches by $\approx 20\%$. Further, our local analysis shows PCS-UQ often achieves target coverage across subgroups while conformal methods fail to do so. For large deep-learning models, we propose computationally efficient approximation schemes that avoid the expensive multiple bootstrap trainings of PCS-UQ. Across three computer vision benchmarks, PCS-UQ reduces prediction set size over conformal methods by $20\%$. Theoretically, we show a modified PCS-UQ algorithm is a form of split conformal inference and achieves the desired coverage with exchangeable data.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.03155 [pdf, ps, other]

Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation

Authors: Max Qiushi Lin, Jincheng Mei, Matin Aghaei, Michael Lu, Bo Dai, Alekh Agarwal, Dale Schuurmans, Csaba Szepesvari, Sharan Vaswani

Abstract: Policy gradient (PG) methods have played an essential role in the empirical successes of reinforcement learning. In order to handle large state-action spaces, PG methods are typically used with function approximation. In this setting, the approximation error in modeling problem-dependent quantities is a key notion for characterizing the global convergence of PG methods. We focus on Softmax PG with linear function approximation (referred to as $\texttt{Lin-SPG}$) and demonstrate that the approximation error is irrelevant to the algorithm's global convergence even for the stochastic bandit setting. Consequently, we first identify the necessary and sufficient conditions on the feature representation that can guarantee the asymptotic global convergence of $\texttt{Lin-SPG}$. Under these feature conditions, we prove that $T$ iterations of $\texttt{Lin-SPG}$ with a problem-specific learning rate result in an $O(1/T)$ convergence to the optimal policy. Furthermore, we prove that $\texttt{Lin-SPG}$ with any arbitrary constant learning rate can ensure asymptotic global convergence to the optimal policy.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 75 pages

arXiv:2505.01928 [pdf, other]

GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting

Authors: Anushka Agarwal, Muhammad Yusuf Hassan, Talha Chafekar

Abstract: We introduce GenSync, a novel framework for multi-identity lip-synced video synthesis using 3D Gaussian Splatting. Unlike most existing 3D methods that require training a new model for each identity, GenSync learns a unified network that synthesizes lip-synced videos for multiple speakers. By incorporating a Disentanglement Module, our approach separates identity-specific features from audio representations, enabling efficient multi-identity video synthesis. This design reduces computational overhead and achieves 6.8x faster training compared to state-of-the-art models, while maintaining high lip-sync accuracy and visual quality.

Submitted 3 May, 2025; originally announced May 2025.

arXiv:2504.20519 [pdf]

Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging

Authors: Neil K. R. Sehgal, Sunny Rai, Manuel Tonneau, Anish K. Agarwal, Joseph Cappella, Melanie Kornides, Lyle Ungar, Alison Buttenheim, Sharath Chandra Guntuku

Abstract: Large language model (LLM) based chatbots show promise in persuasive communication, but existing studies often rely on weak controls or focus on belief change rather than behavioral intentions or outcomes. This pre-registered multi-country (US, Canada, UK) randomized controlled trial involving 930 vaccine-hesitant parents evaluated brief (three-minute) multi-turn conversations with LLM-based chatbots against standard public health messaging approaches for increasing human papillomavirus (HPV) vaccine intentions for their children. Participants were randomly assigned to: (1) a weak control (no message), (2) a strong control reflecting the standard of care (reading official public health materials), or (3 and 4) one of two chatbot conditions. One chatbot was prompted to deliver short, conversational responses, while the other used the model's default output style (longer with bullet points). While chatbot interactions significantly increased self-reported vaccination intent (by 7.1-10.3 points on a 100-point scale) compared to no message, they did not outperform standard public health materials, with the conversational chatbot performing significantly worse. Additionally, while the short-term effects of chatbot interactions faded during a 15-day follow-up, the effects of public health material persisted through a 45-day follow-up relative to no message. These findings suggest that while LLMs can effectively shift vaccination intentions in the short-term, their incremental value over existing public health communications is questionable, offering a more tempered view of their persuasive capabilities and highlighting the importance of integrating AI-driven tools alongside, rather than replacing, current public health strategies.

Submitted 26 June, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.19470 [pdf, ps, other]

A Cautionary Note on Quantum Oracles

Authors: Avantika Agarwal, Srijita Kundu

Abstract: In recent years, the quantum oracle model introduced by Aaronson and Kuperberg (2007) has found a lot of use in showing oracle separations between complexity classes and cryptographic primitives. It is generally assumed that proof techniques that do not relativize with respect to quantum oracles will also not relativize with respect to classical oracles. In this note, we show that this is not the case: specifically, we show that there is a quantum oracle problem that is contained in the class QMA, but not in a class we call polyQCPH. The class polyQCPH is equal to PSPACE with respect to classical oracles, and it is a well-known result that QMA is contained in PSPACE (also with respect to classical oracles). We also show that the same separation holds relative to a distributional oracle, which is a model introduced by Natarajan and Nirkhe (2024). We believe our findings show the need for some caution when using these non-standard oracle models, particularly when showing separations between quantum and classical resources.

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2504.18885 [pdf, other]

Planar Nernst effect from hidden band geometry in layered two-dimensional materials

Authors: Rahul Biswas, Harsh Varshney, Amit Agarwal

Abstract: The Nernst effect is a versatile phenomenon relevant for energy harvesting, magnetic sensing, probing band topology and charge-neutral excitations. The planar Nernst effect (PNE) generates an in-plane voltage transverse to an applied temperature gradient under an in-plane magnetic field. Conventional Berry curvature-induced PNE is absent in two-dimensional (2D) systems, as the out-of-plane Berry curvature does not couple to the in-plane electron velocity. We challenge this notion by demonstrating a distinct planar Nernst effect in quasi-2D materials (2DPNE). We show that the 2DPNE originates from previously overlooked planar components of Berry curvature and orbital magnetic moment, arising from inter-layer tunneling in multilayered 2D systems. We comprehensively analyze the band-geometric origin and crystalline symmetry constraints on 2DPNE responses. We illustrate its experimental feasibility in strained bilayer graphene. Our findings significantly expand the theoretical understanding of planar Nernst effects, providing a clear pathway for next-generation magnetic sensing and energy-harvesting applications.

Submitted 1 May, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

Comments: 12 pages, 5 figures

arXiv:2504.18786 [pdf, ps, other]

Contracts: A unified lens on congestion control robustness, fairness, congestion, and generality

Authors: Anup Agarwal, Venkat Arun, Srinivasan Seshan

Abstract: Congestion control algorithms (CCAs) operate in partially observable environments, lacking direct visibility into link capacities, or competing flows. To ensure fair sharing of network resources, CCAs communicate their fair share through observable signals. For instance, Reno's fair share is encoded as $\propto 1/\sqrt{\texttt{loss rate}}$. We call such communication mechanisms \emph{contracts}. We show that the design choice of contracts fixes key steady-state performance metrics, including robustness to errors in congestion signals, fairness, amount of congestion (e.g., delay, loss), and generality (e.g., range of supported link rates). This results in fundamental tradeoffs between these metrics. Using properties of contracts we also identify design pitfalls that lead to starvation (extreme unfairness). We argue that CCA design and analysis should start with contracts to conscientiously pick tradeoffs and avoid pitfalls. We empirically validate our findings and discuss their implications on CCA design and network measurement.

Submitted 6 June, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

arXiv:2504.18445 [pdf, other]

doi 10.1088/1748-0221/20/04/P04022

Testing a large size triple GEM detector for the first station of the CBM-Muon Chambers with a high-intensity gamma source at GIF++ under large-area illumination

Authors: Apar Agarwal, Souvik Chattopadhay, Pawan Kumar Sharma, Anand Kumar Dubey, Jogender Saini, Vikas Singhal, Vinod Negi, Ekata Nandy, Chandrasekhar Ghosh, David Emschermann

Abstract: The physics studies at heavy-ion nucleus-nucleus collision experiments demand reliable detectors at high particle flux. Therefore, Gas Electron Multipliers (GEM) detectors, which show resilience to extreme radiation, are one of the prime choices for the upcoming Compressed Baryonic Matter (CBM) experiment at the Facility of Antiproton and Ion Research, Germany. However, operating them under these demanding conditions requires a systemic study at the highest incident particle flux. To this end, we have conducted extensive tests on a real-size triple GEM detector module with the high-intensity gamma flux using the Cs-137 source at the upgraded Gamma Irradiation Facility (GIF++) at Conseil Européen pour la Recherche Nucléaire (CERN). The detector response, particularly regarding the gain and efficiency of muon detection, was studied extensively with and without a gamma source in a free-streaming mode using self-triggered electronics. This configuration will be necessary for the CBM experiment since it will observe unprecedented event rates of about 10 MHz for Au-Au collisions. The analysis reveals an alignment between the expected and observed value of gain and efficiency with an increasing intensity of gamma flux at the operating voltage. The test results demonstrate that the large-size GEM detector prototype can handle elevated gamma rates of approximately 17.25 MHz/cm² without significantly impacting its performance or suffering irreversible damage.

Submitted 25 April, 2025; originally announced April 2025.

Journal ref: Journal of Instrumentation, Volume 20, April 2025

arXiv:2504.17140 [pdf, other]

Scalable Permutation-Aware Modeling for Temporal Set Prediction

Authors: Ashish Ranjan, Ayush Agarwal, Shalin Barot, Sushant Kumar

Abstract: Temporal set prediction involves forecasting the elements that will appear in the next set, given a sequence of prior sets, each containing a variable number of elements. Existing methods often rely on intricate architectures with substantial computational overhead, which hampers their scalability. In this work, we introduce a novel and scalable framework that leverages permutation-equivariant and permutation-invariant transformations to efficiently model set dynamics. Our approach significantly reduces both training and inference time while maintaining competitive performance. Extensive experiments on multiple public benchmarks show that our method achieves results on par with or superior to state-of-the-art models across several evaluation metrics. These results underscore the effectiveness of our model in enabling efficient and scalable temporal set prediction.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.16977 [pdf, other]

Tokenization Matters: Improving Zero-Shot NER for Indic Languages

Authors: Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Amit Agarwal

Abstract: Tokenization is a critical component of Natural Language Processing (NLP), especially for low-resource languages, where subword segmentation influences vocabulary structure and downstream task accuracy. Although Byte Pair Encoding (BPE) is a standard tokenization method in multilingual language models, its suitability for Named Entity Recognition (NER) in low-resource Indic languages remains underexplored due to its limitations in handling morphological complexity. In this work, we systematically compare BPE, SentencePiece, and character-level tokenization strategies using IndicBERT for NER tasks in low-resource Indic languages like Assamese, Bengali, Marathi, and Odia, as well as extremely low-resource Indic languages like Santali, Manipuri, and Sindhi. We assess both intrinsic linguistic properties (tokenization efficiency, out-of-vocabulary (OOV) rates, and morphological preservation) and extrinsic downstream performance, including fine-tuning and zero-shot cross-lingual transfer. Our experiments show that SentencePiece consistently performs better than BPE for NER in low-resource Indic languages, particularly in zero-shot cross-lingual settings, as it better preserves entity consistency. While BPE provides the most compact tokenization form, it does not generalize well: it misclassifies or even fails to recognize entity labels when tested on unseen languages. In contrast, SentencePiece better preserves linguistic structure, benefiting extremely low-resource and morphologically rich Indic languages, such as Santali and Manipuri, with superior entity recognition, and generalizes well across scripts, such as Sindhi, written in Arabic. The results point to SentencePiece as the more effective tokenization strategy for NER within multilingual and low-resource Indic NLP applications.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.14739 [pdf, ps, other]

doi 10.1177/02783649251339680

A Modularized Design Approach for GelSight Family of Vision-based Tactile Sensors

Authors: Arpit Agarwal, Mohammad Amin Mirzaee, Xiping Sun, Wenzhen Yuan

Abstract: The GelSight family of vision-based tactile sensors has proven to be effective for multiple robot perception and manipulation tasks. These sensors are based on an internal optical system and an embedded camera to capture the deformation of the soft sensor surface, inferring the high-resolution geometry of the objects in contact. However, customizing the sensors for different robot hands requires a tedious trial-and-error process to re-design the optical system. In this paper, we formulate the GelSight sensor design process as a systematic and objective-driven design problem and perform the design optimization with a physically accurate optical simulation. The method is based on modularizing and parameterizing the sensor's optical components and designing four generalizable objective functions to evaluate the sensor. We implement the method with an interactive and easy-to-use toolbox called OptiSense Studio. With the toolbox, non-sensor experts can quickly optimize their sensor design in both forward and inverse ways following our predefined modules and steps. We demonstrate our system with four different GelSight sensors by quickly optimizing their initial design in simulation and transferring it to the real sensors.

Submitted 20 April, 2025; originally announced April 2025.

Comments: The paper is accepted to International Journal of Robotics Research with DOI 10.1177/02783649251339680

arXiv:2504.13776 [pdf, other]

Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization

Authors: Aman Agarwal, James Gearon, Raksha Rank, Etienne Chenevert

Abstract: Wildfires are increasing in intensity, frequency, and duration across large parts of the world as a result of anthropogenic climate change. Modern hazard detection and response systems that deal with wildfires are under-equipped for sustained wildfire seasons. Recent work has shown that automated wildfire detection using Convolutional Neural Networks (CNNs) trained on satellite imagery is capable of high-accuracy results. However, CNNs are computationally expensive to train and only incorporate local image context. Recently, Vision Transformers (ViTs) have gained popularity for their efficient training and their ability to include both local and global contextual information. In this work, we show that ViTs can outperform well-trained and specialized CNNs in detecting wildfires on a previously published dataset of LandSat-8 imagery. One of our ViTs outperforms the baseline CNN comparison by 0.92%. However, we find our own implementation of a CNN-based UNet to perform best in every category, showing the sustained utility of CNNs in image tasks. Overall, ViTs are comparably capable of detecting wildfires as CNNs, though well-tuned CNNs are still the best technique for detecting wildfire, with our UNet providing an IoU of 93.58%, better than the baseline UNet by some 4.58%.

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.10404 [pdf, other]

Framing Perception: Exploring Camera Induced Objectification in Cinema

Authors: Parth Maradia, Ayushi Agarwal, Srija Bhupathiraju, Kavita Vemuri

Abstract: This study investigates how cinematographic techniques influence viewer perception and contribute to the objectification of women, utilizing eye-tracking data from 91 participants. They watched a sexualized music video (SV) known for objectifying portrayals and a non-sexualized music video (TV). Using dynamic Areas of Interest (AOIs) (head, torso, and lower body), gaze metrics such as fixation duration, visit count, and scan paths were recorded to assess visual attention patterns. Participants were grouped according to their average fixations on sexualized AOIs. Statistical analyses revealed significant differences in gaze behavior between the videos and among the groups, with increased attention to sexualized AOIs in SV. Additionally, data-driven group differences in fixations identified specific segments with heightened objectification that are further analyzed using scan path visualization techniques. These findings provide strong empirical evidence of camera-driven gaze objectification, demonstrating how cinematic framing implicitly shapes objectifying gaze patterns, highlighting the critical need for mindful media representation.

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.07516 [pdf, ps, other]

doi 10.1109/COMSNETS63942.2025.10885551

Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

Authors: Avinash Agarwal, Mayashankar Kumar, Manisha J. Nene

Abstract: As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions like autonomous network management and hyper-personalized services. The TEC Standard for Fairness Assessment and Rating of AI Systems provides guidelines for evaluating fairness in AI, focusing primarily on tabular data and supervised learning models. However, as AI applications diversify, this standard requires enhancement to strengthen its impact and broaden its applicability. This paper proposes an expansion of the TEC Standard to include fairness assessments for images, unstructured text, and generative AI, including large language models, ensuring a more comprehensive approach that keeps pace with evolving AI technologies. By incorporating these dimensions, the enhanced framework will promote responsible and trustworthy AI deployment across various sectors.

Submitted 10 April, 2025; originally announced April 2025.

Comments: 5 pages. Published in 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS). Access: https://ieeexplore.ieee.org/abstract/document/10885551

Journal ref: 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS), Bengaluru, India, 2025, pp. 1216-1220

arXiv:2504.06581 [pdf, other]

Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis

Authors: Umakanta Maharana, Sarthak Verma, Avarna Agarwal, Prakashini Mruthyunjaya, Dwarikanath Mahapatra, Sakir Ahmed, Murari Mandal

Abstract: Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonged clinical evaluations, all of which can delay treatment and adversely affect patient outcomes. With impressive accuracy in prediction across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions. In this work, we study the diagnostic capability of LLMs for Rheumatoid Arthritis (RA) with real-world patient data. Patient data was collected alongside diagnoses from medical experts, and the performance of LLMs was evaluated in comparison to expert diagnoses for RA disease prediction. We notice an interesting pattern in disease diagnosis and find an unexpected misalignment between prediction and explanation. We conduct a series of multi-round analyses using different LLM agents. The best-performing model accurately predicts RA approximately 95% of the time. However, when medical experts evaluated the reasoning generated by the model, they found that nearly 68% of the reasoning was incorrect. This study highlights a clear misalignment between LLMs' high prediction accuracy and their flawed reasoning, raising important questions about relying on LLM explanations in clinical settings. LLMs provide incorrect reasoning to arrive at the correct answer for RA disease diagnosis.

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2504.04948 [pdf]

doi 10.1088/978-0-7503-6342-6ch6

Physics for the environment and sustainable development

Authors: Jürgen Kurths, Ankit Agarwal, Ugur Öztürk, Shubham Sharma, Norbert Marwan, Deniz Eroglu

Abstract: A reliable understanding of the Earth system is essential for the life quality of modern society. Natural hazards are the cause of most life and resource losses. The ability to define the conditions for a sustainable development of humankind, to keep the Earth system within the boundaries of habitable states, or to predict critical transitions and events in the dynamics of the Earth system are crucial to mitigate and adapt to Earth system related events and changes (e.g., volcanic eruptions, earthquakes, climate change) and to avert the disastrous consequences of natural hazards. In this chapter, we discuss key concepts from nonlinear physics and show that they enable us to treat challenging problems of Earth sciences which cannot be solved by classic methods. In particular, the concepts of multi-scaling, recurrence, synchronization, and complex networks have become crucial in the very last decades for a substantially more profound understanding of the dynamics of earthquakes, landslides, or (palaeo-)climate. They can even provide a significantly improved prediction of several high-impact extreme events. Additionally, crucial open challenges in the realm of methodological nature and applications to Earth sciences are given.

Submitted 7 April, 2025; originally announced April 2025.

Journal ref: In: EPS Grand Challenges - Physics for Society in the Horizon 2050, IOP Publishing, Bristol (2024)

arXiv:2504.03335 [pdf, other]

Evolution of interacting coronal mass ejections driving the great geomagnetic storm on 10 May 2024

Authors: Soumyaranjan Khuntia, Wageesh Mishra, Anjali Agarwal

Abstract: The arrival of a series of coronal mass ejections (CMEs) at the Earth resulted in a great geomagnetic storm on 10 May 2024, the strongest storm in the last two decades. We investigate the kinematic and thermal evolution of the successive CMEs to understand their interaction en route to Earth. We attempt to find the dynamics, thermodynamics, and magnetic field signatures of CME-CME interactions. Our focus is to compare the thermal state of CMEs near the Sun and in their post-interaction phase at 1 AU. The 3D kinematics of six identified Earth-directed CMEs were determined using the GCS model. The flux rope internal state (FRIS) model is implemented to estimate the CMEs' polytropic index and temperature evolution from their measured kinematics. The thermal states of the interacting CMEs are examined using in-situ observations at 1 AU. Our study determined the interaction heights of selected CMEs and confirmed their interaction that led to the formation of complex ejecta identified at 1 AU. The plasma, magnetic field, and thermal characteristics of magnetic ejecta (ME) within the complex ejecta and other substructures, such as interaction regions (IRs) within two ME and double flux rope-like structures within a single ME, show the possible signatures of CME-CME interaction in in-situ observations. The FRIS-model-derived thermal states for individual CMEs reveal their diverse thermal evolution near the Sun, with most CMEs transitioning to an isothermal state at 6-9 Rsun, except for CME4, which exhibits an adiabatic state due to a slower expansion rate. The complex ejecta at 1 AU shows a predominant heat-release state in electrons, while the ions show a bimodal distribution of thermal states. On comparing the characteristics of CMEs near the Sun and at 1 AU, we suggest that such one-to-one comparison is difficult due to CME-CME interactions significantly influencing their post-interaction characteristics.

Submitted 4 April, 2025; originally announced April 2025.

Comments: 15 pages, 6 figures, 2 tables (Accepted for publication in Astronomy & Astrophysics journal)

arXiv:2504.02130 [pdf, other]

Ordering-based Conditions for Global Convergence of Policy Gradient Methods

Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvari, Dale Schuurmans

Abstract: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation. First, we establish a few key observations that frame the study: (i) Global convergence can be achieved under linear function approximation without policy or reward realizability, both for the standard Softmax PG and natural policy gradient (NPG). (ii) Approximation error is not a key quantity for characterizing global convergence in either algorithm. (iii) The conditions on the representation that imply global convergence are different between these two algorithms. Overall, these observations call into question approximation error as an appropriate quantity for characterizing the global convergence of PG methods under linear function approximation. Second, motivated by these observations, we establish new general results: (i) NPG with linear function approximation achieves global convergence if and only if the projection of the reward onto the representable space preserves the optimal action's rank, a quantity that is not strongly related to approximation error. (ii) The global convergence of Softmax PG occurs if the representation satisfies a non-domination condition and can preserve the ranking of rewards, which goes well beyond policy or reward realizability. We provide experimental results to support these theoretical findings.

Submitted 2 April, 2025; originally announced April 2025.

Comments: arXiv version for the NeurIPS 2023 paper; to be updated for a technical issue

arXiv:2504.01702 [pdf, ps, other]

A Causal Inference Framework for Data Rich Environments

Authors: Alberto Abadie, Anish Agarwal, Devavrat Shah

Abstract: We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., where there are a large number of units and a large number of measurements per unit. Our model provides a bridge between the structural causal model view of causal inference common in the graphical models literature and the latent factor model view common in the potential outcomes literature. We show how classic models for potential outcomes and treatment assignments fit within our framework. We provide an identification argument for the average treatment effect, the average treatment effect on the treated, and the average treatment effect on the untreated. For any estimator that has a fast enough estimation error rate for a certain nuisance parameter, we establish it is consistent for these various causal parameters. We then show principal component regression is one such estimator that leads to consistent estimation, and we analyze the minimal smoothness required of the potential outcomes function for consistency.

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2503.22634 [pdf, other]

Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels

Authors: Adam Wei, Abhinav Agarwal, Boyuan Chen, Rohan Bosworth, Nicholas Pfaff, Russ Tedrake

Abstract: In imitation learning for robotics, cotraining with demonstration data generated both in simulation and on real hardware has emerged as a powerful recipe to overcome the sim2real gap. This work seeks to elucidate basic principles of this sim-and-real cotraining to help inform simulation design, sim-and-real dataset creation, and policy training. Focusing narrowly on the canonical task of planar pushing from camera inputs enabled us to be thorough in our study. These experiments confirm that cotraining with simulated data can dramatically improve performance in real, especially when real data is limited. Performance gains scale with simulated data, but eventually plateau; real-world data increases this performance ceiling. The results also suggest that reducing the domain gap in physics may be more important than visual fidelity for non-prehensile manipulation tasks. Perhaps surprisingly, having some visual domain gap actually helps the cotrained policy: binary probes reveal that high-performing policies learn to distinguish simulated domains from real. We conclude by investigating this nuance and mechanisms that facilitate positive transfer between sim-and-real. In total, our experiments span over 40 real-world policies (evaluated on 800+ trials) and 200 simulated policies (evaluated on 40,000+ trials).

Submitted 28 March, 2025; originally announced March 2025.

Comments: 9 pages, 15 figures, In Submission to IROS 2025

arXiv:2503.15729 [pdf, other]

Ignition of weak interactions and r-process outflows in super-collapsar accretion disks

Authors: Aman Agarwal, Daniel M. Siegel, Brian D. Metzger, Chris Nagele

Abstract: The collapse of rotating massive (~$10 M_\odot$) stars resulting in hyperaccreting black holes (BHs; "collapsars") is a leading model for the central engines of long-duration gamma-ray bursts (GRBs) and a promising source of rapid neutron capture ("r-process") elements. R-process nucleosynthesis in disk outflows requires the accretion flow to self-neutronize. This occurs because of Pauli-blocking at finite electron degeneracy, associated with a critical accretion rate $\dot M > \dot{M}_{\rm ign}$. We analytically examine the assumptions underlying this "ignition threshold" and its possible breakdown with increasing BH mass $M$. Employing three-dimensional general-relativistic magnetohydrodynamic simulations with weak interactions, we explore the physical conditions of collapsar accretion disks with $M$ ~ 80-3000 $M_\odot$ over more than a viscous timescale as they transition through the threshold. There is remarkable agreement between our simulations and the analytic result $\dot{M}_{\rm ign}\propto \alpha^{5/3}M^{4/3}$ for $M$ ~ 3-3000 $M_\odot$. Simulations and analytic analyses consistently show that the largest BHs leading to r-process nucleosynthesis at $\dot{M}_{\rm ign}$ are $\approx 3000 M_\odot$, beyond which self-neutronization ceases, since the disk temperature $T\propto M^{-1/6}$ decreases below the neutron-proton mass difference (~MeV), suppressing the conversion of protons into neutrons. We show that stellar models of ~$250-10^5M_\odot$ can give rise to BHs of $M$ ~ 30-1000 $M_\odot$ accreting at $\dot M\gtrsim \dot{M}_{\rm ign}$, yielding ~$10-100 M_\odot$ of light and heavy r-process elements per event. These rare but prolific r-process sources in low-metallicity environments are associated with super-kilonovae and likely extremely energetic GRBs. Such signatures may be used to probe Population III stars.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 25 pages, 16 figures

arXiv:2503.13521 [pdf, other]

States of Disarray: Cleaning Data for Gerrymandering Analysis

Authors: Ananya Agarwal, Fnu Alusi, Arbie Hsu, Arif Syraj, Ellen Veomett

Abstract: The mathematics of redistricting is an area of study that has exploded in recent years. In particular, many different research groups and expert witnesses in court cases have used outlier analysis to argue that a proposed map is a gerrymander. This outlier analysis relies on having an ensemble of potential redistricting maps against which the proposed map is compared. Arguably the most widely-accepted method of creating such an ensemble is to use a Markov Chain Monte Carlo (MCMC) process. This process requires that various pieces of data be gathered, cleaned, and coalesced into a single file that can be used as the seed of the MCMC process. In this article, we describe how we have begun this cleaning process for each state, and made the resulting data available for the public at https://github.com/eveomett-states. At the time of submission, we have data for 22 states available for researchers, students, and the general public to easily access and analyze. We will continue the data cleaning process for each state, and we hope that the availability of these datasets will both further research in this area, and increase the public's interest in and understanding of modern techniques to detect gerrymandering.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 12 pages, 3 figures

MSC Class: 51-11 (Primary) 68V35 (Secondary) ACM Class: E.m; J.4

arXiv:2503.09683 [pdf, other]

Variational preparation of normal matrix product states on quantum computers

Authors: Ben Jaderberg, George Pennington, Kate V. Marshall, Lewis W. Anderson, Abhishek Agarwal, Lachlan P. Lindoy, Ivan Rungger, Stefano Mensa, Jason Crain

Abstract: Preparing matrix product states (MPSs) on quantum computers is important for a wide class of quantum algorithms including the simulation of many-body physics. However, widely-used schemes based on staircase circuits are often too deep to be run on quantum computers today. In this work, we demonstrate how normal MPSs, which have short-range correlations, can be prepared with shallow circuits using heuristics from approximate quantum compiling (AQC). We achieve this with ADAPT-AQC, an adaptive-ansatz preparation algorithm, as well as with a generalised initialisation scheme for the existing AQC-Tensor algorithm. We subsequently apply these methods to prepare an antiferromagnetic (AFM) ground state of the 50-site Heisenberg XXZ spin chain near the AFM-XY phase boundary and study the dynamics following a global quench. Through the execution of circuits with up to 59 CZ depth and 1251 CZ gates, we obtain the signature relaxation of magnetic ordering for a parameter regime previously inaccessible on quantum hardware due to deep ground state preparation circuits. Overall, our results demonstrate how the close integration of quantum and classical resources can push the boundary of what can be studied on quantum computers.

Submitted 28 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

Comments: 15 pages, 6 figures

arXiv:2503.07920 [pdf, other]

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Authors: Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel, Vicky Feliren, Bahrul Ilmi Nasution, Manuel Antonio Rufino, Genta Indra Winata, Rian Adam Rajagede, Carlos Rafael Catalan, Mohamed Fazli Imam, Priyaranjan Pattnayak, Salsabila Zahirah Pranida, Kevin Pratama, Yeshil Bangera, Adisai Na-Thalang , et al. (67 additional authors not shown)

Abstract: Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA languages. By involving contributors from SEA countries, SEA-VL aims to ensure better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages in VL research. Beyond crowdsourcing, our initiative goes one step further in the exploration of the automatic collection of culturally relevant images through crawling and image generation. First, we find that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing. Second, despite the substantial progress in generative vision models, synthetic images remain unreliable in accurately reflecting SEA cultures. The generated images often fail to reflect the nuanced traditions and cultural contexts of the region. Collectively, we gather 1.28M SEA culturally-relevant images, more than 50 times larger than other existing datasets. Through SEA-VL, we aim to bridge the representation gap in SEA, fostering the development of more inclusive AI systems that authentically represent diverse cultures across SEA.

Submitted 18 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

Comments: [SEA-VL Dataset] https://huggingface.co/collections/SEACrowd/sea-vl-multicultural-vl-dataset-for-southeast-asia-67cf223d0c341d4ba2b236e7 [Appendix J] https://github.com/SEACrowd/seacrowd.github.io/blob/master/docs/SEA_VL_Appendix_J.pdf

arXiv:2503.07206 [pdf, other]

Layered Topological Antiferromagnetic Metal at Room Temperature -- YbMn$_2$Ge$_2$

Authors: Nirmalya Jana, Atasi Chakraborty, Anamitra Mukherjee, Amit Agarwal

Abstract: Metallic antiferromagnets are essential for efficient spintronic applications due to their fast switching and high mobility, yet room-temperature metallic antiferromagnets are rare. Here, we investigate YbMn$_2$Ge$_2$, a room temperature antiferromagnet, and establish it as an exfoliable layered metal with altermagnetic surface states. Using multi-orbital Hubbard model calculations, we reveal that its robust metallic AFM ordering is stabilized by electronic correlations and a partially nested Fermi surface. Furthermore, we show that YbMn$_2$Ge$_2$ hosts symmetry-protected topological Dirac crossings, connecting unique even-order spin-polarized surface states with parabolic and inverted Mexican-hat-like dispersion. Our findings position YbMn$_2$Ge$_2$ as a promising platform for exploring the interplay of correlation, topology, and surface altermagnetism of layered antiferromagnets.

Submitted 10 March, 2025; originally announced March 2025.

Comments: 9 pages and 4 figures

arXiv:2503.06810 [pdf, other]

Mitigating Preference Hacking in Policy Optimization with Pessimism

Authors: Dhawal Gupta, Adam Fisch, Christoph Dann, Alekh Agarwal

Abstract: This work tackles the problem of overoptimization in reinforcement learning from human feedback (RLHF), a prevalent technique for aligning models with human preferences. RLHF relies on reward or preference models trained on fixed preference datasets, and these models are unreliable when evaluated outside the support of this preference data, leading to the common reward or preference hacking phenomenon. We propose novel, pessimistic objectives for RLHF which are provably robust to overoptimization through the use of pessimism in the face of uncertainty, and design practical algorithms, P3O and PRPO, to optimize these objectives. Our approach is derived for the general preference optimization setting, but can be used with reward models as well. We evaluate P3O and PRPO on the tasks of fine-tuning language models for document summarization and creating helpful assistants, demonstrating remarkable resilience to overoptimization.

Submitted 9 March, 2025; originally announced March 2025.

Showing 1–50 of 799 results for author: Agarwal, A