Search | arXiv e-print repository

Urban RIS-Assisted HAP Networks: Performance Analysis Using Stochastic Geometry

Authors: Islam M. Tanash, Ayush Kumar Dwivedi, Taneli Riihonen

Abstract: This paper studies a high-altitude platform (HAP) network supported by reconfigurable intelligent surfaces (RISs). The practical irregular placement of HAPs and RISs is modeled using homogeneous Poisson point processes, while buildings that cause blockages in urban areas are modeled as a Boolean scheme of rectangles. We introduce a novel approach to characterize the statistical channel based on the generalized Beta prime distribution. Analytical expressions for coverage probability and ergodic capacity in an interference-limited system are derived and validated through Monte Carlo simulations. The findings show notable performance improvements and reveal the impact of various system parameters, including the blockage effect, which contributes to mitigating interference from the other visible HAPs. This proposed system could enhance connectivity and enable effective data offloading in urban environments.

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2506.14978

ODD: Overlap-aware Estimation of Model Performance under Distribution Shift

Authors: Aayush Mishra, Anqi Liu

Abstract: Reliable and accurate estimation of the error of an ML model in unseen test domains is an important problem for safe intelligent systems. Prior work uses disagreement discrepancy (DIS^2) to derive practical error bounds under distribution shifts. It optimizes for a maximally disagreeing classifier on the target domain to bound the error of a given source classifier. Although this approach offers a reliable and competitively accurate estimate of the target error, we identify a problem in this approach which causes the disagreement discrepancy objective to compete in the overlapping region between source and target domains. With an intuitive assumption that the target disagreement should be no more than the source disagreement in the overlapping region due to high enough support, we devise Overlap-aware Disagreement Discrepancy (ODD). Maximizing ODD only requires disagreement in the non-overlapping target domain, removing the competition. Our ODD-based bound uses domain-classifiers to estimate domain-overlap and better predicts target performance than DIS^2. We conduct experiments on a wide array of benchmarks to show that our method improves the overall performance-estimation error while remaining valid and reliable. Our code and results are available on GitHub.

Submitted 17 June, 2025; originally announced June 2025.

Comments: Accepted to the 41st Conference on Uncertainty in Artificial Intelligence, 2025

arXiv:2506.13048

The Space Complexity of Learning-Unlearning Algorithms

Authors: Yeshwanth Cherapanamjeri, Sumegha Garg, Nived Rajaraman, Ayush Sekhari, Abhishek Shetty

Abstract: We study the memory complexity of machine unlearning algorithms that provide strong data deletion guarantees to the users. Formally, consider an algorithm for a particular learning task that initially receives a training dataset. Then, after learning, it receives data deletion requests from a subset of users (of arbitrary size), and the goal of unlearning is to perform the task as if the learner never received the data of deleted users. In this paper, we ask how many bits of storage are needed to be able to delete certain training samples at a later time. We focus on the task of realizability testing, where the goal is to check whether the remaining training samples are realizable within a given hypothesis class $\mathcal{H}$. Toward that end, we first provide a negative result showing that the VC dimension is not a characterization of the space complexity of unlearning. In particular, we provide a hypothesis class with constant VC dimension (and Littlestone dimension), but for which any unlearning algorithm for realizability testing needs to store $\Omega(n)$ bits, where $n$ denotes the size of the initial training dataset. In fact, we provide a stronger separation by showing that for any hypothesis class $\mathcal{H}$, the amount of information that the learner needs to store, so as to perform unlearning later, is lower bounded by the \textit{eluder dimension} of $\mathcal{H}$, a combinatorial notion always larger than the VC dimension.
We complement the lower bound with an upper bound in terms of the star number of the underlying hypothesis class, albeit in a stronger ticketed-memory model proposed by Ghazi et al. (2023). Since the star number for a hypothesis class is never larger than its eluder dimension, our work highlights a fundamental separation between central and ticketed memory models for machine unlearning.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.12347

Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks

Authors: Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

Abstract: Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication challenges that arise in such interactions, we observed 19 developers using an in-IDE agent to resolve 33 open issues in repositories to which they had previously contributed. Participants successfully resolved about half of these issues, with participants solving issues incrementally having greater success than those using a one-shot approach. Participants who actively collaborated with the agent and iterated on its outputs were also more successful, though they faced challenges in trusting the agent's responses and collaborating on debugging and testing. These results have implications for successful developer-agent collaborations, and for the design of more effective SWE agents.

Submitted 17 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.12103

The Amazon Nova Family of Models: Technical Report and Model Card

Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

Submitted 17 March, 2025; originally announced June 2025.

Comments: 48 pages, 10 figures

Report number: 20250317

arXiv:2506.12097

UCD: Unlearning in LLMs via Contrastive Decoding

Authors: Vinith M. Suriyakumar, Ayush Sekhari, Ashia Wilson

Abstract: Machine unlearning aims to remove specific information, e.g. sensitive or undesirable content, from large language models (LLMs) while preserving overall performance. We propose an inference-time unlearning algorithm that uses contrastive decoding, leveraging two auxiliary smaller models, one trained without the forget set and one trained with it, to guide the outputs of the original model using their difference during inference. Our strategy substantially improves the tradeoff between unlearning effectiveness and model utility. We evaluate our approach on two unlearning benchmarks, TOFU and MUSE. Results show notable gains in both forget quality and retained performance in comparison to prior approaches, suggesting that incorporating contrastive decoding can offer an efficient, practical avenue for unlearning concepts in large-scale models.

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.12003

Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang

Abstract: The emerging Internet of AI Agents challenges existing web infrastructure designed for human-scale, reactive interactions. Unlike traditional web resources, autonomous AI agents initiate actions, maintain persistent state, spawn sub-agents, and negotiate directly with peers: demanding millisecond-level discovery, instant credential revocation, and cryptographic behavioral proofs that exceed current DNS/PKI capabilities. This paper analyzes whether to upgrade existing infrastructure or implement purpose-built registry architectures for autonomous agents. We identify critical failure points: DNS propagation (24-48 hours vs. required milliseconds), certificate revocation unable to scale to trillions of entities, and IPv4/IPv6 addressing inadequate for agent-scale routing. We evaluate three approaches: (1) Upgrade paths, (2) Switch options, (3) Hybrid registries. Drawing parallels to dialup-to-broadband transitions, we find that agent requirements constitute qualitative, and not incremental, changes. While upgrades offer compatibility and faster deployment, clean-slate solutions provide better performance but require longer for adoption. Our analysis suggests hybrid approaches will emerge, with centralized registries for critical agents and federated meshes for specialized use cases.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11302

TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

Authors: Héctor Carrión, Yutong Bai, Víctor A. Hernández Castro, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik

Abstract: World models aim to simulate environments and enable effective agent behavior. However, modeling real-world environments presents unique challenges as they dynamically change across both space and, crucially, time. To capture these composed dynamics, we introduce a Spatio-Temporal Road Image Dataset for Exploration (STRIDE) permuting 360-degree panoramic imagery into rich interconnected observation, state and action nodes. Leveraging this structure, we can simultaneously model the relationship between egocentric views, positional coordinates, and movement commands across both space and time. We benchmark this dataset via TARDIS, a transformer-based generative world model that integrates spatial and temporal dynamics through a unified autoregressive framework trained on STRIDE. We demonstrate robust performance across a range of agentic tasks such as controllable photorealistic image synthesis, instruction following, autonomous self-control, and state-of-the-art georeferencing. These results suggest a promising direction towards sophisticated generalist agents--capable of understanding and manipulating the spatial and temporal aspects of their material environments--with enhanced embodied reasoning capabilities. Training code, datasets, and model checkpoints are made available at https://huggingface.co/datasets/Tera-AI/STRIDE.

Submitted 18 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

Comments: Computer Vision, Pattern Recognition, Early-Fusion, Dataset, Data Augmentation

arXiv:2506.10961

Discovery and Localization of the Swift-Observed FRB 20241228A in a Star-forming Host Galaxy

Authors: Alice P. Curtin, Shion Andrew, Sunil Simha, Alice Cai, Kenzie Nimmo, Shami Chatterjee, Amanda M. Cook, Fengqiu Adam Dong, Yuxin Dong, Tarraneh Eftekhari, Wen-fai Fong, Emmanuel Fonseca, Jason W. T. Hessels, Ronniy C. Joseph, Victoria Kaspi, Calvin Leung, Robert Main, Kiyoshi W. Masui, Ryan Mckinven, Daniele Michilli, Mason Ng, Ayush Pandhi, Aaron B. Pearlman, Ziggy Pleunis, Mawson W. Sammons, et al. (5 additional authors not shown)

Abstract: On 2024 December 28, CHIME/FRB detected the thus-far non-repeating FRB 20241228A with a real-time signal-to-noise ratio of $>50$. Approximately 112~s later, the X-ray Telescope onboard the Neil Gehrels Swift Observatory was on source, the fastest follow-up to-date of a non-repeating FRB (Tohuvavohu et al. in prep.). Using CHIME/FRB and two of the three CHIME/FRB Outriggers, we obtained a Very Long Baseline Interferometry localization for FRB 20241228A with a 1$\sigma$ confidence ellipse of 11$^{\prime\prime}$ by 0.2$^{\prime\prime}$. This represents the first published localization using both the CHIME-KKO and CHIME-GBO Outriggers. We associate FRB 20241228A with a star-forming galaxy at a redshift of $z = 0.1614\pm0.0002$. The persistent X-ray luminosity limit at this source's location and distance is $<1.2 \times 10^{43}$ erg s$^{-1}$ in the $0.3-10$ keV band, the most stringent limit of any non-repeating FRB to-date (Tohuvavohu et al. in prep.). The stellar mass ($\sim 2.6 \times 10^{10}\,M_{\odot}$) and star formation rate ($\sim 2.9\,M_{\odot}$~yr$^{-1}$) of the host galaxy of FRB 20241228A are consistent with the broader FRB host galaxy population. We measure significant scattering ($\sim$1 ms) and scintillation ($\sim$20 kHz at 600 MHz) along the line of sight to this source, and suggest the scintillation screen is Galactic while the scattering screen is extragalactic. FRB 20241228A represents an exciting example of a new era in which we can harness VLBI-localizations and rapid high-energy follow-up to probe FRB progenitors.

Submitted 12 June, 2025; originally announced June 2025.

Comments: Submitted to ApJ

arXiv:2506.10955

ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems

Authors: Aayush Karan, Kulin Shah, Sitan Chen

Abstract: There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in hard inverse problems with low signal-to-noise ratio, these techniques veer off the data manifold, failing to produce realistic outputs. In this work, we devise a simple wrapper, ReGuidance, for boosting both the sample realism and reward achieved by these methods. Given a candidate solution $\hat{x}$ produced by an algorithm of the user's choice, we propose inverting the solution by running the unconditional probability flow ODE in reverse starting from $\hat{x}$, and then using the resulting latent as an initialization for DPS. We evaluate our wrapper on hard inverse problems like large box in-painting and super-resolution with high upscaling. Whereas state-of-the-art baselines visibly fail, we find that applying our wrapper on top of these baselines significantly boosts sample quality and measurement consistency. We complement these findings with theory proving that on certain multimodal data distributions, ReGuidance simultaneously boosts the reward and brings the candidate solution closer to the data manifold. To our knowledge, this constitutes the first rigorous algorithmic guarantee for DPS.

Submitted 12 June, 2025; originally announced June 2025.

Comments: 38 pages, 14 figures

arXiv:2506.09445

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

Authors: Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha

Abstract: We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2506.09108

SensorLM: Learning the Language of Wearable Sensors

Authors: Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

Abstract: We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people. Furthermore, SensorLM extends prominent multimodal pretraining architectures (e.g., CLIP, CoCa) and recovers them as specific variants within a generic architecture. Extensive experiments on real-world tasks in human activity analysis and healthcare verify the superior performance of SensorLM over state-of-the-art in zero-shot recognition, few-shot learning, and cross-modal retrieval. SensorLM also demonstrates intriguing capabilities including scaling behaviors, label efficiency, sensor captioning, and zero-shot generalization to unseen tasks.

Submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.08376

Revealing Dark Matter's Role in Neutron Stars Anisotropy: A Bayesian Approach Using Multi-messenger Observations

Authors: Xue-Zhi Liu, Premachand Mahapatra, Chun Huang, Ayush Hazarika, Chiranjeeb Singha, Prasanta Kumar Das

Abstract: Dark matter (DM) continues to evade direct detection, but neutron stars (NSs) serve as natural laboratories where even a modest DM component can alter their structure. While many studies have examined DM effects on NSs, they often rely on specific choices of equations of state (EOS) models, assume isotropy, and lack a Bayesian statistical framework, limiting their predictive power. In this work, we present a Bayesian framework that couples pressure-anisotropic nuclear EOS to a self-interacting fermionic DM component, constrained by NICER and GW170817 data. Our results show that DM mass fractions up to $\sim10\%$ remain consistent with current data, which softens the high-density EOS, leading to reduced stellar radii and tidal deformabilities while requiring negligible pressure anisotropy. Bayesian model comparison reveals no statistically significant preference between pure baryonic and DM-admixed NSs, indicating that DM inclusion enhances physical realism without complexity penalties. However, existing data cannot tightly constrain the DM parameters, and our empirical radius definition introduces a systematic bias toward the DM core configurations. To address this, we introduce the DM radius span $\Delta R_\chi \equiv R_{\chi,\mathrm{max}} - R_{\chi,\mathrm{min}}$ as a unified diagnostic for DM distributions. This parameter simultaneously characterizes core-halo transition features while exhibiting strong linear correlations ($\Delta R_\chi < 4\,\mathrm{km}$) with both DM and BM parameters, providing a clear avenue for future constraints.
Our approach bridges current limitations and future potential in probing DM through compact star observations.

Submitted 9 June, 2025; originally announced June 2025.

Comments: 24 pages, 13 figures. Submitting to PRD. Comments welcome

arXiv:2506.08249

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Authors: Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2980 table query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds when increasing table size. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.

Submitted 9 June, 2025; originally announced June 2025.

arXiv:2506.07259

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Authors: Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi

Abstract: Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new data needs to be collected for instant inference. To tackle this issue, we introduce the Amortized Active Learning and Inference Engine (ALINE), a unified framework for amortized Bayesian inference and active data acquisition. ALINE leverages a transformer architecture trained via reinforcement learning with a reward based on self-estimated information gain provided by its own integrated inference component. This allows it to strategically query informative data points while simultaneously refining its predictions. Moreover, ALINE can selectively direct its querying strategy towards specific subsets of model parameters or designated predictive tasks, optimizing for posterior estimation, data prediction, or a mixture thereof. Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.

Submitted 8 June, 2025; originally announced June 2025.

Comments: 27 pages, 13 figures

arXiv:2506.06087

Multilevel neural simulation-based inference

Authors: Yuga Hikida, Ayush Bharti, Niall Jeffrey, François-Xavier Briol

Abstract: Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.

Submitted 6 June, 2025; originally announced June 2025.

arXiv:2506.06073 [pdf, ps, other]

System-Aware Unlearning Algorithms: Use Lesser, Forget Faster

Authors: Linda Lu, Ayush Sekhari, Karthik Sridharan

Abstract: Machine unlearning addresses the problem of updating a machine learning model/system trained on a dataset $S$ so that the influence of a set of deletion requests $U \subseteq S$ on the unlearned model is minimized. The gold standard definition of unlearning demands that the updated model, after deletion, be nearly identical to the model obtained by retraining. This definition is designed for a worst-case attacker (one who can recover not only the unlearned model but also the remaining data samples, i.e., $S \setminus U$). Such a stringent definition has made developing efficient unlearning algorithms challenging. However, such strong attackers are also unrealistic. In this work, we propose a new definition, system-aware unlearning, which aims to provide unlearning guarantees against an attacker that can at best only gain access to the data stored in the system for learning/unlearning requests and not all of $S \setminus U$. With this new definition, we use the simple intuition that if a system can store less to make its learning/unlearning updates, it can be more secure and update more efficiently against a system-aware attacker. Towards that end, we present an exact system-aware unlearning algorithm for linear classification using a selective sampling-based approach, and we generalize the method for classification with general function classes. We theoretically analyze the tradeoffs between deletion capacity, accuracy, memory, and computation time.

Submitted 6 June, 2025; originally announced June 2025.

Comments: ICML 2025

arXiv:2506.05707 [pdf, ps, other]

A cautious user's guide in applying HMMs to physical systems

Authors: Max Schweiger, Ayush Saurabh, Steve Pressé

Abstract: Nature, as far as we know, evolves continuously through space and time. Yet the ubiquitous hidden Markov model (HMM)--originally developed for discrete time and space analysis in natural language processing--remains a central tool in interpreting time series data drawn from physical systems. This raises a fundamental question: What are the implications of applying a discrete-state, discrete-time framework to analyze data generated by a continuously evolving system? Through synthetic data generated using Langevin dynamics in an effective potential, we explore under what circumstances HMMs yield interpretable results. Our analysis reveals that the discrete-state approximation acts primarily as an abstraction, with the inferred states visited in time often more closely reflecting the measurement protocol and modeling choices than features of the underlying physical potential. Crucially, we demonstrate that the states visited over the course of a time series recovered by the HMM can be tuned a priori by adjusting the data acquisition scheme, even misleadingly recovering reproducible "intermediate" states using different HMM tools for a system evolving in a single-well potential. We conclude with a note of measured caution: while HMMs offer a mathematically elegant framework for time series inference, their use in physical modeling should be guided by an awareness of their limitations. In this light, we outline important generalizations of the HMM to continuous space and time and highlight the importance of a well-calibrated measurement noise model.