-
Urban RIS-Assisted HAP Networks: Performance Analysis Using Stochastic Geometry
Authors:
Islam M. Tanash,
Ayush Kumar Dwivedi,
Taneli Riihonen
Abstract:
This paper studies a high-altitude platform (HAP) network supported by reconfigurable intelligent surfaces (RISs). The practical irregular placement of HAPs and RISs is modeled using homogeneous Poisson point processes, while buildings that cause blockages in urban areas are modeled as a Boolean scheme of rectangles. We introduce a novel approach to characterize the statistical channel based on the generalized beta prime distribution. Analytical expressions for coverage probability and ergodic capacity in an interference-limited system are derived and validated through Monte Carlo simulations. The findings show notable performance improvements and reveal the impact of various system parameters, including the blockage effect, which helps mitigate interference from the other visible HAPs. The proposed system could enhance connectivity and enable effective data offloading in urban environments.
Submitted 18 June, 2025;
originally announced June 2025.
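The placement model named in the abstract above, a homogeneous Poisson point process, can be illustrated with a minimal sketch. It is not the authors' simulation code: the disc-shaped region, the intensity value, the units, and the function names `sample_poisson` and `sample_ppp_in_disc` are all assumptions made for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson-distributed count with Knuth's multiplication method.

    Adequate for moderate means; exp(-mean) underflows for very large means.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ppp_in_disc(intensity, radius, rng):
    """Sample a homogeneous Poisson point process on a disc centred at the origin.

    intensity: expected number of points per unit area (hypothetical value).
    radius: disc radius, in the same length unit as the intensity.
    """
    area = math.pi * radius ** 2
    n_points = sample_poisson(intensity * area, rng)
    points = []
    for _ in range(n_points):
        # The square root makes the points uniform over the area, not the radius.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


rng = random.Random(0)
hap_positions = sample_ppp_in_disc(intensity=0.5, radius=5.0, rng=rng)
print(len(hap_positions), "points sampled")
```

The number of points is random with mean `intensity * area`, which is what distinguishes a Poisson point process from placing a fixed number of nodes uniformly.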
-
ODD: Overlap-aware Estimation of Model Performance under Distribution Shift
Authors:
Aayush Mishra,
Anqi Liu
Abstract:
Reliable and accurate estimation of the error of an ML model in unseen test domains is an important problem for safe intelligent systems. Prior work uses disagreement discrepancy (DIS^2) to derive practical error bounds under distribution shifts. It optimizes for a maximally disagreeing classifier on the target domain to bound the error of a given source classifier. Although this approach offers a reliable and competitively accurate estimate of the target error, we identify a problem that causes the disagreement discrepancy objective to compete in the overlapping region between source and target domains. With an intuitive assumption that the target disagreement should be no more than the source disagreement in the overlapping region due to high enough support, we devise Overlap-aware Disagreement Discrepancy (ODD). Maximizing ODD only requires disagreement in the non-overlapping target domain, removing the competition. Our ODD-based bound uses domain classifiers to estimate domain overlap and predicts target performance better than DIS^2. We conduct experiments on a wide array of benchmarks to show that our method reduces the overall performance-estimation error while remaining valid and reliable. Our code and results are available on GitHub.
Submitted 17 June, 2025;
originally announced June 2025.
-
The Space Complexity of Learning-Unlearning Algorithms
Authors:
Yeshwanth Cherapanamjeri,
Sumegha Garg,
Nived Rajaraman,
Ayush Sekhari,
Abhishek Shetty
Abstract:
We study the memory complexity of machine unlearning algorithms that provide strong data deletion guarantees to the users. Formally, consider an algorithm for a particular learning task that initially receives a training dataset. Then, after learning, it receives data deletion requests from a subset of users (of arbitrary size), and the goal of unlearning is to perform the task as if the learner never received the data of deleted users. In this paper, we ask how many bits of storage are needed to be able to delete certain training samples at a later time. We focus on the task of realizability testing, where the goal is to check whether the remaining training samples are realizable within a given hypothesis class \(\mathcal{H}\).
Toward that end, we first provide a negative result showing that the VC dimension is not a characterization of the space complexity of unlearning. In particular, we provide a hypothesis class with constant VC dimension (and Littlestone dimension), but for which any unlearning algorithm for realizability testing needs to store \(\Omega(n)\) bits, where \(n\) denotes the size of the initial training dataset. In fact, we provide a stronger separation by showing that for any hypothesis class \(\mathcal{H}\), the amount of information that the learner needs to store, so as to perform unlearning later, is lower bounded by the eluder dimension of \(\mathcal{H}\), a combinatorial notion always larger than the VC dimension. We complement the lower bound with an upper bound in terms of the star number of the underlying hypothesis class, albeit in a stronger ticketed-memory model proposed by Ghazi et al. (2023). Since the star number for a hypothesis class is never larger than its eluder dimension, our work highlights a fundamental separation between central and ticketed memory models for machine unlearning.
Submitted 15 June, 2025;
originally announced June 2025.
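The realizability-testing task described in the abstract above can be made concrete with a toy sketch. It shows only the naive baseline that keeps every training sample, so its storage grows linearly with the dataset; it is not an algorithm from the paper. The threshold hypothesis class, the user names, and the function names are hypothetical.

```python
def is_realizable(samples, hypotheses):
    """Return True if some hypothesis labels every (x, y) sample correctly."""
    return any(all(h(x) == y for x, y in samples) for h in hypotheses)


def make_threshold(t):
    """Hypothetical 1-D threshold classifier: predicts 1 when x >= t."""
    return lambda x: int(x >= t)


def realizable_after_deletion(training, deleted_users, hypotheses):
    """Naive unlearning: retain all samples, drop deleted users, then retest."""
    remaining = [s for user, s in training.items() if user not in deleted_users]
    return is_realizable(remaining, hypotheses)


hypotheses = [make_threshold(t) for t in range(0, 6)]

# One (x, y) sample per user; carol and dave contradict every threshold.
training = {"alice": (1, 0), "bob": (4, 1), "carol": (2, 1), "dave": (3, 0)}

before = realizable_after_deletion(training, set(), hypotheses)
after = realizable_after_deletion(training, {"dave"}, hypotheses)
print(before, after)  # False True
```

Deleting a single user flips the answer here, which is why a learner that wants to answer correctly after arbitrary deletions cannot simply store the original verdict.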
-
Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks
Authors:
Aayush Kumar,
Yasharth Bajpai,
Sumit Gulwani,
Gustavo Soares,
Emerson Murphy-Hill
Abstract:
Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication challenges that arise in such interactions, we observed 19 developers using an in-IDE agent to resolve 33 open issues in repositories to which they had previously contributed. Participants successfully resolved about half of these issues, and those who solved issues incrementally had greater success than those who used a one-shot approach. Participants who actively collaborated with the agent and iterated on its outputs were also more successful, though they faced challenges in trusting the agent's responses and collaborating on debugging and testing. These results have implications for successful developer-agent collaborations, and for the design of more effective SWE agents.
Submitted 17 June, 2025; v1 submitted 14 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents, and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional-grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
UCD: Unlearning in LLMs via Contrastive Decoding
Authors:
Vinith M. Suriyakumar,
Ayush Sekhari,
Ashia Wilson
Abstract:
Machine unlearning aims to remove specific information, e.g. sensitive or undesirable content, from large language models (LLMs) while preserving overall performance. We propose an inference-time unlearning algorithm that uses contrastive decoding, leveraging two auxiliary smaller models, one trained without the forget set and one trained with it, to guide the outputs of the original model using their difference during inference. Our strategy substantially improves the tradeoff between unlearning effectiveness and model utility. We evaluate our approach on two unlearning benchmarks, TOFU and MUSE. Results show notable gains in both forget quality and retained performance in comparison to prior approaches, suggesting that incorporating contrastive decoding can offer an efficient, practical avenue for unlearning concepts in large-scale models.
Submitted 12 June, 2025;
originally announced June 2025.
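The idea in the abstract above, steering the original model with the difference between two auxiliary models at inference time, might be sketched as follows. The abstract does not give the exact combination rule, so the additive logit shift, the `alpha` weight, the toy three-token vocabulary, and the function names here are assumptions rather than the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_unlearn_logits(base, without_forget, with_forget, alpha=1.0):
    """Assumed rule: shift the base logits by the auxiliary-model difference.

    Tokens that the forget-set-trained auxiliary model favours more than the
    clean auxiliary model does are pushed down in the original model's output.
    """
    return [b + alpha * (c - f) for b, c, f in zip(base, without_forget, with_forget)]


# Toy next-token logits; token 0 stands in for forget-set content.
base = [2.0, 1.0, 0.5]         # original (large) model
aux_without = [0.0, 1.0, 0.5]  # small model trained without the forget set
aux_with = [2.0, 1.0, 0.5]     # small model trained with the forget set

adjusted = contrastive_unlearn_logits(base, aux_without, aux_with, alpha=1.0)
probs_before = softmax(base)
probs_after = softmax(adjusted)
print(adjusted)  # [0.0, 1.0, 0.5]
```

Because the adjustment happens per decoding step, the original model's weights are never modified, which matches the inference-time framing in the abstract.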
-
Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?
Authors:
Ramesh Raskar,
Pradyumna Chari,
Jared James Grogan,
Mahesh Lambe,
Robert Lincourt,
Raghu Bala,
Abhishek Singh,
Ayush Chopra,
Rajesh Ranjan,
Shailja Gupta,
Dimitris Stripelis,
Maria Gorskikh,
Sichao Wang
Abstract:
The emerging Internet of AI Agents challenges existing web infrastructure designed for human-scale, reactive interactions. Unlike traditional web resources, autonomous AI agents initiate actions, maintain persistent state, spawn sub-agents, and negotiate directly with peers, demanding millisecond-level discovery, instant credential revocation, and cryptographic behavioral proofs that exceed current DNS/PKI capabilities. This paper analyzes whether to upgrade existing infrastructure or implement purpose-built registry architectures for autonomous agents. We identify critical failure points: DNS propagation (24-48 hours vs. required milliseconds), certificate revocation unable to scale to trillions of entities, and IPv4/IPv6 addressing inadequate for agent-scale routing. We evaluate three approaches: (1) Upgrade paths, (2) Switch options, (3) Hybrid registries. Drawing parallels to dialup-to-broadband transitions, we find that agent requirements constitute qualitative, and not incremental, changes. While upgrades offer compatibility and faster deployment, clean-slate solutions provide better performance but take longer to adopt. Our analysis suggests hybrid approaches will emerge, with centralized registries for critical agents and federated meshes for specialized use cases.
Submitted 13 June, 2025;
originally announced June 2025.
-
TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy
Authors:
Héctor Carrión,
Yutong Bai,
Víctor A. Hernández Castro,
Kishan Panaganti,
Ayush Zenith,
Matthew Trang,
Tony Zhang,
Pietro Perona,
Jitendra Malik
Abstract:
World models aim to simulate environments and enable effective agent behavior. However, modeling real-world environments presents unique challenges as they dynamically change across both space and, crucially, time. To capture these composed dynamics, we introduce a Spatio-Temporal Road Image Dataset for Exploration (STRIDE) permuting 360-degree panoramic imagery into rich interconnected observation, state and action nodes. Leveraging this structure, we can simultaneously model the relationship between egocentric views, positional coordinates, and movement commands across both space and time. We benchmark this dataset via TARDIS, a transformer-based generative world model that integrates spatial and temporal dynamics through a unified autoregressive framework trained on STRIDE. We demonstrate robust performance across a range of agentic tasks such as controllable photorealistic image synthesis, instruction following, autonomous self-control, and state-of-the-art georeferencing. These results suggest a promising direction towards sophisticated generalist agents--capable of understanding and manipulating the spatial and temporal aspects of their material environments--with enhanced embodied reasoning capabilities. Training code, datasets, and model checkpoints are made available at https://huggingface.co/datasets/Tera-AI/STRIDE.
Submitted 18 June, 2025; v1 submitted 12 June, 2025;
originally announced June 2025.
-
Discovery and Localization of the Swift-Observed FRB 20241228A in a Star-forming Host Galaxy
Authors:
Alice P. Curtin,
Shion Andrew,
Sunil Simha,
Alice Cai,
Kenzie Nimmo,
Shami Chatterjee,
Amanda M. Cook,
Fengqiu Adam Dong,
Yuxin Dong,
Tarraneh Eftekhari,
Wen-fai Fong,
Emmanuel Fonseca,
Jason W. T. Hessels,
Ronniy C. Joseph,
Victoria Kaspi,
Calvin Leung,
Robert Main,
Kiyoshi W. Masui,
Ryan Mckinven,
Daniele Michilli,
Mason Ng,
Ayush Pandhi,
Aaron B. Pearlman,
Ziggy Pleunis,
Mawson W. Sammons
, et al. (5 additional authors not shown)
Abstract:
On 2024 December 28, CHIME/FRB detected the thus-far non-repeating FRB 20241228A with a real-time signal-to-noise ratio of $>50$. Approximately 112 s later, the X-ray Telescope onboard the Neil Gehrels Swift Observatory was on source, the fastest follow-up to date of a non-repeating FRB (Tohuvavohu et al. in prep.). Using CHIME/FRB and two of the three CHIME/FRB Outriggers, we obtained a Very Long Baseline Interferometry localization for FRB 20241228A with a 1$σ$ confidence ellipse of 11$^{\prime\prime}$ by 0.2$^{\prime\prime}$. This represents the first published localization using both the CHIME-KKO and CHIME-GBO Outriggers. We associate FRB 20241228A with a star-forming galaxy at a redshift of $z = 0.1614\pm0.0002$. The persistent X-ray luminosity limit at this source's location and distance is $<1.2 \times 10^{43}$ erg s$^{-1}$ in the $0.3-10$ keV band, the most stringent limit of any non-repeating FRB to date (Tohuvavohu et al. in prep.). The stellar mass ($\sim 2.6 \times 10^{10}\,M_{\odot}$) and star formation rate ($\sim 2.9\,M_{\odot}$ yr$^{-1}$) of the host galaxy of FRB 20241228A are consistent with the broader FRB host galaxy population. We measure significant scattering ($\sim$1 ms) and scintillation ($\sim$20 kHz at 600 MHz) along the line of sight to this source, and suggest the scintillation screen is Galactic while the scattering screen is extragalactic. FRB 20241228A represents an exciting example of a new era in which we can harness VLBI localizations and rapid high-energy follow-up to probe FRB progenitors.
Submitted 12 June, 2025;
originally announced June 2025.
-
ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems
Authors:
Aayush Karan,
Kulin Shah,
Sitan Chen
Abstract:
There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in hard inverse problems with low signal-to-noise ratio, these techniques veer off the data manifold, failing to produce realistic outputs. In this work, we devise a simple wrapper, ReGuidance, for boosting both the sample realism and reward achieved by these methods. Given a candidate solution $\hat{x}$ produced by an algorithm of the user's choice, we propose inverting the solution by running the unconditional probability flow ODE in reverse starting from $\hat{x}$, and then using the resulting latent as an initialization for DPS. We evaluate our wrapper on hard inverse problems like large box in-painting and super-resolution with high upscaling. Whereas state-of-the-art baselines visibly fail, we find that applying our wrapper on top of these baselines significantly boosts sample quality and measurement consistency. We complement these findings with theory proving that on certain multimodal data distributions, ReGuidance simultaneously boosts the reward and brings the candidate solution closer to the data manifold. To our knowledge, this constitutes the first rigorous algorithmic guarantee for DPS.
Submitted 12 June, 2025;
originally announced June 2025.
-
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
Authors:
Ayush Gupta,
Anirban Roy,
Rama Chellappa,
Nathaniel D. Bastian,
Alvaro Velasquez,
Susmit Jha
Abstract:
We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.
Submitted 11 June, 2025;
originally announced June 2025.
-
SensorLM: Learning the Language of Wearable Sensors
Authors:
Yuwei Zhang,
Kumar Ayush,
Siyuan Qiao,
A. Ali Heydari,
Girish Narayanswamy,
Maxwell A. Xu,
Ahmed A. Metwally,
Shawn Xu,
Jake Garrison,
Xuhai Xu,
Tim Althoff,
Yun Liu,
Pushmeet Kohli,
Jiening Zhan,
Mark Malhotra,
Shwetak Patel,
Cecilia Mascolo,
Xin Liu,
Daniel McDuff,
Yuzhe Yang
Abstract:
We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipeline designed to capture statistical, structural, and semantic information from sensor data. This approach enabled the curation of the largest sensor-language dataset to date, comprising over 59.7 million hours of data from more than 103,000 people. Furthermore, SensorLM extends prominent multimodal pretraining architectures (e.g., CLIP, CoCa) and recovers them as specific variants within a generic architecture. Extensive experiments on real-world tasks in human activity analysis and healthcare verify the superior performance of SensorLM over the state of the art in zero-shot recognition, few-shot learning, and cross-modal retrieval. SensorLM also demonstrates intriguing capabilities including scaling behaviors, label efficiency, sensor captioning, and zero-shot generalization to unseen tasks.
Submitted 10 June, 2025;
originally announced June 2025.
-
Revealing Dark Matter's Role in Neutron Stars Anisotropy: A Bayesian Approach Using Multi-messenger Observations
Authors:
Xue-Zhi Liu,
Premachand Mahapatra,
Chun Huang,
Ayush Hazarika,
Chiranjeeb Singha,
Prasanta Kumar Das
Abstract:
Dark matter (DM) continues to evade direct detection, but neutron stars (NSs) serve as natural laboratories where even a modest DM component can alter their structure. While many studies have examined DM effects on NSs, they often rely on specific choices of equation-of-state (EOS) models, assume isotropy, and lack a Bayesian statistical framework, limiting their predictive power. In this work, we present a Bayesian framework that couples pressure-anisotropic nuclear EOS to a self-interacting fermionic DM component, constrained by NICER and GW170817 data. Our results show that DM mass fractions up to $\sim10\%$ remain consistent with current data, which softens the high-density EOS, leading to reduced stellar radii and tidal deformabilities while requiring negligible pressure anisotropy. Bayesian model comparison reveals no statistically significant preference between pure baryonic and DM-admixed NSs, indicating that DM inclusion enhances physical realism without complexity penalties. However, existing data cannot tightly constrain the DM parameters, and our empirical radius definition introduces a systematic bias toward the DM core configurations. To address this, we introduce the DM radius span $ΔR_χ\equiv R_{χ,\mathrm{max}} - R_{χ,\mathrm{min}}$ as a unified diagnostic for DM distributions. This parameter simultaneously characterizes core-halo transition features while exhibiting strong linear correlations ($ΔR_χ< 4\,\mathrm{km}$) with both DM and BM parameters, providing a clear avenue for future constraints. Our approach bridges current limitations and future potential in probing DM through compact star observations.
Submitted 9 June, 2025;
originally announced June 2025.
-
RADAR: Benchmarking Language Models on Imperfect Tabular Data
Authors:
Ken Gu,
Zhihan Zhang,
Kate Lin,
Yuwei Zhang,
Akshay Paruchuri,
Hong Yu,
Mehran Kazemi,
Kumar Ayush,
A. Ali Heydari,
Maxwell A. Xu,
Girish Narayanswamy,
Yun Liu,
Ming-Zher Poh,
Yuzhe Yang,
Mark Malhotra,
Shwetak Patel,
Hamid Palangi,
Xuhai Xu,
Daniel McDuff,
Tim Althoff,
Xin Liu
Abstract:
Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework to simulate data artifacts via programmatic perturbations to enable targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how reasoning performance holds up as table size increases. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.
Submitted 9 June, 2025;
originally announced June 2025.
-
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Authors:
Daolang Huang,
Xinyi Wen,
Ayush Bharti,
Samuel Kaski,
Luigi Acerbi
Abstract:
Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new data needs to be collected for instant inference. To tackle this issue, we introduce the Amortized Active Learning and Inference Engine (ALINE), a unified framework for amortized Bayesian inference and active data acquisition. ALINE leverages a transformer architecture trained via reinforcement learning with a reward based on self-estimated information gain provided by its own integrated inference component. This allows it to strategically query informative data points while simultaneously refining its predictions. Moreover, ALINE can selectively direct its querying strategy towards specific subsets of model parameters or designated predictive tasks, optimizing for posterior estimation, data prediction, or a mixture thereof. Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.
Submitted 8 June, 2025;
originally announced June 2025.
-
Multilevel neural simulation-based inference
Authors:
Yuga Hikida,
Ayush Bharti,
Niall Jeffrey,
François-Xavier Briol
Abstract:
Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.
Submitted 6 June, 2025;
originally announced June 2025.
-
System-Aware Unlearning Algorithms: Use Lesser, Forget Faster
Authors:
Linda Lu,
Ayush Sekhari,
Karthik Sridharan
Abstract:
Machine unlearning addresses the problem of updating a machine learning model/system trained on a dataset $S$ so that the influence of a set of deletion requests $U \subseteq S$ on the unlearned model is minimized. The gold standard definition of unlearning demands that the updated model, after deletion, be nearly identical to the model obtained by retraining. This definition is designed for a worst-case attacker (one who can recover not only the unlearned model but also the remaining data samples, i.e., $S \setminus U$). Such a stringent definition has made developing efficient unlearning algorithms challenging. However, such strong attackers are also unrealistic. In this work, we propose a new definition, system-aware unlearning, which aims to provide unlearning guarantees against an attacker that can at best only gain access to the data stored in the system for learning/unlearning requests and not all of $S\setminus U$. With this new definition, we use the simple intuition that if a system can store less to make its learning/unlearning updates, it can be more secure and update more efficiently against a system-aware attacker. Towards that end, we present an exact system-aware unlearning algorithm for linear classification using a selective sampling-based approach, and we generalize the method for classification with general function classes. We theoretically analyze the tradeoffs between deletion capacity, accuracy, memory, and computation time.
Submitted 6 June, 2025;
originally announced June 2025.
-
A cautious user's guide in applying HMMs to physical systems
Authors:
Max Schweiger,
Ayush Saurabh,
Steve Pressé
Abstract:
Nature, as far as we know, evolves continuously through space and time. Yet the ubiquitous hidden Markov model (HMM)--originally developed for discrete time and space analysis in natural language processing--remains a central tool in interpreting time series data drawn from physical systems. This raises a fundamental question: What are the implications of applying a discrete-state, discrete-time framework to analyze data generated by a continuously evolving system? Through synthetic data generated using Langevin dynamics in an effective potential, we explore under what circumstances HMMs yield interpretable results. Our analysis reveals that the discrete-state approximation acts primarily as an abstraction, with the inferred states visited in time often more closely reflecting the measurement protocol and modeling choices than features of the underlying physical potential. Crucially, we demonstrate that the states visited over the course of a time series recovered by the HMM can be tuned a priori by adjusting the data acquisition scheme, even misleadingly recovering reproducible "intermediate" states using different HMM tools for a system evolving in a single-well potential. We conclude with a note of measured caution: while HMMs offer a mathematically elegant framework for time series inference, their use in physical modeling should be guided by an awareness of their limitations. In this light, we outline important generalizations of the HMM to continuous space and time and highlight the importance of a well-calibrated measurement noise model.
Submitted 5 June, 2025;
originally announced June 2025.
-
Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment
Authors:
Priyanka Dey,
Yugal Khanter,
Aayush Bothra,
Jieyu Zhao,
Emilio Ferrara
Abstract:
As LLMs become central to interactive applications, ranging from tutoring to mental health, the ability to express personality in culturally appropriate ways is increasingly important. While recent works have explored personality evaluation of LLMs, they largely overlook the interplay between culture and personality. To address this, we introduce CulturalPersonas, the first large-scale benchmark with human validation for evaluating LLMs' personality expression in culturally grounded, behaviorally rich contexts. Our dataset spans 3,000 scenario-based questions across six diverse countries, designed to elicit personality through everyday scenarios rooted in local values. We evaluate three LLMs, using both multiple-choice and open-ended response formats. Our results show that CulturalPersonas improves alignment with country-specific human personality distributions (over a 20% reduction in Wasserstein distance across models and countries) and elicits more expressive, culturally coherent outputs compared to existing benchmarks. CulturalPersonas surfaces meaningful modulated trait outputs in response to culturally grounded prompts, offering new directions for aligning LLMs to global norms of behavior. By bridging personality expression and cultural nuance, we envision that CulturalPersonas will pave the way for more socially intelligent and globally adaptive LLMs.
Submitted 5 June, 2025;
originally announced June 2025.
-
Emergent Berezinskii-Kosterlitz-Thouless deconfinement in super-Coulombic plasmas
Authors:
Ayush De,
Leo Radzihovsky,
Snir Gazit
Abstract:
We study the statistical mechanics of two-dimensional "super-Coulombic" plasmas, namely, neutral plasmas with power-law interactions longer-ranged than Coulomb. To that end, we employ numerically exact large-scale Monte Carlo simulations. Contrary to naive energy-entropy arguments, we observe a charge confinement-deconfinement transition as a function of temperature. Remarkably, the transition lies in the Berezinskii-Kosterlitz-Thouless (BKT) universality class. Our results corroborate recent dielectric medium and renormalization group calculations predicting effective long-scale Coulomb interactions in microscopically super-Coulombic gases. We explicitly showcase this novel dielectric screening phenomenon, capturing the emergent Coulomb potential and the associated crossover length scale. This is achieved by utilizing a new test charge based methodology for determining effective inter-particle interactions. Lastly, we show that this Coulomb emergence and the associated BKT transition occur universally across generic interactions and densities.
Submitted 5 June, 2025;
originally announced June 2025.
-
LSM-2: Learning from Incomplete Wearable Sensor Data
Authors:
Maxwell A. Xu,
Girish Narayanswamy,
Kumar Ayush,
Dimitris Spathis,
Shun Liao,
Shyam A. Tailor,
Ahmed Metwally,
A. Ali Heydari,
Yuwei Zhang,
Jake Garrison,
Samy Abdel-Ghaffar,
Xuhai Xu,
Ken Gu,
Jacob Sunshine,
Ming-Zher Poh,
Yun Liu,
Tim Althoff,
Shrikanth Narayanan,
Pushmeet Kohli,
Mark Malhotra,
Shwetak Patel,
Yuzhe Yang,
James M. Rehg,
Xin Liu,
Daniel McDuff
Abstract:
Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM), a novel SSL approach that learns robust representations directly from incomplete data without requiring explicit imputation. AIM's core novelty lies in its use of learnable mask tokens to model both existing ("inherited") and artificially introduced missingness, enabling it to robustly handle fragmented real-world data during inference. Pre-trained on an extensive dataset of 40M hours of day-long multimodal sensor data, our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression and generative modeling. Furthermore, LSM-2 with AIM exhibits superior scaling performance, and critically, maintains high performance even under targeted missingness scenarios, reflecting clinically coherent patterns, such as the diagnostic value of nighttime biosignals for hypertension prediction. This makes AIM a more reliable choice for real-world wearable data applications.
Submitted 5 June, 2025;
originally announced June 2025.
-
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair
Authors:
Zanis Ali Khan,
Aayush Garg,
Qiang Tang
Abstract:
Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security-critical aspect of APR, remains underexplored. This study investigates pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across six datasets and four languages. We evaluate their accuracy and generalization to unknown vulnerabilities. Results show that while both models face challenges with fragmented or sparse context, CodeBERT performs comparatively better in such scenarios, whereas CodeT5 excels in capturing complex vulnerability patterns. CodeT5 also demonstrates superior scalability. Furthermore, we test fine-tuned models on both in-distribution (trained) and out-of-distribution (unseen) datasets. While fine-tuning improves in-distribution performance, models struggle to generalize to unseen data, highlighting challenges in robust vulnerability detection. This study benchmarks model performance, identifies limitations in generalization, and provides actionable insights to advance automated vulnerability patching for real-world security applications.
Submitted 5 June, 2025;
originally announced June 2025.
-
Fully-Distributed Construction of Byzantine-Resilient Dynamic Peer-to-Peer Networks
Authors:
Aayush Gupta,
Gopal Pandurangan
Abstract:
We address a fundamental problem in Peer-to-Peer (P2P) networks, namely, constructing and maintaining dynamic P2P overlay network topologies with essential properties such as connectivity, low diameter, and high expansion, that are resilient to continuous high churn and the presence of a large number of malicious (Byzantine) nodes. Our main goal is to construct and maintain a sparse (bounded degree) expander topology despite high churn and a large number of Byzantine nodes. Such an expander topology has logarithmic diameter, high expansion, and is robust to churn and the presence of a large number of bad nodes, and facilitates efficient and robust algorithms for fundamental problems in distributed computing, such as agreement, broadcasting, routing, etc.
Our main contribution is a randomized, fully-distributed dynamic P2P protocol that works with only local initial knowledge and guarantees, with high probability, the maintenance of a constant degree graph with high expansion even under continuous churn and in the presence of a large number of Byzantine nodes. Our protocol can tolerate up to $o(n/poly\log(n))$ Byzantine nodes (where $n$ is the stable network size). Our protocol is efficient, lightweight, and scalable, and it incurs only $O(poly\log(n))$ overhead for topology maintenance: only polylogarithmic (in $n$) bits need to be processed and sent by each honest node per round, and any honest node's computation cost per round is also polylogarithmic.
Our protocol can be used as a building block for solving fundamental distributed computing problems in highly dynamic networks, such as Byzantine agreement and Byzantine leader election, and enables fast and scalable algorithms for these problems.
Submitted 4 June, 2025;
originally announced June 2025.
-
Self-Supervised Spatial Correspondence Across Modalities
Authors:
Ayush Shrivastava,
Andrew Owens
Abstract:
We present a method for finding cross-modal space-time correspondences. Given two images from different visual modalities, such as an RGB image and a depth map, our model identifies which pairs of pixels correspond to the same physical points in the scene. To solve this problem, we extend the contrastive random walk framework to simultaneously learn cycle-consistent feature representations for both cross-modal and intra-modal matching. The resulting model is simple and has no explicit photo-consistency assumptions. It can be trained entirely using unlabeled data, without the need for any spatially aligned multimodal image pairs. We evaluate our method on both geometric and semantic correspondence tasks. For geometric matching, we consider challenging tasks such as RGB-to-depth and RGB-to-thermal matching (and vice versa); for semantic matching, we evaluate on photo-sketch and cross-style image alignment. Our method achieves strong performance across all benchmarks.
Submitted 3 June, 2025;
originally announced June 2025.
-
Sign Language: Towards Sign Understanding for Robot Autonomy
Authors:
Ayush Agrawal,
Joel Loo,
Nicky Zimmerman,
David Hsu
Abstract:
Signage is a ubiquitous element of human environments, playing a critical role in both scene understanding and navigation. For autonomous systems to fully interpret human environments, effectively parsing and understanding signs is essential. We introduce the task of navigational sign understanding, aimed at extracting navigational cues from signs that convey symbolic spatial information about the scene. Specifically, we focus on signs capturing directional cues that point toward distant locations and locational cues that identify specific places. To benchmark performance on this task, we curate a comprehensive test set, propose appropriate evaluation metrics, and establish a baseline approach. Our test set consists of over 160 images, capturing signs with varying complexity and design across a wide range of public spaces, such as hospitals, shopping malls, and transportation hubs. Our baseline approach harnesses Vision-Language Models (VLMs) to parse navigational signs under these high degrees of variability. Experiments show that VLMs offer promising performance on this task, potentially motivating downstream applications in robotics. The code and dataset are available on GitHub.
Submitted 3 June, 2025;
originally announced June 2025.
-
The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches
Authors:
Omri Lev,
Vishwak Srinivasan,
Moshe Shenfeld,
Katrina Ligett,
Ayush Sekhari,
Ashia C. Wilson
Abstract:
Gaussian sketching, which consists of pre-multiplying the data with a random Gaussian matrix, is a widely used technique for multiple problems in data science and machine learning, with applications spanning computationally efficient optimization, coded computing, and federated learning. This operation also provides differential privacy guarantees due to its inherent randomness. In this work, we revisit this operation through the lens of Renyi Differential Privacy (RDP), providing a refined privacy analysis that yields significantly tighter bounds than prior results. We then demonstrate how this improved analysis leads to performance improvement in different linear regression settings, establishing theoretical utility guarantees. Empirically, our methods improve performance across multiple datasets and, in several cases, reduce runtime.
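The sketching operation named in this abstract (pre-multiplying the data with a random Gaussian matrix) can be illustrated with a minimal sketch. It covers only the operation itself, not the paper's mechanism or its privacy analysis. The function name `gaussian_sketch`, the `1/sqrt(k)` scaling of the entries, and the toy data are assumptions made for illustration.

```python
import random


def gaussian_sketch(data, k, seed=0):
    """Return S @ data, where S is a k x n matrix of i.i.d. Gaussian entries."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    scale = k ** -0.5  # assumed convention: entries drawn from N(0, 1/k)
    s = [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(k)]
    # Plain matrix product: row i of the sketch mixes all n data rows.
    return [
        [sum(s[i][r] * data[r][c] for r in range(n)) for c in range(d)]
        for i in range(k)
    ]


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # n = 4 rows, d = 2 columns
sketch = gaussian_sketch(data, k=3)  # k = 3 rows, d = 2 columns
```

The sketch has `k` rows instead of `n`, and each row is a random Gaussian combination of all original rows.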
Submitted 4 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
Authors:
Stepan Shabalin,
Ayush Panda,
Dmitrii Kharlapenko,
Abdur Raheem Ali,
Yixiong Hao,
Arthur Conmy
Abstract:
Sparse autoencoders are a promising new approach for decomposing language model activations for interpretation and control. They have been applied successfully to vision transformer image encoders and to small-scale diffusion models. Inference-Time Decomposition of Activations (ITDA) is a recently proposed variant of dictionary learning that takes the dictionary to be a set of data points from the activation distribution and reconstructs them with gradient pursuit. We apply Sparse Autoencoders (SAEs) and ITDA to a large text-to-image diffusion model, Flux 1, and consider the interpretability of embeddings of both by introducing a visual automated interpretation pipeline. We find that SAEs accurately reconstruct residual stream embeddings and beat MLP neurons on interpretability. We are able to use SAE features to steer image generation through activation addition. We find that ITDA has comparable interpretability to SAEs.
Submitted 2 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Winners vs. Losers: Momentum-based Strategies with Intertemporal Choice for ESG Portfolios
Authors:
Ayush Jha,
Abootaleb Shirvani,
Ali Jaffri,
Svetlozar T. Rachev,
Frank J. Fabozzi
Abstract:
This paper introduces a state-dependent momentum framework that integrates ESG regime switching with tail-risk-aware reward-risk metrics. Using a dynamic programming approach and solving a finite-horizon Bellman equation, we construct long-short momentum portfolios that adjust to changing ESG sentiment regimes. Unlike traditional momentum strategies based on historical returns, our approach incorporates the Stable Tail Adjusted Return ratio and Rachev ratio to better capture downside risk in turbulent markets. We apply this framework across three asset classes, Russell 3000 equities, Dow Jones 30 stocks, and cryptocurrencies, under both pro- and anti-ESG market regimes. We find that ESG-loser portfolios significantly outperform ESG-winner portfolios in pro-ESG regimes, a counterintuitive result suggesting that market overreaction to ESG sentiment creates short-term pricing inefficiencies. This pattern is robust across tail-sensitive performance metrics and is most pronounced under a two-week formation and holding period. Our framework highlights how ESG considerations and sentiment regimes alter return dynamics, offering practical guidance for investors seeking to implement responsive momentum strategies under sustainability constraints. These findings challenge conventional assumptions about ESG investing and underscore the importance of dynamic, regime-aware portfolio construction in environments shaped by regulatory signals, investor flows, and behavioral biases.
Submitted 30 May, 2025;
originally announced May 2025.
-
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
Authors:
Jiacheng Xie,
Yang Yu,
Ziyang Zhang,
Shuai Zeng,
Jiaxuan He,
Ayush Vasireddy,
Xiaoting Tang,
Congyu Guo,
Lening Zhao,
Congcong Jing,
Guanghui An,
Dong Xu
Abstract:
Traditional Chinese Medicine (TCM), as an effective alternative medicine, has been receiving increasing attention. In recent years, the rapid development of large language models (LLMs) tailored for TCM has underscored the need for an objective and comprehensive evaluation framework to assess their performance on real-world tasks. However, existing evaluation datasets are limited in scope and primarily text-based, lacking a unified and standardized multimodal question-answering (QA) benchmark. To address this issue, we introduce TCM-Ladder, the first multimodal QA dataset specifically designed for evaluating large TCM language models. The dataset spans multiple core disciplines of TCM, including fundamental theory, diagnostics, herbal formulas, internal medicine, surgery, pharmacognosy, and pediatrics. In addition to textual content, TCM-Ladder incorporates various modalities such as images and videos. The datasets were constructed using a combination of automated and manual filtering processes and comprise 52,000+ questions in total. These questions include single-choice, multiple-choice, fill-in-the-blank, diagnostic dialogue, and visual comprehension tasks. We trained a reasoning model on TCM-Ladder and conducted comparative experiments against 9 state-of-the-art general domain and 5 leading TCM-specific LLMs to evaluate their performance on the datasets. Moreover, we propose Ladder-Score, an evaluation method specifically designed for TCM question answering that effectively assesses answer quality regarding terminology usage and semantic expression. To our knowledge, this is the first work to evaluate mainstream general domain and TCM-specific LLMs on a unified multimodal benchmark. The datasets and leaderboard are publicly available at https://tcmladder.com or https://54.211.107.106 and will be continuously updated.
Submitted 29 May, 2025;
originally announced May 2025.
-
Grounded Reinforcement Learning for Visual Reasoning
Authors:
Gabriel Sarch,
Snigdha Saha,
Naitik Khandelwal,
Ayush Jain,
Michael J. Tarr,
Aviral Kumar,
Katerina Fragkiadaki
Abstract:
While reinforcement learning (RL) over chains of thought has significantly advanced language models in tasks such as mathematics and coding, visual reasoning introduces added complexity by requiring models to direct visual attention, interpret perceptual inputs, and ground abstract reasoning in spatial evidence. We introduce ViGoRL (Visually Grounded Reinforcement Learning), a vision-language model trained with RL to explicitly anchor each reasoning step to specific visual coordinates. Inspired by human visual decision-making, ViGoRL learns to produce spatially grounded reasoning traces, guiding visual attention to task-relevant regions at each step. When fine-grained exploration is required, our novel multi-turn RL framework enables the model to dynamically zoom into predicted coordinates as reasoning unfolds. Across a diverse set of visual reasoning benchmarks--including SAT-2 and BLINK for spatial reasoning, V*Bench for visual search, and ScreenSpot and VisualWebArena for web-based grounding--ViGoRL consistently outperforms both supervised fine-tuning and conventional RL baselines that lack explicit grounding mechanisms. Incorporating multi-turn RL with zoomed-in visual feedback significantly improves ViGoRL's performance on localizing small GUI elements and visual search, achieving 86.4% on V*Bench. Additionally, we find that grounding amplifies other visual behaviors such as region exploration, grounded subgoal setting, and visual verification. Finally, human evaluations show that the model's visual references are not only spatially accurate but also helpful for understanding model reasoning steps. Our results show that visually grounded RL is a strong paradigm for imbuing models with general-purpose visual reasoning.
Submitted 29 May, 2025;
originally announced May 2025.
-
Preference Learning with Response Time
Authors:
Ayush Sawarni,
Sahasrajit Sarmasarkar,
Vasilis Syrgkanis
Abstract:
This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel methodologies to incorporate response time information alongside binary choice data, leveraging the Evidence Accumulation Drift Diffusion (EZ) model, under which response time is informative of the preference strength. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning, matching the theoretical optimal rates that would be attained if the expected response times for each query were known a priori. Our theoretical analysis demonstrates that for linear reward functions, conventional preference learning suffers from error rates that scale exponentially with reward magnitude. In contrast, our response time-augmented approach reduces this to polynomial scaling, representing a significant improvement in sample efficiency. We extend these guarantees to non-parametric reward function spaces, establishing convergence properties for more complex, realistic reward models. Our extensive experiments validate our theoretical findings in the context of preference learning over images.
Submitted 28 May, 2025;
originally announced May 2025.
-
A Graph Completion Method that Jointly Predicts Geometry and Topology Enables Effective Molecule Assembly
Authors:
Rohan V. Koodli,
Alexander S. Powers,
Ayush Pandit,
Chiho Im,
Ron O. Dror
Abstract:
A common starting point for drug design is to find small chemical groups or "fragments" that form interactions with distinct subregions in a protein binding pocket. The subsequent challenge is to assemble these fragments into a molecule that has high affinity to the protein, by adding chemical bonds between atoms in different fragments. This "molecule assembly" task is particularly challenging because, initially, fragment positions are known only approximately. Prior methods for spatial graph completion (adding missing edges to a graph whose nodes have associated spatial coordinates) either treat node positions as fixed or adjust node positions before predicting edges. The fact that these methods treat geometry and topology prediction separately limits their ability to reconcile noisy geometries and plausible connectivities. To address this limitation, we introduce EdGr, a spatial graph diffusion model that reasons jointly over geometry and topology of molecules to simultaneously predict fragment positions and inter-fragment bonds. Importantly, predicted edge likelihoods directly influence node position updates during the diffusion denoising process, allowing connectivity cues to guide spatial movements, and vice versa. EdGr substantially outperforms previous methods on the molecule assembly task and maintains robust performance as noise levels increase. Beyond drug discovery, our approach of explicitly coupling geometry and topology prediction is broadly applicable to spatial graph completion problems, such as neural circuit reconstruction, 3D scene understanding, and sensor network design.
Submitted 27 May, 2025;
originally announced May 2025.
-
UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification
Authors:
Poojah Ganesan,
Rajat Aayush Jha,
Dan Roth,
Vivek Gupta
Abstract:
Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. However, the task remains challenging for multi-table databases because of complex schemas and relational operations. Existing methods often struggle with retrieving the right tables and columns, generating accurate JOINs and UNIONs, and generalizing across diverse schemas. To address these issues, we introduce UNJOIN, a two-stage framework that decouples the retrieval of schema elements from SQL logic generation. In the first stage, we merge the column names of all tables in the database into a single-table representation by prefixing each column with its table name. This allows the model to focus purely on accurate retrieval without being distracted by the need to write complex SQL logic. In the second stage, the SQL query is generated on this simplified schema and mapped back to the original schema by reconstructing JOINs, UNIONs, and relational logic. Evaluations on the SPIDER and BIRD datasets show that UNJOIN matches or exceeds state-of-the-art baselines. UNJOIN uses only schema information and requires neither data access nor fine-tuning, making it scalable and adaptable across databases.
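The first-stage schema flattening described in the abstract above (prefixing each column with its table name to obtain a single-table view) can be illustrated with a minimal sketch. This is not the authors' implementation: the `.` separator, the function names `flatten_schema` and `split_column`, and the toy schema are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed separator between table and column names; the abstract only says
# that each column is prefixed with its table name.
SEP = "."


def flatten_schema(schema: Dict[str, List[str]]) -> List[str]:
    """Merge all tables into one flat list of table-prefixed column names."""
    return [f"{table}{SEP}{col}" for table, cols in schema.items() for col in cols]


def split_column(flat_name: str) -> Tuple[str, str]:
    """Recover (table, column) from a prefixed name, as needed when a query
    written against the flat schema is mapped back to the original tables."""
    table, col = flat_name.split(SEP, 1)
    return table, col


# Hypothetical two-table schema, used only for illustration.
toy_schema = {
    "singer": ["id", "name"],
    "concert": ["id", "singer_id"],
}
flat_columns = flatten_schema(toy_schema)
```

Keeping the table name inside each flat column name is what makes the second stage possible in principle: the set of tables referenced by a generated query can be read off the prefixes before JOINs are reconstructed.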
Submitted 23 May, 2025;
originally announced May 2025.
-
The Quasi-Polynomial Low-Degree Conjecture is False
Authors:
Rares-Darius Buhai,
Jun-Ting Hsieh,
Aayush Jain,
Pravesh K. Kothari
Abstract:
There is a growing body of work on proving hardness results for average-case estimation problems by bounding the low-degree advantage (LDA) - a quantitative estimate of the closeness of low-degree moments - between a null distribution and a related planted distribution. Such hardness results are now ubiquitous not only for foundational average-case problems but also central questions in statistics and cryptography. This line of work is supported by the low-degree conjecture of Hopkins, which postulates that a vanishing degree-$D$ LDA implies the absence of any noise-tolerant distinguishing algorithm with runtime $n^{\widetilde{O}(D)}$ whenever 1) the null distribution is product on $\{0,1\}^{\binom{n}{k}}$, and 2) the planted distribution is permutation invariant, that is, invariant under any relabeling $[n] \rightarrow [n]$.
In this paper, we disprove this conjecture. Specifically, we show that for any fixed $\varepsilon>0$ and $k\geq 2$, there is a permutation-invariant planted distribution on $\{0,1\}^{\binom{n}{k}}$ that has a vanishing degree-$n^{1-O(\varepsilon)}$ LDA with respect to the uniform distribution on $\{0,1\}^{\binom{n}{k}}$, yet the corresponding $\varepsilon$-noisy distinguishing problem can be solved in $n^{O(\log^{1/(k-1)}(n))}$ time. Our construction relies on algorithms for list-decoding for noisy polynomial interpolation in the high-error regime.
We also give another construction of a pair of planted and (non-product) null distributions on $\mathbb{R}^{n \times n}$ with a vanishing $n^{Ω(1)}$-degree LDA while the largest eigenvalue serves as an efficient noise-tolerant distinguisher.
Our results suggest that while a vanishing LDA may still be interpreted as evidence of hardness, developing a theory of average-case complexity based on such heuristics requires a more careful approach.
Submitted 22 May, 2025;
originally announced May 2025.
-
SONIC: Cost-Effective Web Access for Developing Countries
Authors:
Ayush Pandey,
Rohail Asim,
Jean Louis K. E. Fendji,
Talal Rahwan,
Matteo Varvello,
Yasir Zaki
Abstract:
Over 2.6 billion people remain without access to the Internet in 2025. This phenomenon is especially pronounced in developing regions, where cost and infrastructure limitations are major barriers to connectivity. In response, we design SONIC, a low-cost, scalable data delivery system that builds on existing infrastructures: FM radio for downlink broadcasting, and SMS for personalized uplink. SONIC is motivated by the widespread availability of FM radio and SMS infrastructure in developing regions, along with embedded FM radio tuners in affordable mobile phones. SONIC offers several innovations to effectively transmit Web content over sound over FM radio, in a reliable and compressed form. For example, we transmit pre-rendered webpages and leverage pixel interpolation to recover errors at the receiver. We further modify Android to offer a simpler deployment pipeline, supporting a wide range of devices. We deployed SONIC at an FM radio station in Cameroon for six weeks with 30 participants. Our results demonstrate a sustained downlink throughput of 10 kbps, less than 20% loss for a majority of transmissions with signal strength above -90 dBm, and strong user engagement across both Web browsing and ChatGPT interactions.
Submitted 22 May, 2025;
originally announced May 2025.
-
Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models
Authors:
Kalindi Singh,
Aayush Kashyap,
Aswani Kumar Cherukuri
Abstract:
The widespread adoption of encrypted communication protocols such as HTTPS and TLS has enhanced data privacy but also rendered traditional anomaly detection techniques less effective, as they often rely on inspecting unencrypted payloads. This study aims to develop an interpretable machine learning-based framework for anomaly detection in encrypted network traffic. This study proposes a model-agnostic framework that integrates multiple machine learning classifiers with SHapley Additive exPlanations (SHAP) to ensure post-hoc model interpretability. The models are trained and evaluated on three benchmark encrypted traffic datasets. Performance is assessed using standard classification metrics, and SHAP is used to explain model predictions by attributing importance to individual input features. SHAP visualizations successfully revealed the most influential traffic features contributing to anomaly predictions, enhancing the transparency and trustworthiness of the models. Unlike conventional approaches that treat machine learning as a black box, this work combines robust classification techniques with explainability through SHAP, offering a novel interpretable anomaly detection system tailored for encrypted traffic environments. While the framework is generalizable, real-time deployment and performance under adversarial conditions require further investigation. Future work may explore adaptive models and real-time interpretability in operational network environments. This interpretable anomaly detection framework can be integrated into modern security operations for encrypted environments, allowing analysts not only to detect anomalies with high precision but also to understand why a model made a particular decision, a crucial capability in compliance-driven and mission-critical settings.
Submitted 22 May, 2025;
originally announced May 2025.
-
Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning
Authors:
Tiasa Singha Roy,
Aditeya Baral,
Ayush Rajesh Jhaveri,
Yusuf Baig
Abstract:
Large language models (LLMs) demonstrate considerable potential in various natural language tasks but face significant challenges in mathematical reasoning, particularly in executing precise, multi-step logic. However, current evaluation frameworks judge their performance solely based on accuracy, which only accounts for the final answer. This study explores these pitfalls by employing a novel evaluation framework. We propose an evaluation metric called the MAPLE score, which holistically quantifies reasoning misalignment by integrating error rates, redundancy, and validity.
Submitted 21 May, 2025;
originally announced May 2025.
-
Physics-Guided Multi-View Graph Neural Network for Schizophrenia Classification via Structural-Functional Coupling
Authors:
Badhan Mazumder,
Ayush Kanyal,
Lei Wu,
Vince D. Calhoun,
Dong Hye Ye
Abstract:
Clinical studies reveal disruptions in brain structural connectivity (SC) and functional connectivity (FC) in neuropsychiatric disorders such as schizophrenia (SZ). Traditional approaches might rely solely on SC due to limited functional data availability, hindering comprehension of cognitive and behavioral impairments in individuals with SZ by neglecting the intricate SC-FC interrelationship. To tackle the challenge, we propose a novel physics-guided deep learning framework that leverages a neural oscillation model to describe the dynamics of a collection of interconnected neural oscillators, which operate via nerve fibers dispersed across the brain's structure. Our proposed framework utilizes SC to simultaneously generate FC by learning SC-FC coupling from a system dynamics perspective. Additionally, it employs a novel multi-view graph neural network (GNN) with a joint loss to perform correlation-based SC-FC fusion and classification of individuals with SZ. Experiments conducted on a clinical dataset exhibited improved performance, demonstrating the robustness of our proposed approach.
Submitted 21 May, 2025;
originally announced May 2025.
-
Improving the fact-checking performance of language models by relying on their entailment ability
Authors:
Gaurav Kumar,
Debajyoti Mazumder,
Ayush Garg,
Jasabanta Patro
Abstract:
Automated fact-checking is a crucial task in this digital age. To verify a claim, current approaches mainly follow one of two strategies: (i) relying on the embedded knowledge of language models, or (ii) fine-tuning them with evidence pieces. While the former can cause systems to hallucinate, the latter has not been very successful to date. The primary reason behind this is that fact verification is a complex process. Language models have to parse through multiple pieces of evidence before making a prediction. Further, the evidence pieces often contradict each other. This makes the reasoning process even more complex. We propose a simple yet effective approach that relies on entailment and the generative ability of language models to produce "supporting" and "refuting" justifications (for the truthfulness of a claim). We trained language models based on these justifications and achieved superior results. Apart from that, we carried out a systematic comparison of different prompting and fine-tuning strategies, as it is currently lacking in the literature. Some of our observations are: (i) training language models with raw evidence sentences registered an improvement of up to 8.20% in macro-F1 over the best-performing baseline for the RAW-FC dataset, (ii) similarly, training language models with prompted claim-evidence understanding (TBE-2) registered an improvement (with a margin of up to 16.39%) over the baselines for the same dataset, (iii) training language models with entailed justifications (TBE-3) outperformed the baselines by a huge margin (up to 28.57% and 44.26% for LIAR-RAW and RAW-FC, respectively). We have shared our code repository to reproduce the results.
Submitted 20 May, 2025;
originally announced May 2025.
-
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
Authors:
Subash Khanal,
Srikumar Sastry,
Aayush Dhakal,
Adeel Ahmad,
Nathan Jacobs
Abstract:
We present Sat2Sound, a multimodal representation learning framework for soundscape mapping, designed to predict the distribution of sounds at any location on Earth. Existing methods for this task rely on satellite image and paired geotagged audio samples, which often fail to capture the diversity of sound sources at a given location. To address this limitation, we enhance existing datasets by leveraging a Vision-Language Model (VLM) to generate semantically rich soundscape descriptions for locations depicted in satellite images. Our approach incorporates contrastive learning across audio, audio captions, satellite images, and satellite image captions. We hypothesize that there is a fixed set of soundscape concepts shared across modalities. To this end, we learn a shared codebook of soundscape concepts and represent each sample as a weighted average of these concepts. Sat2Sound achieves state-of-the-art performance in cross-modal retrieval between satellite image and audio on two datasets: GeoSound and SoundingEarth. Additionally, building on Sat2Sound's ability to retrieve detailed soundscape captions, we introduce a novel application: location-based soundscape synthesis, which enables immersive acoustic experiences. Our code and models will be publicly available.
Submitted 19 May, 2025;
originally announced May 2025.
-
The CHIME/FRB Discovery of the Extremely Active Fast Radio Burst Source FRB 20240114A
Authors:
Kaitlyn Shin,
Alice Curtin,
Maxwell Fine,
Ayush Pandhi,
Shion Andrew,
Mohit Bhardwaj,
Shami Chatterjee,
Amanda M. Cook,
Emmanuel Fonseca,
B. M. Gaensler,
Jason Hessels,
Naman Jain,
Victoria M. Kaspi,
Bikash Kharel,
Adam E. Lanman,
Mattias Lazda,
Calvin Leung,
Robert Main,
Kiyoshi W. Masui,
Daniele Michilli,
Mason Ng,
Kenzie Nimmo,
Aaron B. Pearlman,
Ue-Li Pen,
Ziggy Pleunis
, et al. (6 additional authors not shown)
Abstract:
Among the thousands of observed fast radio bursts (FRBs), a few sources exhibit exceptionally high burst activity observable by many telescopes across a broad range of radio frequencies. Almost all of these highly active repeaters have been discovered by CHIME/FRB, due to its daily observations of the entire Northern sky as a transit radio telescope. FRB 20240114A is a source discovered and reported by CHIME/FRB to the community in January 2024; given its low declination, even the detection of a few bursts hints at a high burst rate. Following the community announcement of this source as a potentially active repeater, it was extensively followed up by other observatories and has emerged as one of the most prolific FRB repeaters ever observed. This paper presents the five bursts CHIME/FRB observed from FRB 20240114A, with channelized raw voltage data saved for two bursts. We do not observe changes in the DM of the source greater than ~1.3 pc cm$^{-3}$ in our observations over nearly a year baseline. We find an RM of ~ +320 rad m$^{-2}$. We do not find evidence for scattering at the level of < 0.3 ms in the bursts, and we find no evidence for astrophysical scintillation. In our observations of FRB 20240114A, we see a burst rate ~49x higher than the median burst rate of apparent non-repeaters also discovered by CHIME/FRB. Each discovery of highly active FRBs provides a valuable opportunity to investigate whether there is a fundamental difference between repeating and apparently non-repeating sources.
Submitted 19 May, 2025;
originally announced May 2025.
-
Multivariate Affine GARCH with Heavy Tails: A Unified Framework for Portfolio Optimization and Option Valuation
Authors:
Ayush Jha,
Abootaleb Shirvani,
Ali Jaffri,
Svetlozar T. Rachev,
Frank J. Fabozzi
Abstract:
This paper develops and estimates a multivariate affine GARCH(1,1) model with Normal Inverse Gaussian innovations that captures time-varying volatility, heavy tails, and dynamic correlation across asset returns. We generalize the Heston-Nandi framework to a multivariate setting and apply it to 30 Dow Jones Industrial Average stocks. The model jointly supports three core financial applications: dynamic portfolio optimization, wealth path simulation, and option pricing. Closed-form solutions are derived for a Constant Relative Risk Aversion (CRRA) investor's intertemporal asset allocation, and we implement a forward-looking risk-adjusted performance comparison against Merton-style constant strategies. Using the model's conditional volatilities, we also construct implied volatility surfaces for European options, capturing skew and smile features. Empirically, we document substantial wealth-equivalent utility losses from ignoring time-varying correlation and tail risk. These findings underscore the value of a unified econometric framework for analyzing joint asset dynamics and for managing portfolio and derivative exposures under non-Gaussian risks.
Submitted 17 May, 2025;
originally announced May 2025.
-
Elementary symmetric polynomials under the fixed point measure
Authors:
Ayush Khaitan,
Ishan Mata,
Bhargav Narayanan
Abstract:
We identify a surprising inequality satisfied by elementary symmetric polynomials under the action of the fixed point measure of a random permutation. Concretely, for any collection of $n$ non-negative real numbers $a_1, \dots, a_n \in \mathbb{R}_{\geq 0}$, we prove that
\[
\frac{1}{n!} \sum_{π\in S_n} \left[\prod_{\{i:i=π(i)\}} a_i\right] \ge \frac{1}{\binom{n}{2}} \sum_{S \in\binom{[n]}{2}} \left[ \left(\prod_{\{i \in S\}} a_i \right)^{1/2}\right],
\]
and this bound is sharp. To prove this elementary inequality, we construct a collection of differential operators to set up a monotone flow that then allows us to establish the inequality.
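The inequality stated above can be checked numerically by brute force for small $n$. The following is only a sanity-check sketch, not the paper's proof technique (which uses differential operators and a monotone flow); the function names are illustrative assumptions.

```python
from itertools import combinations, permutations
from math import comb, factorial, prod, sqrt


def fixed_point_average(a):
    """Left-hand side: average over all permutations of the product of a_i
    taken over the fixed points i = pi(i) (an empty product counts as 1)."""
    n = len(a)
    total = 0.0
    for perm in permutations(range(n)):
        total += prod(a[i] for i in range(n) if perm[i] == i)
    return total / factorial(n)


def pair_geometric_mean_average(a):
    """Right-hand side: average over all 2-element subsets {i, j} of
    sqrt(a_i * a_j)."""
    n = len(a)
    pairs = combinations(range(n), 2)
    return sum(sqrt(a[i] * a[j]) for i, j in pairs) / comb(n, 2)


# Brute-force comparison on a small non-negative input (n = 4).
sample = [0.5, 2.0, 3.0, 0.1]
lhs = fixed_point_average(sample)
rhs = pair_geometric_mean_average(sample)
```

The enumeration costs $n!$ evaluations, so it is practical only for small $n$. In this check, setting every $a_i = 1$ makes both sides equal to 1.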
Submitted 17 May, 2025;
originally announced May 2025.
-
Moduli spaces of Hom-Lie algebroid connections
Authors:
Ayush Jaiswal
Abstract:
We study irreducible Hom-Lie algebroid connections for a Hom-bundle and prove that the H-gauge theoretic moduli space has a Hausdorff Hilbert manifold structure. This work generalizes some known results about simple semi-connections and Lie algebroid connections for complex vector bundles on compact complex manifolds.
Submitted 17 May, 2025;
originally announced May 2025.
-
ChestyBot: Detecting and Disrupting Chinese Communist Party Influence Stratagems
Authors:
Matthew Stoffolano,
Ayush Rout,
Justin M. Pelletier
Abstract:
Foreign information operations conducted by Russian and Chinese actors exploit the United States' permissive information environment. These campaigns threaten democratic institutions and the broader Westphalian model. Yet, existing detection and mitigation strategies often fail to identify active information campaigns in real time. This paper introduces ChestyBot, a pragmatics-based language model that detects unlabeled foreign malign influence tweets with up to 98.34% accuracy. The model supports a novel framework to disrupt foreign influence operations in their formative stages.
Submitted 15 May, 2025;
originally announced May 2025.
-
Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models
Authors:
Aditya Nagori,
Ayush Gautam,
Matthew O. Wiens,
Vuong Nguyen,
Nathan Kenya Mugisha,
Jerome Kabakyenga,
Niranjan Kissoon,
John Mark Ansermino,
Rishikesan Kamaleswaran
Abstract:
Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and distinctiveness. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
Submitted 14 May, 2025;
originally announced May 2025.
-
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Authors:
Ayush K. Rai,
Kyle Min,
Tarun Krishna,
Feiyan Hu,
Alan F. Smeaton,
Noel E. O'Connor
Abstract:
Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-based masking, as well as approaches that leverage key motion priors, optical flow and semantic cues from externally pre-trained models. In this work, we introduce a novel and generalizable Trajectory-Aware Adaptive Token Sampler (TATS), which models the motion dynamics of tokens and can be seamlessly integrated into the masked autoencoder (MAE) framework to select motion-centric tokens in videos. Additionally, we propose a unified training strategy that enables joint optimization of both MAE and TATS from scratch using Proximal Policy Optimization (PPO). We show that our model allows for aggressive masking without compromising performance on the downstream task of action recognition while also ensuring that the pre-training remains memory efficient. Extensive experiments of the proposed approach across four benchmarks, including Something-Something v2, Kinetics-400, UCF101, and HMDB51, demonstrate the effectiveness, transferability, generalization, and efficiency of our work compared to other state-of-the-art methods.
Submitted 13 May, 2025;
originally announced May 2025.
-
The Polarisation Sky Survey of the Universe's Magnetism (POSSUM): Science Goals and Survey Description
Authors:
B. M. Gaensler,
G. H. Heald,
N. M. McClure-Griffiths,
C. S. Anderson,
C. L. Van Eck,
J. L. West,
A. J. M. Thomson,
J. P. Leahy,
L. Rudnick,
Y. K. Ma,
Takuya Akahori,
G. Gürkan,
T. L. Landecker,
S. A. Mao,
S. P. O'Sullivan,
W. Raja,
X. Sun,
T. Vernstrom,
Lerato Baidoo,
Ettore Carretti,
A. R. Taylor,
A. G. Willis,
Erik Osinga,
J. D. Livingston,
E. L. Alexander
, et al. (35 additional authors not shown)
Abstract:
The Australian SKA Pathfinder (ASKAP) offers powerful new capabilities for studying the polarised and magnetised Universe at radio wavelengths. In this paper, we introduce the Polarisation Sky Survey of the Universe's Magnetism (POSSUM), a groundbreaking survey with three primary objectives: (1) to create a comprehensive Faraday rotation measure (RM) grid of up to one million compact extragalactic sources across the southern ~50 per cent of the sky (20,630 deg$^2$); (2) to map the intrinsic polarisation and RM properties of a wide range of discrete extragalactic and Galactic objects over the same area; and (3) to contribute interferometric data with excellent surface brightness sensitivity, which can be combined with single-dish data to study the diffuse Galactic interstellar medium. Observations for the full POSSUM survey commenced in May 2023 and are expected to conclude by mid-2028. POSSUM will achieve an RM grid density of around 30-50 RMs per square degree with a median measurement uncertainty of ~1 rad m$^{-2}$. The survey operates primarily over a frequency range of 800-1088 MHz, with an angular resolution of 20'' and a typical RMS sensitivity in Stokes $Q$ or $U$ of 18 $μ$Jy beam$^{-1}$. Additionally, the survey will be supplemented by similar observations covering 1296-1440 MHz over 38 per cent of the sky. POSSUM will enable the discovery and detailed investigation of magnetised phenomena in a wide range of cosmic environments, as well as the interplay between these components. This paper reviews the current science case developed by the POSSUM Collaboration and provides an overview of POSSUM's observations, data processing, outputs, and its complementarity with other radio and multi-wavelength surveys, including future work with the SKA. [Abstract abridged]
Submitted 13 May, 2025;
originally announced May 2025.
-
Utility Maximization Under Endogenous Uncertainty
Authors:
Ayush Gupta
Abstract:
This paper establishes a general existence result for expected utility maximization in settings where the agent's decision affects the uncertainty faced by her. We introduce a continuity condition for choice-dependent probability measures which ensures the upper semi-continuity of expected utility. Our topological proof imposes minimal restrictions on the utility function and the random variable. In particular, we do not need common assumptions like the monotone likelihood ratio property (MLRP) or the convexity of distribution functions condition (CDFC). Additionally, we identify sufficient conditions - continuity of densities and stochastic dominance - which help verify our assumptions in most practical applications. These findings expand the applicability of expected utility theory in settings with endogenous uncertainty.
Submitted 10 June, 2025; v1 submitted 11 May, 2025;
originally announced May 2025.
-
Climate in a Bottle: Towards a Generative Foundation Model for the Kilometer-Scale Global Atmosphere
Authors:
Noah D. Brenowitz,
Tao Ge,
Akshay Subramaniam,
Aayush Gupta,
David M. Hall,
Morteza Mardani,
Arash Vahdat,
Karthik Kashinath,
Michael S. Pritchard
Abstract:
AI emulators offer a path to compressing, boosting limited ensembles, and improving the latency of interacting with petabyte-scale climate prediction data. However, prevailing auto-regressive paradigms offer limited flexibility and are challenging to train on climate time horizons due to drifts, instabilities and component-coupling challenges. Conditionally generative models offer an appealing alternative. In this context we demonstrate a generative diffusion-based framework -- Climate in a Bottle (cBottle) -- for emulating global km-scale climate simulations and reanalysis on the equal-area HEALPix grid. cBottle consists of two model stages: a globally-trained coarse-resolution image generator that generates 100 km (50k-pixel) fields given monthly average sea surface temperatures and solar conditioning, followed by a locally-trained 16x super-resolution stage that generates 5 km (12.5M-pixel) fields; global super-resolution is made affordable using an overlapping patch-based multi-diffusion approach. Overall, cBottle shows promise as an emulator across a battery of climate model diagnostics, including diurnal-to-seasonal scale variability, large-scale modes of variability, tropical cyclone statistics, and trends of climate change and weather extremes. Moreover, cBottle is a step towards a foundation model by bridging multiple data modalities (reanalysis and simulation) with corresponding utility beyond emulation to tasks such as zero-shot bias correction, climate downscaling, and channel in-filling.
Submitted 9 May, 2025;
originally announced May 2025.