Showing 1–50 of 811 results for author: Aayush

  1. arXiv:2506.14978  [pdf, ps, other]

    cs.LG

    ODD: Overlap-aware Estimation of Model Performance under Distribution Shift

    Authors: Aayush Mishra, Anqi Liu

    Abstract: Reliable and accurate estimation of the error of an ML model in unseen test domains is an important problem for safe intelligent systems. Prior work uses disagreement discrepancy (DIS^2) to derive practical error bounds under distribution shifts. It optimizes for a maximally disagreeing classifier on the target domain to bound the error of a given source classifier. Although this approach offers a…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to the 41st Conference on Uncertainty in Artificial Intelligence, 2025

  2. arXiv:2506.13048  [pdf, ps, other]

    cs.LG

    The Space Complexity of Learning-Unlearning Algorithms

    Authors: Yeshwanth Cherapanamjeri, Sumegha Garg, Nived Rajaraman, Ayush Sekhari, Abhishek Shetty

    Abstract: We study the memory complexity of machine unlearning algorithms that provide strong data deletion guarantees to the users. Formally, consider an algorithm for a particular learning task that initially receives a training dataset. Then, after learning, it receives data deletion requests from a subset of users (of arbitrary size), and the goal of unlearning is to perform the task as if the learner n…

    Submitted 15 June, 2025; originally announced June 2025.

  3. arXiv:2506.12347  [pdf, ps, other]

    cs.SE cs.HC

    Sharp Tools: How Developers Wield Agentic AI in Real Software Engineering Tasks

    Authors: Aayush Kumar, Yasharth Bajpai, Sumit Gulwani, Gustavo Soares, Emerson Murphy-Hill

    Abstract: Software Engineering Agents (SWE agents) can autonomously perform development tasks on benchmarks like SWE Bench, but still face challenges when tackling complex and ambiguous real-world tasks. Consequently, SWE agents are often designed to allow interactivity with developers, enabling collaborative problem-solving. To understand how developers collaborate with SWE agents and the communication cha…

    Submitted 17 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

  4. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  5. arXiv:2506.12097  [pdf, ps, other]

    cs.CL cs.CR cs.LG stat.ML

    UCD: Unlearning in LLMs via Contrastive Decoding

    Authors: Vinith M. Suriyakumar, Ayush Sekhari, Ashia Wilson

    Abstract: Machine unlearning aims to remove specific information, e.g. sensitive or undesirable content, from large language models (LLMs) while preserving overall performance. We propose an inference-time unlearning algorithm that uses contrastive decoding, leveraging two auxiliary smaller models, one trained without the forget set and one trained with it, to guide the outputs of the original model using t…

    Submitted 12 June, 2025; originally announced June 2025.

  6. arXiv:2506.12003  [pdf]

    cs.NI cs.AI cs.MA

    Upgrade or Switch: Do We Need a New Registry Architecture for the Internet of AI Agents?

    Authors: Ramesh Raskar, Pradyumna Chari, Jared James Grogan, Mahesh Lambe, Robert Lincourt, Raghu Bala, Abhishek Singh, Ayush Chopra, Rajesh Ranjan, Shailja Gupta, Dimitris Stripelis, Maria Gorskikh, Sichao Wang

    Abstract: The emerging Internet of AI Agents challenges existing web infrastructure designed for human-scale, reactive interactions. Unlike traditional web resources, autonomous AI agents initiate actions, maintain persistent state, spawn sub-agents, and negotiate directly with peers: demanding millisecond-level discovery, instant credential revocation, and cryptographic behavioral proofs that exceed curren…

    Submitted 13 June, 2025; originally announced June 2025.

  7. arXiv:2506.11302  [pdf, ps, other]

    cs.CV cs.AI

    TARDIS STRIDE: A Spatio-Temporal Road Image Dataset and World Model for Autonomy

    Authors: Héctor Carrión, Yutong Bai, Víctor A. Hernández Castro, Kishan Panaganti, Ayush Zenith, Matthew Trang, Tony Zhang, Pietro Perona, Jitendra Malik

    Abstract: World models aim to simulate environments and enable effective agent behavior. However, modeling real-world environments presents unique challenges as they dynamically change across both space and, crucially, time. To capture these composed dynamics, we introduce a Spatio-Temporal Road Image Dataset for Exploration (STRIDE) permuting 360-degree panoramic imagery into rich interconnected observatio…

    Submitted 18 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: Computer Vision, Pattern Recognition, Early-Fusion, Dataset, Data Augmentation

  8. arXiv:2506.10955  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems

    Authors: Aayush Karan, Kulin Shah, Sitan Chen

    Abstract: There has been a flurry of activity around using pretrained diffusion models as informed data priors for solving inverse problems, and more generally around steering these models using reward models. Training-free methods like diffusion posterior sampling (DPS) and its many variants have offered flexible heuristic algorithms for these tasks, but when the reward is not informative enough, e.g., in…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 38 pages, 14 figures

  9. arXiv:2506.09445  [pdf, ps, other]

    cs.CV cs.AI

    TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

    Authors: Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha

    Abstract: We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to j…

    Submitted 11 June, 2025; originally announced June 2025.

  10. arXiv:2506.09108  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    SensorLM: Learning the Language of Wearable Sensors

    Authors: Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

    Abstract: We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipel…

    Submitted 10 June, 2025; originally announced June 2025.

  11. arXiv:2506.08249  [pdf, other]

    cs.DB cs.CL

    RADAR: Benchmarking Language Models on Imperfect Tabular Data

    Authors: Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

    Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compro…

    Submitted 9 June, 2025; originally announced June 2025.

  12. arXiv:2506.07259  [pdf, ps, other]

    stat.ML cs.LG

    ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

    Authors: Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi

    Abstract: Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 27 pages, 13 figures

  13. arXiv:2506.06087  [pdf, ps, other]

    stat.ML astro-ph.CO astro-ph.IM cs.LG stat.CO

    Multilevel neural simulation-based inference

    Authors: Yuga Hikida, Ayush Bharti, Niall Jeffrey, François-Xavier Briol

    Abstract: Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally…

    Submitted 6 June, 2025; originally announced June 2025.

  14. arXiv:2506.06073  [pdf, ps, other]

    cs.LG

    System-Aware Unlearning Algorithms: Use Lesser, Forget Faster

    Authors: Linda Lu, Ayush Sekhari, Karthik Sridharan

    Abstract: Machine unlearning addresses the problem of updating a machine learning model/system trained on a dataset $S$ so that the influence of a set of deletion requests $U \subseteq S$ on the unlearned model is minimized. The gold standard definition of unlearning demands that the updated model, after deletion, be nearly identical to the model obtained by retraining. This definition is designed for a wor…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  15. arXiv:2506.05670  [pdf, ps, other]

    cs.CL

    Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment

    Authors: Priyanka Dey, Yugal Khanter, Aayush Bothra, Jieyu Zhao, Emilio Ferrara

    Abstract: As LLMs become central to interactive applications, ranging from tutoring to mental health, the ability to express personality in culturally appropriate ways is increasingly important. While recent works have explored personality evaluation of LLMs, they largely overlook the interplay between culture and personality. To address this, we introduce CulturalPersonas, the first large-scale benchmark w…

    Submitted 5 June, 2025; originally announced June 2025.

  16. arXiv:2506.05321  [pdf, other]

    cs.LG

    LSM-2: Learning from Incomplete Wearable Sensor Data

    Authors: Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff

    Abstract: Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Xu and Narayanswamy are co-first authors. McDuff and Liu are co-last authors

  17. arXiv:2506.04987  [pdf, ps, other]

    cs.SE cs.AI

    A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair

    Authors: Zanis Ali Khan, Aayush Garg, Qiang Tang

    Abstract: Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security-critical aspect of APR, remains underexplored. This study investigates pre-trained language models, CodeBERT and CodeT5, for automated vulnerability patching across six datasets and four languages. We…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Preprint has been accepted in ARES AI&CCPS (International Workshop on Artificial Intelligence, Cyber and Cyber-Physical Security)

  18. arXiv:2506.04368  [pdf, ps, other]

    cs.DC

    Fully-Distributed Construction of Byzantine-Resilient Dynamic Peer-to-Peer Networks

    Authors: Aayush Gupta, Gopal Pandurangan

    Abstract: We address a fundamental problem in Peer-to-Peer (P2P) networks, namely, constructing and maintaining dynamic P2P overlay network topologies with essential properties such as connectivity, low diameter, and high expansion, that are resilient to continuous high churn and the presence of a large number of malicious (Byzantine) nodes. Our main goal is to construct and maintain a sparse (bounded degre…

    Submitted 4 June, 2025; originally announced June 2025.

  19. arXiv:2506.03148  [pdf, ps, other]

    cs.CV

    Self-Supervised Spatial Correspondence Across Modalities

    Authors: Ayush Shrivastava, Andrew Owens

    Abstract: We present a method for finding cross-modal space-time correspondences. Given two images from different visual modalities, such as an RGB image and a depth map, our model identifies which pairs of pixels correspond to the same physical points in the scene. To solve this problem, we extend the contrastive random walk framework to simultaneously learn cycle-consistent feature representations for bot…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: CVPR 2025. Project link: https://www.ayshrv.com/cmrw . Code: https://github.com/ayshrv/cmrw

  20. arXiv:2506.02556  [pdf, ps, other]

    cs.RO

    Sign Language: Towards Sign Understanding for Robot Autonomy

    Authors: Ayush Agrawal, Joel Loo, Nicky Zimmerman, David Hsu

    Abstract: Signage is a ubiquitous element of human environments, playing a critical role in both scene understanding and navigation. For autonomous systems to fully interpret human environments, effectively parsing and understanding signs is essential. We introduce the task of navigational sign understanding, aimed at extracting navigational cues from signs that convey symbolic spatial information about th…

    Submitted 3 June, 2025; originally announced June 2025.

  21. arXiv:2505.24603  [pdf, ps, other]

    cs.LG

    The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches

    Authors: Omri Lev, Vishwak Srinivasan, Moshe Shenfeld, Katrina Ligett, Ayush Sekhari, Ashia C. Wilson

    Abstract: Gaussian sketching, which consists of pre-multiplying the data with a random Gaussian matrix, is a widely used technique for multiple problems in data science and machine learning, with applications spanning computationally efficient optimization, coded computing, and federated learning. This operation also provides differential privacy guarantees due to its inherent randomness. In this work, we r…

    Submitted 4 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  22. arXiv:2505.24360  [pdf, ps, other]

    cs.LG

    Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

    Authors: Stepan Shabalin, Ayush Panda, Dmitrii Kharlapenko, Abdur Raheem Ali, Yixiong Hao, Arthur Conmy

    Abstract: Sparse autoencoders are a promising new approach for decomposing language model activations for interpretation and control. They have been applied successfully to vision transformer image encoders and to small-scale diffusion models. Inference-Time Decomposition of Activations (ITDA) is a recently proposed variant of dictionary learning that takes the dictionary to be a set of data points from the…

    Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 10 pages, 10 figures, Mechanistic Interpretability for Vision at CVPR 2025

  23. arXiv:2505.24063  [pdf]

    cs.CL cs.DB

    TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine

    Authors: Jiacheng Xie, Yang Yu, Ziyang Zhang, Shuai Zeng, Jiaxuan He, Ayush Vasireddy, Xiaoting Tang, Congyu Guo, Lening Zhao, Congcong Jing, Guanghui An, Dong Xu

    Abstract: Traditional Chinese Medicine (TCM), as an effective alternative medicine, has been receiving increasing attention. In recent years, the rapid development of large language models (LLMs) tailored for TCM has underscored the need for an objective and comprehensive evaluation framework to assess their performance on real-world tasks. However, existing evaluation datasets are limited in scope and prim…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 22 pages, 4 figures

  24. arXiv:2505.23678  [pdf, ps, other]

    cs.CV

    Grounded Reinforcement Learning for Visual Reasoning

    Authors: Gabriel Sarch, Snigdha Saha, Naitik Khandelwal, Ayush Jain, Michael J. Tarr, Aviral Kumar, Katerina Fragkiadaki

    Abstract: While reinforcement learning (RL) over chains of thought has significantly advanced language models in tasks such as mathematics and coding, visual reasoning introduces added complexity by requiring models to direct visual attention, interpret perceptual inputs, and ground abstract reasoning in spatial evidence. We introduce ViGoRL (Visually Grounded Reinforcement Learning), a vision-language mode…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Project website: https://visually-grounded-rl.github.io/

  25. arXiv:2505.22820  [pdf, ps, other]

    cs.LG

    Preference Learning with Response Time

    Authors: Ayush Sawarni, Sahasrajit Sarmasarkar, Vasilis Syrgkanis

    Abstract: This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel…

    Submitted 28 May, 2025; originally announced May 2025.

  26. arXiv:2505.18122  [pdf, ps, other]

    cs.CL

    UNJOIN: Enhancing Multi-Table Text-to-SQL Generation via Schema Simplification

    Authors: Poojah Ganesan, Rajat Aayush Jha, Dan Roth, Vivek Gupta

    Abstract: Recent advances in large language models (LLMs) have greatly improved Text-to-SQL performance for single-table queries. But, it remains challenging in multi-table databases due to complex schema and relational operations. Existing methods often struggle with retrieving the right tables and columns, generating accurate JOINs and UNIONs, and generalizing across diverse schemas. To address these issu…

    Submitted 23 May, 2025; originally announced May 2025.

  27. arXiv:2505.17360  [pdf, ps, other]

    cs.CC cs.DS

    The Quasi-Polynomial Low-Degree Conjecture is False

    Authors: Rares-Darius Buhai, Jun-Ting Hsieh, Aayush Jain, Pravesh K. Kothari

    Abstract: There is a growing body of work on proving hardness results for average-case estimation problems by bounding the low-degree advantage (LDA) - a quantitative estimate of the closeness of low-degree moments - between a null distribution and a related planted distribution. Such hardness results are now ubiquitous not only for foundational average-case problems but also central questions in statistics…

    Submitted 22 May, 2025; originally announced May 2025.

  28. arXiv:2505.16519  [pdf, ps, other]

    cs.NI

    SONIC: Cost-Effective Web Access for Developing Countries

    Authors: Ayush Pandey, Rohail Asim, Jean Louis K. E. Fendji, Talal Rahwan, Matteo Varvello, Yasir Zaki

    Abstract: Over 2.6 billion people remain without access to the Internet in 2025. This phenomenon is especially pronounced in developing regions, where cost and infrastructure limitations are major barriers to connectivity. In response, we design SONIC, a low-cost, scalable data delivery system that builds on existing infrastructures: FM radio for downlink broadcasting, and SMS for personalized uplink. SONIC…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 16 pages, 20 figures

  29. arXiv:2505.16261  [pdf]

    cs.CR

    Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models

    Authors: Kalindi Singh, Aayush Kashyap, Aswani Kumar Cherukuri

    Abstract: The widespread adoption of encrypted communication protocols such as HTTPS and TLS has enhanced data privacy but also rendered traditional anomaly detection techniques less effective, as they often rely on inspecting unencrypted payloads. This study aims to develop an interpretable machine learning-based framework for anomaly detection in encrypted network traffic. This study proposes a model-agno…

    Submitted 22 May, 2025; originally announced May 2025.

  30. arXiv:2505.15623  [pdf, ps, other]

    cs.CL cs.LG

    Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning

    Authors: Tiasa Singha Roy, Aditeya Baral, Ayush Rajesh Jhaveri, Yusuf Baig

    Abstract: Large language models (LLMs) demonstrate considerable potential in various natural language tasks but face significant challenges in mathematical reasoning, particularly in executing precise, multi-step logic. However, current evaluation frameworks judge their performance solely based on accuracy, which only accounts for the final answer. This study explores these pitfalls by employing a novel eva…

    Submitted 21 May, 2025; originally announced May 2025.

  31. Physics-Guided Multi-View Graph Neural Network for Schizophrenia Classification via Structural-Functional Coupling

    Authors: Badhan Mazumder, Ayush Kanyal, Lei Wu, Vince D. Calhoun, Dong Hye Ye

    Abstract: Clinical studies reveal disruptions in brain structural connectivity (SC) and functional connectivity (FC) in neuropsychiatric disorders such as schizophrenia (SZ). Traditional approaches might rely solely on SC due to limited functional data availability, hindering comprehension of cognitive and behavioral impairments in individuals with SZ by neglecting the intricate SC-FC interrelationship. To…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted and presented at the 7th International Workshop on PRedictive Intelligence in MEdicine (Held in Conjunction with MICCAI 2024)

  32. arXiv:2505.15050  [pdf, ps, other]

    cs.CL

    Improving the fact-checking performance of language models by relying on their entailment ability

    Authors: Gaurav Kumar, Debajyoti Mazumder, Ayush Garg, Jasabanta Patro

    Abstract: Automated fact-checking is a crucial task in this digital age. To verify a claim, current approaches mostly follow one of two strategies, i.e. (i) relying on embedded knowledge of language models, and (ii) fine-tuning them with evidence pieces. While the former can make systems hallucinate, the latter has not been very successful to date. The primary reason behind this is that fact verificati…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 44 pages

  33. arXiv:2505.13777  [pdf, ps, other]

    cs.CV cs.AI cs.SD

    Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping

    Authors: Subash Khanal, Srikumar Sastry, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs

    Abstract: We present Sat2Sound, a multimodal representation learning framework for soundscape mapping, designed to predict the distribution of sounds at any location on Earth. Existing methods for this task rely on satellite image and paired geotagged audio samples, which often fail to capture the diversity of sound sources at a given location. To address this limitation, we enhance existing datasets by lev…

    Submitted 19 May, 2025; originally announced May 2025.

  34. arXiv:2505.10746  [pdf, ps, other]

    cs.CY cs.AI cs.CR cs.SI

    ChestyBot: Detecting and Disrupting Chinese Communist Party Influence Stratagems

    Authors: Matthew Stoffolano, Ayush Rout, Justin M. Pelletier

    Abstract: Foreign information operations conducted by Russian and Chinese actors exploit the United States' permissive information environment. These campaigns threaten democratic institutions and the broader Westphalian model. Yet, existing detection and mitigation strategies often fail to identify active information campaigns in real time. This paper introduces ChestyBot, a pragmatics-based language model…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: Presented at USCYBERCOM Cyber Recon Symposium 2023 at DreamPort in Columbia, MD on April 20, 2023

  35. arXiv:2505.09805  [pdf, ps, other]

    q-bio.QM cs.AI cs.LG stat.AP

    Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

    Authors: Aditya Nagori, Ayush Gautam, Matthew O. Wiens, Vuong Nguyen, Nathan Kenya Mugisha, Jerome Kabakyenga, Niranjan Kissoon, John Mark Ansermino, Rishikesan Kamaleswaran

    Abstract: Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 record…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 11 pages, 2 Figures, 1 Table

  36. arXiv:2505.08561  [pdf, other]

    cs.CV

    Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection

    Authors: Ayush K. Rai, Kyle Min, Tarun Krishna, Feiyan Hu, Alan F. Smeaton, Noel E. O'Connor

    Abstract: Masked video modeling (MVM) has emerged as a highly effective pre-training strategy for visual foundation models, whereby the model reconstructs masked spatiotemporal tokens using information from visible tokens. However, a key challenge in such approaches lies in selecting an appropriate masking strategy. Previous studies have explored predefined masking techniques, including random and tube-base…

    Submitted 13 May, 2025; originally announced May 2025.

  37. arXiv:2505.05885  [pdf, ps, other]

    cs.DB cs.IR

    Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

    Authors: Nitish Upreti, Krishnan Sundaram, Hari Sudan Sundar, Samer Boshra, Balachandar Perumalswamy, Shivam Atri, Martin Chisholm, Revti Raman Singh, Greg Yang, Subramanyam Pattipaka, Tamara Hass, Nitesh Dudhey, James Codella, Mark Hildebrand, Magdalen Manohar, Jack Moffitt, Haiyang Xu, Naren Datha, Suryansh Gupta, Ravishankar Krishnaswamy, Prashant Gupta, Abhishek Sahu, Ritika Mor, Santosh Kulkarni, Hemeswari Varada, et al. (11 additional authors not shown)

    Abstract: Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient…

    Submitted 9 May, 2025; originally announced May 2025.

    ACM Class: H.3.3

  38. arXiv:2505.03839  [pdf, other]

    cs.IR cs.CL

    An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification

    Authors: Utsav Kumar Nareti, Soumi Chattopadhyay, Prolay Mallick, Suraj Kumar, Ayush Vikas Daga, Chandranath Adak, Adarsh Wase, Arjab Roy

    Abstract: Identifying the finer details of a book's genres enhances user experience by enabling efficient book discovery and personalized recommendations, ultimately improving reader engagement and satisfaction. It also provides valuable insights into market trends and consumer preferences, allowing publishers and marketers to make data-driven decisions regarding book production and marketing strategies. Wh…

    Submitted 5 May, 2025; originally announced May 2025.

  39. arXiv:2505.03189  [pdf, other]

    cs.AI cs.HC

    Patterns and Mechanisms of Contrastive Activation Engineering

    Authors: Yixiong Hao, Ayush Panda, Stepan Shabalin, Sheikh Abdur Raheem Ali

    Abstract: Controlling the behavior of Large Language Models (LLMs) remains a significant challenge due to their inherent complexity and opacity. While techniques like fine-tuning can modify model behavior, they typically require extensive computational resources. Recent work has introduced a class of contrastive activation engineering (CAE) techniques as promising approaches for steering LLM outputs through…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Published at the ICLR 2025 Bi-Align, HAIC, and Building Trust workshops

  40. arXiv:2505.03173  [pdf, other]

    cs.CV cs.AI

    RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph

    Authors: Sameer Malik, Moyuru Yamada, Ayush Singh, Dishank Aggarwal

    Abstract: Comprehending long videos remains a significant challenge for Large Multi-modal Models (LMMs). Current LMMs struggle to process videos that are even minutes to hours long due to their lack of explicit memory and retrieval mechanisms. To address this limitation, we propose RAVU (Retrieval Augmented Video Understanding), a novel framework for video understanding enhanced by retrieval with compositional reasoning…

    Submitted 6 May, 2025; originally announced May 2025.

  41. arXiv:2505.03098  [pdf, other]

    eess.SP cs.IT

    USF Spectral Estimation: Prevalence of Gaussian Cramér-Rao Bounds Despite Modulo Folding

    Authors: Ruiming Guo, Ayush Bhandari

    Abstract: Spectral Estimation (SpecEst) is a core area of signal processing with a history spanning two centuries and applications across various fields. With the advent of digital acquisition, SpecEst algorithms have been widely applied to tasks like frequency super-resolution. However, conventional digital acquisition imposes a trade-off: for a fixed bit budget, one can optimize either signal dynamic rang…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 2 Figs, to appear in Proc. of 2025 IEEE Statistical Signal Processing (SSP) Workshop

  42. arXiv:2505.01700  [pdf, other]

    cs.LG q-bio.QM

    PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking

    Authors: Yize Jiang, Xinze Li, Yuanyuan Zhang, Jin Han, Youjun Xu, Ayush Pandit, Zaixi Zhang, Mengdi Wang, Mengyang Wang, Chong Liu, Guang Yang, Yejin Choi, Wu-Jun Li, Tianfan Fu, Fang Wu, Junhong Liu

    Abstract: Existing protein-ligand docking studies typically focus on the self-docking scenario, which is less practical in real applications. Moreover, some studies involve heavy frameworks requiring extensive training, posing challenges for convenient and efficient assessment of docking methods. To fill these gaps, we design PoseX, an open-source benchmark to evaluate both self-docking and cross-docking, e…

    Submitted 21 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

  43. arXiv:2504.19395  [pdf, other]

    cs.CL

    ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers

    Authors: Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi

    Abstract: Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e. task retrieval (remember learned patterns from pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classic cr…

    Submitted 27 April, 2025; originally announced April 2025.

  44. arXiv:2504.17950  [pdf, other]

    cs.MA cs.CL

    Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning

    Authors: Isadora White, Kolby Nottingham, Ayush Maniar, Max Robinson, Hansen Lillemark, Mehul Maheshwari, Lianhui Qin, Prithviraj Ammanabrolu

    Abstract: Collaboration is ubiquitous and essential in day-to-day life -- from exchanging ideas, to delegating tasks, to generating plans together. This work studies how LLMs can adaptively collaborate to perform complex embodied reasoning tasks. To this end we introduce MINDcraft, an easily extensible platform built to enable LLM agents to control characters in the open-world game of Minecraft; and MineCol…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 9 pages of main paper with 6 main figures, overall 28 pages

  45. arXiv:2504.17656  [pdf, ps, other]

    cs.CE cond-mat.mtrl-sci cs.LG

    polyGen: A Learning Framework for Atomic-level Polymer Structure Generation

    Authors: Ayush Jain, Rampi Ramprasad

    Abstract: Synthetic polymeric materials underpin fundamental technologies in the energy, electronics, consumer goods, and medical sectors, yet their development still suffers from prolonged design timelines. Although polymer informatics tools have supported speedup, polymer simulation protocols continue to face significant challenges in the on-demand generation of realistic 3D atomic structures that respect…

    Submitted 10 June, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

  46. arXiv:2504.17140  [pdf, other]

    cs.LG cs.AI

    Scalable Permutation-Aware Modeling for Temporal Set Prediction

    Authors: Ashish Ranjan, Ayush Agarwal, Shalin Barot, Sushant Kumar

    Abstract: Temporal set prediction involves forecasting the elements that will appear in the next set, given a sequence of prior sets, each containing a variable number of elements. Existing methods often rely on intricate architectures with substantial computational overhead, which hampers their scalability. In this work, we introduce a novel and scalable framework that leverages permutation-equivariant and…

    Submitted 23 April, 2025; originally announced April 2025.

  47. arXiv:2504.14151  [pdf, other]

    cs.CV cs.AI cs.RO

    Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

    Authors: Sergio Arnaud, Paul McVay, Ada Martin, Arjun Majumdar, Krishna Murthy Jatavallabhula, Phillip Thomas, Ruslan Partsey, Daniel Dugas, Abha Gejji, Alexander Sax, Vincent-Pierre Berges, Mikael Henaff, Ayush Jain, Ang Cao, Ishita Prasad, Mrinal Kalakrishnan, Michael Rabbat, Nicolas Ballas, Mido Assran, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

    Abstract: We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world depl…

    Submitted 18 April, 2025; originally announced April 2025.

    ACM Class: I.2.10; I.2.6; I.2.9; I.3.7; I.4.6; I.4.8

  48. arXiv:2504.12515  [pdf, other]

    cs.CV

    Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space

    Authors: Kaustav Chanda, Aayush Atul Verma, Arpitsinh Vaghela, Yezhou Yang, Bharatesh Chakravarthi

    Abstract: Event cameras promise a paradigm shift in vision sensing with their low latency, high dynamic range, and asynchronous nature of events. Unfortunately, the scarcity of high-quality labeled datasets hinders their widespread adoption in deep learning-driven computer vision. To mitigate this, several simulators have been proposed to generate synthetic event data for training models for detection and e…

    Submitted 20 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Accepted at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); Fifth International Workshop on Event-Based Vision

  49. arXiv:2504.12389  [pdf, other]

    quant-ph cs.LG

    Predictive control of blast furnace temperature in steelmaking with hybrid depth-infused quantum neural networks

    Authors: Nayoung Lee, Minsoo Shin, Asel Sagingalieva, Ayush Joshi Tripathi, Karan Pinto, Alexey Melnikov

    Abstract: Accurate prediction and stabilization of blast furnace temperatures are crucial for optimizing the efficiency and productivity of steel production. Traditional methods often struggle with the complex and non-linear nature of the temperature fluctuations within blast furnaces. This paper proposes a novel approach that combines hybrid quantum machine learning with pulverized coal injection control t…

    Submitted 16 April, 2025; originally announced April 2025.

  50. arXiv:2504.11673  [pdf, ps, other]

    cs.CL

    Deep Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

    Authors: Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan

    Abstract: Large language models (LLMs) are increasingly capable of simulating human behavior, offering cost-effective ways to estimate user responses to various surveys and polls. However, the questions in these surveys usually reflect socially understood attitudes: the patterns of attitudes of old/young, liberal/conservative, as understood by both members and non-members of those groups. It is not clear wh…

    Submitted 12 June, 2025; v1 submitted 15 April, 2025; originally announced April 2025.