Showing 1–50 of 458 results for author: Agarwal, A

Searching in archive cs.
  1. arXiv:2507.02972  [pdf, ps, other]

    cs.CV cs.LG

    Farm-Level, In-Season Crop Identification for India

    Authors: Ishan Deshpande, Amandeep Kaur Reehal, Chandan Nath, Renu Singh, Aayush Patel, Aishwarya Jayagopal, Gaurav Singh, Gaurav Aggarwal, Amit Agarwal, Prathmesh Bele, Sridhar Reddy, Tanya Warrier, Kinjal Singh, Ashish Tendulkar, Luis Pazos Outon, Nikita Saxena, Agata Dondzik, Dinesh Tewari, Shruti Garg, Avneet Singh, Harsh Dhand, Vaibhav Rajan, Alok Talekar

    Abstract: Accurate, timely, and farm-level crop type information is paramount for national food security, agricultural policy formulation, and economic planning, particularly in agriculturally significant nations like India. While remote sensing and machine learning have become vital tools for crop monitoring, existing approaches often grapple with challenges such as limited geographical scalability, restri…

    Submitted 30 June, 2025; originally announced July 2025.

  2. arXiv:2506.16678  [pdf, ps, other]

    cs.CL

    Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations

    Authors: Ananth Agarwal, Jasper Jian, Christopher D. Manning, Shikhar Murty

    Abstract: Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests internalized understanding of hierarchical syntax and dependency relations, the precise mechanism by which they represent syntactic structure is an open area within interpretability research. Probing provides one way to identify the mechanism of syntax being linearly encoded in…

    Submitted 19 June, 2025; originally announced June 2025.

  3. arXiv:2506.10910  [pdf, ps, other]

    cs.CL

    Magistral

    Authors: Mistral-AI: Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, et al. (76 additional authors not shown)

    Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s…

    Submitted 12 June, 2025; originally announced June 2025.

  4. arXiv:2506.08928  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    Local MDI+: Local Feature Importances for Tree-Based Models

    Authors: Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

    Abstract: Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific)…

    Submitted 10 June, 2025; originally announced June 2025.

  5. arXiv:2506.07949  [pdf, ps, other]

    cs.LG

    Cost-Optimal Active AI Model Evaluation

    Authors: Anastasios N. Angelopoulos, Jacob Eisenstein, Jonathan Berant, Alekh Agarwal, Adam Fisch

    Abstract: The development lifecycle of generative AI systems requires continual evaluation, data acquisition, and annotation, which is costly in both resources and time. In practice, rapid iteration often makes it necessary to rely on synthetic annotation data because of the low cost, despite the potential for substantial bias. In this paper, we develop novel, cost-aware methods for actively balancing the u…

    Submitted 9 June, 2025; originally announced June 2025.

  6. arXiv:2506.04166  [pdf, ps, other]

    cs.LG stat.CO stat.ML

    N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

    Authors: Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi

    Abstract: Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications.…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 21 pages, 6 figures

  7. arXiv:2506.02097  [pdf, ps, other]

    cs.AI

    Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

    Authors: Priyaranjan Pattnayak, Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Srikant Panda

    Abstract: Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated dom…

    Submitted 25 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: Proceedings of the 4th International Workshop on Knowledge Augmented Methods for Natural Language Processing in NAACL 2025, pages 215 to 229, Albuquerque, New Mexico, USA. Association for Computational Linguistics

    Journal ref: Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing (KnowledgeNLP 2025), pp. 215 to 229, Association for Computational Linguistics, Albuquerque, New Mexico, May 2025

  8. arXiv:2506.00482  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation

    Authors: Eunsu Kim, Haneul Yoo, Guijin Son, Hitesh Patel, Amit Agarwal, Alice Oh

    Abstract: As large language models (LLMs) continue to advance, the need for up-to-date and well-organized benchmarks becomes increasingly critical. However, many existing datasets are scattered, difficult to manage, and make it challenging to perform evaluations tailored to specific needs or domains, despite the growing importance of domain-specific models in areas such as math or code. In this paper, we in…

    Submitted 31 May, 2025; originally announced June 2025.

  9. arXiv:2505.18366  [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.LG

    Hard Negative Mining for Domain-Specific Retrieval in Enterprise Systems

    Authors: Hansa Meghwani, Amit Agarwal, Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Srikant Panda

    Abstract: Enterprise search systems often struggle to retrieve accurate, domain-specific information due to semantic mismatches and overlapping terminologies. These issues can degrade the performance of downstream applications such as knowledge management, customer support, and retrieval-augmented generation agents. To address this challenge, we propose a scalable hard-negative mining framework tailored spe…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted to ACL 2025

    ACM Class: H.3.3; I.2.6; I.2.7

  10. arXiv:2505.18149  [pdf, ps, other]

    cs.CL

    First Finish Search: Efficient Test-Time Scaling in Large Language Models

    Authors: Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty

    Abstract: Test-time scaling (TTS), which involves dynamic allocation of compute during inference, offers a promising way to improve reasoning in large language models. While existing TTS methods work well, they often rely on long decoding paths or require a large number of samples to be generated, increasing the token usage and inference latency. We observe the surprising fact that for reasoning tasks, shor…

    Submitted 23 May, 2025; originally announced May 2025.

  11. arXiv:2505.17495  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs

    Authors: Landon Butler, Abhineet Agarwal, Justin Singh Kang, Yigit Efe Erginbas, Bin Yu, Kannan Ramchandran

    Abstract: Large Language Models (LLMs) have achieved remarkable performance by capturing complex interactions between input features. To identify these interactions, most existing approaches require enumerating all possible combinations of features up to a given order, causing them to scale poorly with the number of inputs $n$. Recently, Kang et al. (2025) proposed SPEX, an information-theoretic approach th…

    Submitted 23 May, 2025; originally announced May 2025.

  12. arXiv:2505.17332  [pdf, ps, other]

    cs.CL cs.AI cs.LG cs.MA

    SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

    Authors: Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae

    Abstract: Enterprise customers are increasingly adopting Large Language Models (LLMs) for critical communication tasks, such as drafting emails, crafting sales pitches, and composing casual messages. Deploying such models across different regions requires them to understand diverse cultural and linguistic contexts and generate safe and respectful responses. For enterprise applications, it is crucial to miti…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Published in the Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025), Industry Track, pages 558-582

    ACM Class: I.2.7; I.2.6

  13. arXiv:2505.17330  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding

    Authors: Amit Agarwal, Srikant Panda, Kulbhushan Pachauri

    Abstract: In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as handling OCR er…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Published in the Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Industry Track, pages 100-114

    ACM Class: I.2.7; I.5.4; I.7

  14. arXiv:2505.11976  [pdf]

    cs.CV

    Advanced Integration of Discrete Line Segments in Digitized P&ID for Continuous Instrument Connectivity

    Authors: Soumya Swarup Prusty, Astha Agarwal, Srinivasan Iyenger

    Abstract: Piping and Instrumentation Diagrams (P&IDs) constitute the foundational blueprint of a plant, depicting the interconnections among process equipment, instrumentation for process control, and the flow of fluids and control signals. In their existing setup, the manual mapping of information from P&ID sheets holds a significant challenge. This is a time-consuming process, taking around 3-6 months, an…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 6 pages, 13 figures

  15. arXiv:2505.08784  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

    Authors: Abhineet Agarwal, Michael Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

    Abstract: As machine learning (ML) models are increasingly deployed in high-stakes domains, trustworthy uncertainty quantification (UQ) is critical for ensuring the safety and reliability of these models. Traditional UQ methods rely on specifying a true generative model and are not robust to misspecification. On the other hand, conformal inference allows for arbitrary ML models but does not consider model s…

    Submitted 13 May, 2025; originally announced May 2025.

  16. arXiv:2505.03155  [pdf, ps, other]

    cs.LG

    Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation

    Authors: Max Qiushi Lin, Jincheng Mei, Matin Aghaei, Michael Lu, Bo Dai, Alekh Agarwal, Dale Schuurmans, Csaba Szepesvari, Sharan Vaswani

    Abstract: Policy gradient (PG) methods have played an essential role in the empirical successes of reinforcement learning. In order to handle large state-action spaces, PG methods are typically used with function approximation. In this setting, the approximation error in modeling problem-dependent quantities is a key notion for characterizing the global convergence of PG methods. We focus on Softmax PG with…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 75 pages

  17. arXiv:2505.01928  [pdf, other]

    cs.CV

    GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting

    Authors: Anushka Agarwal, Muhammad Yusuf Hassan, Talha Chafekar

    Abstract: We introduce GenSync, a novel framework for multi-identity lip-synced video synthesis using 3D Gaussian Splatting. Unlike most existing 3D methods that require training a new model for each identity, GenSync learns a unified network that synthesizes lip-synced videos for multiple speakers. By incorporating a Disentanglement Module, our approach separates identity-specific features from audio repr…

    Submitted 3 May, 2025; originally announced May 2025.

  18. arXiv:2504.20519  [pdf]

    cs.CY cs.HC

    Conversations with AI Chatbots Increase Short-Term Vaccine Intentions But Do Not Outperform Standard Public Health Messaging

    Authors: Neil K. R. Sehgal, Sunny Rai, Manuel Tonneau, Anish K. Agarwal, Joseph Cappella, Melanie Kornides, Lyle Ungar, Alison Buttenheim, Sharath Chandra Guntuku

    Abstract: Large language model (LLM) based chatbots show promise in persuasive communication, but existing studies often rely on weak controls or focus on belief change rather than behavioral intentions or outcomes. This pre-registered multi-country (US, Canada, UK) randomized controlled trial involving 930 vaccine-hesitant parents evaluated brief (three-minute) multi-turn conversations with LLM-based chatb…

    Submitted 26 June, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  19. arXiv:2504.19470  [pdf, ps, other]

    quant-ph cs.CC

    A Cautionary Note on Quantum Oracles

    Authors: Avantika Agarwal, Srijita Kundu

    Abstract: In recent years, the quantum oracle model introduced by Aaronson and Kuperberg (2007) has found a lot of use in showing oracle separations between complexity classes and cryptographic primitives. It is generally assumed that proof techniques that do not relativize with respect to quantum oracles will also not relativize with respect to classical oracles. In this note, we show that this is not the…

    Submitted 28 April, 2025; originally announced April 2025.

  20. arXiv:2504.18786  [pdf, ps, other]

    cs.NI

    Contracts: A unified lens on congestion control robustness, fairness, congestion, and generality

    Authors: Anup Agarwal, Venkat Arun, Srinivasan Seshan

    Abstract: Congestion control algorithms (CCAs) operate in partially observable environments, lacking direct visibility into link capacities, or competing flows. To ensure fair sharing of network resources, CCAs communicate their fair share through observable signals. For instance, Reno's fair share is encoded as $\propto 1/\sqrt{\texttt{loss rate}}$. We call such communication mechanisms \emph{contracts}. W…

    Submitted 6 June, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

  21. arXiv:2504.17140  [pdf, other]

    cs.LG cs.AI

    Scalable Permutation-Aware Modeling for Temporal Set Prediction

    Authors: Ashish Ranjan, Ayush Agarwal, Shalin Barot, Sushant Kumar

    Abstract: Temporal set prediction involves forecasting the elements that will appear in the next set, given a sequence of prior sets, each containing a variable number of elements. Existing methods often rely on intricate architectures with substantial computational overhead, which hampers their scalability. In this work, we introduce a novel and scalable framework that leverages permutation-equivariant and…

    Submitted 23 April, 2025; originally announced April 2025.

  22. arXiv:2504.16977  [pdf, other]

    cs.CL cs.AI

    Tokenization Matters: Improving Zero-Shot NER for Indic Languages

    Authors: Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Amit Agarwal

    Abstract: Tokenization is a critical component of Natural Language Processing (NLP), especially for low resource languages, where subword segmentation influences vocabulary structure and downstream task accuracy. Although Byte Pair Encoding (BPE) is a standard tokenization method in multilingual language models, its suitability for Named Entity Recognition (NER) in low resource Indic languages remains under…

    Submitted 23 April, 2025; originally announced April 2025.

  23. A Modularized Design Approach for GelSight Family of Vision-based Tactile Sensors

    Authors: Arpit Agarwal, Mohammad Amin Mirzaee, Xiping Sun, Wenzhen Yuan

    Abstract: GelSight family of vision-based tactile sensors has proven to be effective for multiple robot perception and manipulation tasks. These sensors are based on an internal optical system and an embedded camera to capture the deformation of the soft sensor surface, inferring the high-resolution geometry of the objects in contact. However, customizing the sensors for different robot hands requires a ted…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: The paper is accepted to International Journal of Robotics Research with DOI 10.1177/02783649251339680

  24. arXiv:2504.13776  [pdf, other]

    cs.CV eess.IV

    Fighting Fires from Space: Leveraging Vision Transformers for Enhanced Wildfire Detection and Characterization

    Authors: Aman Agarwal, James Gearon, Raksha Rank, Etienne Chenevert

    Abstract: Wildfires are increasing in intensity, frequency, and duration across large parts of the world as a result of anthropogenic climate change. Modern hazard detection and response systems that deal with wildfires are under-equipped for sustained wildfire seasons. Recent work has proved automated wildfire detection using Convolutional Neural Networks (CNNs) trained on satellite imagery are capable of…

    Submitted 18 April, 2025; originally announced April 2025.

  25. arXiv:2504.10404  [pdf, other]

    cs.HC

    Framing Perception: Exploring Camera Induced Objectification in Cinema

    Authors: Parth Maradia, Ayushi Agarwal, Srija Bhupathiraju, Kavita Vemuri

    Abstract: This study investigates how cinematographic techniques influence viewer perception and contribute to the objectification of women, utilizing eye-tracking data from 91 participants. They watched a sexualized music video (SV) known for objectifying portrayals and a non-sexualized music video (TV). Using dynamic Areas of Interests (AOIs) (head, torso, and lower body), gaze metrics such as fixation du…

    Submitted 14 April, 2025; originally announced April 2025.

  26. Enhancements for Developing a Comprehensive AI Fairness Assessment Standard

    Authors: Avinash Agarwal, Mayashankar Kumar, Manisha J. Nene

    Abstract: As AI systems increasingly influence critical sectors like telecommunications, finance, healthcare, and public services, ensuring fairness in decision-making is essential to prevent biased or unjust outcomes that disproportionately affect vulnerable entities or result in adverse impacts. This need is particularly pressing as the industry approaches the 6G era, where AI will drive complex functions…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 5 pages. Published in 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS). Access: https://ieeexplore.ieee.org/abstract/document/10885551

    Journal ref: 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS), Bengaluru, India, 2025, pp. 1216-1220

  27. arXiv:2504.06581  [pdf, other]

    cs.AI

    Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis

    Authors: Umakanta Maharana, Sarthak Verma, Avarna Agarwal, Prakashini Mruthyunjaya, Dwarikanath Mahapatra, Sakir Ahmed, Murari Mandal

    Abstract: Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonge…

    Submitted 9 April, 2025; originally announced April 2025.

  28. arXiv:2504.02130  [pdf, other]

    cs.LG

    Ordering-based Conditions for Global Convergence of Policy Gradient Methods

    Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvari, Dale Schuurmans

    Abstract: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation. First, we establish a few key observations that frame the study: \textbf{(i)} Global convergence can be achieved under linear function approximation without policy or r…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: arXiv version for the NeurIPS 2023 paper; to be updated for a technical issue

  29. arXiv:2504.01702  [pdf, ps, other]

    econ.EM cs.LG stat.ME

    A Causal Inference Framework for Data Rich Environments

    Authors: Alberto Abadie, Anish Agarwal, Devavrat Shah

    Abstract: We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., where there are a large number of units and a large number of measurements per unit. Our model provides a bridge between the structural causal model view of causal inference common in the graphical models literature with that of the latent factor model view common in the potential out…

    Submitted 2 April, 2025; originally announced April 2025.

  30. arXiv:2503.22634  [pdf, other]

    cs.RO cs.AI

    Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels

    Authors: Adam Wei, Abhinav Agarwal, Boyuan Chen, Rohan Bosworth, Nicholas Pfaff, Russ Tedrake

    Abstract: In imitation learning for robotics, cotraining with demonstration data generated both in simulation and on real hardware has emerged as a powerful recipe to overcome the sim2real gap. This work seeks to elucidate basic principles of this sim-and-real cotraining to help inform simulation design, sim-and-real dataset creation, and policy training. Focusing narrowly on the canonical task of planar pu…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 9 pages, 15 figures, In Submission to IROS 2025

  31. arXiv:2503.13521  [pdf, other]

    cs.DB cs.CY physics.soc-ph stat.AP

    States of Disarray: Cleaning Data for Gerrymandering Analysis

    Authors: Ananya Agarwal, Fnu Alusi, Arbie Hsu, Arif Syraj, Ellen Veomett

    Abstract: The mathematics of redistricting is an area of study that has exploded in recent years. In particular, many different research groups and expert witnesses in court cases have used outlier analysis to argue that a proposed map is a gerrymander. This outlier analysis relies on having an ensemble of potential redistricting maps against which the proposed map is compared. Arguably the most widely-acce…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 12 pages, 3 figures

    MSC Class: 51-11 (Primary) 68V35 (Secondary) ACM Class: E.m; J.4

  32. arXiv:2503.07920  [pdf, other]

    cs.CV cs.AI cs.CL

    Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

    Authors: Samuel Cahyawijaya, Holy Lovenia, Joel Ruben Antony Moniz, Tack Hwa Wong, Mohammad Rifqi Farhansyah, Thant Thiri Maung, Frederikus Hudi, David Anugraha, Muhammad Ravi Shulthan Habibi, Muhammad Reza Qorib, Amit Agarwal, Joseph Marvin Imperial, Hitesh Laxmichand Patel, Vicky Feliren, Bahrul Ilmi Nasution, Manuel Antonio Rufino, Genta Indra Winata, Rian Adam Rajagede, Carlos Rafael Catalan, Mohamed Fazli Imam, Priyaranjan Pattnayak, Salsabila Zahirah Pranida, Kevin Pratama, Yeshil Bangera, Adisai Na-Thalang, et al. (67 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA…

    Submitted 18 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: [SEA-VL Dataset] https://huggingface.co/collections/SEACrowd/sea-vl-multicultural-vl-dataset-for-southeast-asia-67cf223d0c341d4ba2b236e7 [Appendix J] https://github.com/SEACrowd/seacrowd.github.io/blob/master/docs/SEA_VL_Appendix_J.pdf

  33. arXiv:2503.06810  [pdf, other]

    cs.LG cs.AI

    Mitigating Preference Hacking in Policy Optimization with Pessimism

    Authors: Dhawal Gupta, Adam Fisch, Christoph Dann, Alekh Agarwal

    Abstract: This work tackles the problem of overoptimization in reinforcement learning from human feedback (RLHF), a prevalent technique for aligning models with human preferences. RLHF relies on reward or preference models trained on \emph{fixed preference datasets}, and these models are unreliable when evaluated outside the support of this preference data, leading to the common reward or preference hacking…

    Submitted 9 March, 2025; originally announced March 2025.

  34. arXiv:2503.06730  [pdf, other]

    cs.LG

    Adaptive Test-Time Intervention for Concept Bottleneck Models

    Authors: Matthew Shen, Aliyah Hsu, Abhineet Agarwal, Bin Yu

    Abstract: Concept bottleneck models (CBM) aim to improve model interpretability by predicting human level "concepts" in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used in predicting the target still either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-…

    Submitted 14 April, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  35. arXiv:2503.06469  [pdf, other]

    cs.CV

    Vector Quantized Feature Fields for Fast 3D Semantic Lifting

    Authors: George Tang, Aditya Agarwal, Weiqiao Han, Trevor Darrell, Yutong Bai

    Abstract: We generalize lifting to semantic lifting by incorporating per-view masks that indicate relevant pixels for lifting tasks. These masks are determined by querying corresponding multiscale pixel-aligned feature maps, which are derived from scene representations such as distilled feature fields and feature point clouds. However, storing per-view feature maps rendered from distilled feature fields is…

    Submitted 9 March, 2025; originally announced March 2025.

  36. arXiv:2503.03750  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

    Authors: Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, Brad Kenstler, Mick Yang, Isabelle Barrass, Alice Gatti, Xuwang Yin, Eduardo Trevino, Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks

    Abstract: As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. Howeve…

    Submitted 20 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: Website: https://www.mask-benchmark.ai

  37. arXiv:2502.15950  [pdf, other]

    cs.LG cs.CL

    Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models

    Authors: Lior Belenki, Alekh Agarwal, Tianze Shi, Kristina Toutanova

    Abstract: We propose a method to optimize language model pre-training data mixtures through efficient approximation of the cross-entropy loss corresponding to each candidate mixture via a Mixture of Data Experts (MDE). We use this approximation as a source of additional features in a regression model, trained from observations of model loss for a small number of mixtures. Experiments with Transformer deco…

    Submitted 21 February, 2025; originally announced February 2025.

  38. arXiv:2502.13870  [pdf, other]

    cs.LG cs.AI cs.CL cs.IT

    SPEX: Scaling Feature Interaction Explanations for LLMs

    Authors: Justin Singh Kang, Landon Butler, Abhineet Agarwal, Yigit Efe Erginbas, Ramtin Pedarsani, Kannan Ramchandran, Bin Yu

    Abstract: Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attr…

    Submitted 19 February, 2025; originally announced February 2025.

  39. arXiv:2502.13108  [pdf, other]

    cs.CL cs.AI cs.LG

    Clinical QA 2.0: Multi-Task Learning for Answer Extraction and Categorization

    Authors: Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Amit Agarwal, Bhargava Kumar, Srikant Panda, Tejaswini Kumar

    Abstract: Clinical Question Answering (CQA) plays a crucial role in medical decision-making, enabling physicians to extract relevant information from Electronic Medical Records (EMRs). While transformer-based models such as BERT, BioBERT, and ClinicalBERT have demonstrated state-of-the-art performance in CQA, existing models lack the ability to categorize extracted answers, which is critical for structured…

    Submitted 23 April, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  40. arXiv:2502.08177  [pdf, other]

    cs.AI

    SycEval: Evaluating LLM Sycophancy

    Authors: Aaron Fanous, Jacob Goldberg, Ank A. Agarwal, Joanna Lin, Anson Zhou, Roxana Daneshjou, Sanmi Koyejo

    Abstract: Large language models (LLMs) are increasingly applied in educational, clinical, and professional settings, but their tendency for sycophancy -- prioritizing user agreement over independent reasoning -- poses risks to reliability. This study introduces a framework to evaluate sycophantic behavior in ChatGPT-4o, Claude-Sonnet, and Gemini-1.5-Pro across AMPS (mathematics) and MedQuad (medical advice)…

    Submitted 5 March, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: 10 pages

  41. arXiv:2502.07141  [pdf, other]

    cs.LG

    Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

    Authors: Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvari, Dale Schuurmans

    Abstract: We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Updated version for a paper published at NeurIPS 2024

  42. arXiv:2502.06861  [pdf, other]

    cs.LG cs.AI

    Design Considerations in Offline Preference-based RL

    Authors: Alekh Agarwal, Christoph Dann, Teodor V. Marinov

    Abstract: Offline algorithms for Reinforcement Learning from Human Preferences (RLHF), which use only a fixed dataset of sampled responses given an input, and preference feedback among these responses, have gained increasing prominence in the literature on aligning language models. In this paper, we study how the different design choices made in methods such as DPO, IPO, SLiC and many variants influence the…

    Submitted 7 February, 2025; originally announced February 2025.

  43. arXiv:2502.02486  [pdf, ps, other]

    stat.ML cs.LG

    Catoni Contextual Bandits are Robust to Heavy-tailed Rewards

    Authors: Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang

    Abstract: Typical contextual bandit algorithms assume that the rewards at each round lie in some fixed range $[0, R]$, and their regret scales polynomially with this reward range $R$. However, many practical scenarios naturally involve heavy-tailed rewards or rewards where the worst-case range can be substantially larger than the variance. In this paper, we develop an algorithmic approach building on Catoni…

    Submitted 4 February, 2025; originally announced February 2025.

  44. arXiv:2501.17767  [pdf, other]

    cs.CL cs.AI

    Hybrid Graphs for Table-and-Text based Question Answering using LLMs

    Authors: Ankush Agarwal, Ganesh S, Chaitanya Devaguptapu

    Abstract: Answering questions that require reasoning and aggregation across both structured (tables) and unstructured (raw text) data sources presents significant challenges. Current methods rely on fine-tuning and high-quality, human-curated data, which is difficult to obtain. Recent advances in Large Language Models (LLMs) have shown promising results for multi-hop question answering (QA) over single-sour…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: Accepted at NAACL 2025 Main Track

  45. Standardised schema and taxonomy for AI incident databases in critical digital infrastructure

    Authors: Avinash Agarwal, Manisha J. Nene

    Abstract: The rapid deployment of Artificial Intelligence (AI) in critical digital infrastructure introduces significant risks, necessitating a robust framework for systematically collecting AI incident data to prevent future incidents. Existing databases lack the granularity as well as the standardized structure required for consistent data collection and analysis, impeding effective incident management. T…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: 6 pages, 3 tables. Accepted at the 2024 IEEE Pune Section International Conference (PuneCon)

    Journal ref: IEEE Pune Section International Conference (PuneCon), Pune, India, 2024, pp. 1-6

  46. Advancing Trustworthy AI for Sustainable Development: Recommendations for Standardising AI Incident Reporting

    Authors: Avinash Agarwal, Manisha J Nene

    Abstract: The increasing use of AI technologies has led to increasing AI incidents, posing risks and causing harm to individuals, organizations, and society. This study recognizes and addresses the lack of standardized protocols for reliably and comprehensively gathering such incident data crucial for preventing future incidents and developing mitigating strategies. Specifically, this study analyses existin…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 8 pages, 10 tables, and 1 figure. Accepted at the International Telecommunication Union (ITU) Kaleidoscope 2024

    Journal ref: 2024 ITU Kaleidoscope: Innovation and Digital Transformation for a Sustainable World (ITU K), New Delhi, India, 2024, pp. 1-8

  47. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes , et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  48. AI Technicians: Developing Rapid Occupational Training Methods for a Competitive AI Workforce

    Authors: Jaromir Savelka, Can Kultur, Arav Agarwal, Christopher Bogart, Heather Burte, Adam Zhang, Majd Sakr

    Abstract: The accelerating pace of developments in Artificial Intelligence (AI) and the increasing role that technology plays in society necessitate substantial changes in the structure of the workforce. Besides scientists and engineers, there is a need for a very large workforce of competent AI technicians (i.e., maintainers, integrators) and users (i.e., operators). As traditional 4-year and 2-year degre…

    Submitted 17 January, 2025; originally announced January 2025.

  49. arXiv:2412.19794  [pdf, ps, other]

    cs.CV

    MVTamperBench: Evaluating Robustness of Vision-Language Models

    Authors: Amit Agarwal, Srikant Panda, Angeline Charles, Bhargava Kumar, Hitesh Patel, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Hansa Meghwani, Karan Gupta, Dong-Kyu Chae

    Abstract: Multimodal Large Language Models (MLLMs) are a recent advancement of Vision-Language Models (VLMs) that have driven major advances in video understanding. However, their vulnerability to adversarial tampering and manipulations remains underexplored. To address this gap, we introduce \textbf{MVTamperBench}, a benchmark that systematically evaluates MLLM robustness against five prevalent tampering te…

    Submitted 11 June, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

    MSC Class: 68T37; 68T05; 68Q32; 68T45; 94A08; 68T40; 68Q85 ACM Class: I.2.10; I.2.7; I.5.4; I.4.9; I.4.8; H.5.1

  50. arXiv:2412.17759  [pdf, other]

    cs.AI cs.CV cs.LG

    Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy

    Authors: Priyaranjan Pattnayak, Hitesh Laxmichand Patel, Bhargava Kumar, Amit Agarwal, Ishan Banerjee, Srikant Panda, Tejaswini Kumar

    Abstract: Multimodal learning, a rapidly evolving field in artificial intelligence, seeks to construct more versatile and robust systems by integrating and analyzing diverse types of data, including text, images, audio, and video. Inspired by the human ability to assimilate information through many senses, this method enables applications such as text-to-video conversion, visual question answering, and imag…

    Submitted 23 December, 2024; originally announced December 2024.