
Showing 1–50 of 100 results for author: Bedi, S

  1. arXiv:2506.06574  [pdf, ps, other]

    cs.AI cs.MA

    The Optimization Paradox in Clinical AI Multi-Agent Systems

    Authors: Suhana Bedi, Iddah Mlauzi, Daniel Shin, Sanmi Koyejo, Nigam H. Shah

    Abstract: Multi-agent artificial intelligence systems are increasingly deployed in clinical settings, yet the relationship between component-level optimization and system-wide performance remains poorly understood. We evaluated this relationship using 2,400 real patient cases from the MIMIC-CDM dataset across four abdominal pathologies (appendicitis, pancreatitis, cholecystitis, diverticulitis), decomposing…

    Submitted 11 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

  2. arXiv:2506.04210  [pdf, ps, other]

    cs.AI cs.CL

    Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi

    Abstract: Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek R1) have led to a popular belief that extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? To answer this question, we perform a detailed empirical study across models and bench…

    Submitted 4 June, 2025; originally announced June 2025.

  3. arXiv:2505.23802  [pdf, ps, other]

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  4. arXiv:2505.23729  [pdf, ps, other]

    cs.CL cs.AI

    Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

    Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi

    Abstract: Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision making follows satisficing strategies-optimizing primary objectives w…

    Submitted 31 May, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  5. arXiv:2505.18344  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access

    Authors: Mudit Gaur, Prashant Trivedi, Sasidhar Kunapuli, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite their empirical success, prior theoretical analyses of the sample complexity suffer from poor scaling with input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimatio…

    Submitted 8 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  6. arXiv:2505.11462  [pdf, other]

    cs.CL cs.AI

    Disentangling Reasoning and Knowledge in Medical Large Language Models

    Authors: Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Medical reasoning in large language models (LLMs) aims to emulate clinicians' diagnostic thinking, but current benchmarks such as MedQA-USMLE, MedMCQA, and PubMedQA often mix reasoning with factual recall. We address this by separating 11 biomedical QA benchmarks into reasoning- and knowledge-focused subsets using a PubMedBERT classifier that reaches 81 percent accuracy, comparable to human perfor…

    Submitted 16 May, 2025; originally announced May 2025.

  7. arXiv:2505.10573  [pdf, ps, other]

    cs.CY cs.LG

    Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

    Authors: Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo

    Abstract: While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow benchmarks, like performance on graduate-level exam questions, which provide a limited and potentially misleading assessment. We provide a structured approach fo…

    Submitted 7 June, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

    Comments: Correspondence to [email protected]

  8. arXiv:2504.01931  [pdf, other]

    cs.CL

    Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection

    Authors: Souradip Chakraborty, Mohammadreza Pourreza, Ruoxi Sun, Yiwen Song, Nino Scherrer, Furong Huang, Amrit Singh Bedi, Ahmad Beirami, Jindong Gu, Hamid Palangi, Tomas Pfister

    Abstract: While AI agents have shown remarkable performance at various tasks, they still struggle with complex multi-modal applications, structured generation and strategic planning. Improvement via standard fine-tuning is often impractical, as solving agentic tasks usually relies on black box API access without control over model parameters. Inference-time methods such as Best-of-N (BON) sampling offer a…

    Submitted 5 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  9. arXiv:2503.18816  [pdf, other]

    cs.RO cs.AI

    Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm

    Authors: Chak Lam Shek, Amrit Singh Bedi, Anjon Basak, Ellen Novoseller, Nick Waytowich, Priya Narayanan, Dinesh Manocha, Pratap Tokekar

    Abstract: In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality based Factorized Multi-Agent Actor-Critic (Loc-FACMAC). Existing state-of-the-art algorithms, such as FACMAC, rely on global reward information, which may not accurately reflect the quality of individual robots' actions in decentralized systems. We int…

    Submitted 28 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  10. arXiv:2503.17644  [pdf, ps, other]

    cs.LG cs.AI

    On The Sample Complexity Bounds In Bilevel Reinforcement Learning

    Authors: Mudit Gaur, Utsav Singh, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal

    Abstract: Bilevel reinforcement learning (BRL) has emerged as a powerful framework for aligning generative models, yet its theoretical foundations, especially sample complexity bounds, remain underexplored. In this work, we present the first sample complexity bound for BRL, establishing a rate of $\mathcal{O}(\epsilon^{-3})$ in continuous state-action spaces. Traditional MDP analysis techniques do not extend to BR…

    Submitted 5 June, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: This is an updated version of the paper 2410.15610

  11. arXiv:2503.12575  [pdf, other]

    cs.CV cs.AI

    BalancedDPO: Adaptive Multi-Metric Alignment

    Authors: Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO)…

    Submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2501.06007  [pdf]

    cs.LG

    CoNOAir: A Neural Operator for Forecasting Carbon Monoxide Evolution in Cities

    Authors: Sanchit Bedi, Karn Tiwari, Prathosh A. P., Sri Harsha Kota, N. M. Anoop Krishnan

    Abstract: Carbon Monoxide (CO) is a dominant pollutant in urban areas due to the energy generation from fossil fuels for industry, automobile, and domestic requirements. Forecasting the evolution of CO in real-time can enable the deployment of effective early warning systems and intervention strategies. However, the computational cost associated with the physics and chemistry-based simulation makes it prohi…

    Submitted 13 January, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

    Comments: 28 pages, 14 figures, under submission process

  13. arXiv:2501.03486  [pdf, other]

    cs.LG cs.AI

    Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment

    Authors: Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia

    Abstract: The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human feedback (RLHF), achieve alignment by fine-tuning model parameters, but these approaches are often computationally expensive and impractical when models are froz…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 27 pages, Accepted in AAAI 2025

  14. arXiv:2501.00031  [pdf, other]

    cs.CL

    Distilling Large Language Models for Efficient Clinical Information Extraction

    Authors: Karthik S. Vedula, Annika Gupta, Akshay Swaminathan, Ivan Lopez, Suhana Bedi, Nigam H. Shah

    Abstract: Large language models (LLMs) excel at clinical information extraction but their computational demands limit practical deployment. Knowledge distillation--the process of transferring knowledge from larger to smaller models--offers a potential solution. We evaluate the performance of distilled BERT models, which are approximately 1,000 times smaller than modern LLMs, for clinical named entity recogn…

    Submitted 20 December, 2024; originally announced January 2025.

    Comments: 19 pages, 1 figure, 10 tables

    MSC Class: 68T50 ACM Class: I.2.7

  15. arXiv:2412.16178  [pdf, other]

    cs.LG cs.AI cs.CE

    Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs

    Authors: Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Re, Sanmi Koyejo, Nigam H. Shah

    Abstract: Foundation Models (FMs) trained on Electronic Health Records (EHRs) have achieved state-of-the-art results on numerous clinical prediction tasks. However, most existing EHR FMs have context windows of <1k tokens. This prevents them from modeling full patient EHRs which can exceed 10k's of events. Recent advancements in subquadratic long-context architectures (e.g., Mamba) offer a promising solutio…

    Submitted 18 March, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

  16. arXiv:2412.12896  [pdf, other]

    astro-ph.IM

    3D Free-Form Optical Lens -- Miniaturised Fibre Couplers for Astrophotonics

    Authors: Haoran Mu, Daniel Smith, Tomas Katkus, Nguyen Hoai An Le, Dominyka Stonyte, Darius Gailevicius, Dan Kapsaskis, Alexander Del Frate, Talwinder Singh Bedi, Donatas Narbutis, Vijayakumar Anand, Darija Astrauskyte, Lina Grineviciute, Soon Hock Ng, Karl Glazebrook, Jon Lawrence, Saulius Juodkazis

    Abstract: In astronomy, multi-object spectrographs employ fibre positioning robots to couple the light from multiple astronomy sources (stars or galaxies) into multiple multi-mode fibres, which are distributed across the focal plane of the telescope. These fibres transport the celestial light to the entrance slit of a spectrograph (or bank of spectrographs) for analysis. For any multi-object system mm-scale…

    Submitted 17 December, 2024; originally announced December 2024.

    Comments: 18 pages 11 figures (main text)

  17. arXiv:2412.05232  [pdf, other]

    cs.CL

    LIAR: Leveraging Inference Time Alignment (Best-of-N) to Jailbreak LLMs in Seconds

    Authors: James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, Mubarak Shah

    Abstract: Traditional jailbreaks have successfully exposed vulnerabilities in LLMs, primarily relying on discrete combinatorial optimization, while more recent methods focus on training LLMs to generate adversarial prompts. However, both approaches are computationally expensive and slow, often requiring significant resources to generate a single successful attack. We hypothesize that the inefficiency of the…

    Submitted 10 February, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  18. arXiv:2411.18688  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Alvaro Velasquez, Ahmad Beirami, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: With the widespread deployment of Multimodal Large Language Models (MLLMs) for visual-reasoning tasks, improving their safety has become crucial. Recent research indicates that despite training-time safety alignment, these models remain vulnerable to jailbreak attacks. In this work, we first highlight an important safety gap to describe that alignment achieved solely through safety training may be…

    Submitted 31 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to CVPR 2025

  19. arXiv:2411.00361  [pdf, other]

    cs.LG

    Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Anit Kumar Sahu, Mubarak Shah, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL) that addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks. HPO leverages maximum entropy reinforcement learning combined with token-level Direct Preference Optimization (DPO), eliminating the need for pre-trained re…

    Submitted 1 November, 2024; originally announced November 2024.

  20. arXiv:2410.20263  [pdf, other]

    cs.RO cs.AI cs.CV

    EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering

    Authors: Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera

    Abstract: Embodied Question Answering (EQA) is an essential yet challenging task for robotic home assistants. Recent studies have shown that large vision-language models (VLMs) can be effectively utilized for EQA, but existing works either focus on video-based question answering without embodied exploration or rely on closed-form choice sets. In real-world scenarios, a robotic agent must efficiently explore…

    Submitted 26 October, 2024; originally announced October 2024.

  21. arXiv:2410.18194  [pdf, other]

    cs.LG cs.AI cs.CL

    ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment

    Authors: Elyas Obbad, Iddah Mlauzi, Brando Miranda, Rylan Schaeffer, Kamal Obbad, Suhana Bedi, Sanmi Koyejo

    Abstract: Data selection is crucial for optimizing language model (LM) performance on specific tasks, yet most existing methods fail to effectively consider the target task distribution. Current approaches either ignore task-specific requirements entirely or rely on approximations that fail to capture the nuanced patterns needed for tasks like Autoformalization or code generation. Methods that do consid…

    Submitted 12 April, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

  22. arXiv:2410.15610   

    cs.LG

    On The Global Convergence Of Online RLHF With Neural Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal

    Abstract: The importance of Reinforcement Learning from Human Feedback (RLHF) in aligning large language models (LLMs) with human values cannot be overstated. RLHF is a three-stage process that includes supervised fine-tuning (SFT), reward learning, and policy learning. Although there are several offline and online approaches to aligning LLMs, they often suffer from distribution shift issues. These issues a…

    Submitted 23 May, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: The updated version of this paper is arXiv:2503.17644

  23. arXiv:2410.04108  [pdf, other]

    cs.LG cs.AI

    Towards Scalable General Utility Reinforcement Learning: Occupancy Approximation, Sample Complexity and Global Optimality

    Authors: Anas Barakat, Souradip Chakraborty, Peihong Yu, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Reinforcement learning with general utilities has recently gained attention thanks to its ability to unify several problems, including imitation learning, pure exploration, and safe reinforcement learning. However, prior work for solving this general problem in a unified way has only focused on the tabular setting. This is restrictive when considering larger state-action spaces because of the need…

    Submitted 26 February, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

    Comments: revised version

  24. arXiv:2410.03131  [pdf, other]

    cs.AI cs.CL cs.LG

    AIME: AI System Optimization via Multiple LLM Evaluators

    Authors: Bhrij Patel, Souradip Chakraborty, Wesley A. Suttle, Mengdi Wang, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Text-based AI system optimization typically involves a feedback loop scheme where a single LLM generates an evaluation in natural language of the current output to improve the next iteration's output. However, in this work, we empirically demonstrate that for a practical and complex task (code generation) with multiple criteria to evaluate, utilizing only one LLM evaluator tends to let errors in g…

    Submitted 28 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: 21 pages, 10 Figures, 4 Tables

  25. arXiv:2410.01871  [pdf, other]

    cs.GT cs.AI cs.CY econ.GN

    Auction-Based Regulation for Artificial Intelligence

    Authors: Marco Bornstein, Zora Che, Suhas Julapalli, Abdirisak Mohamed, Amrit Singh Bedi, Furong Huang

    Abstract: In an era of "moving fast and breaking things", regulators have moved slowly to pick up the safety, bias, and legal debris left in the wake of broken Artificial Intelligence (AI) deployment. While there is much-warranted discussion about how to address the safety, bias, and legal woes of state-of-the-art AI models, rigorous and realistic mathematical frameworks to regulate AI are lacking. Our pape…

    Submitted 3 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 22 pages, 8 figures, 2 tables

  26. arXiv:2409.09095  [pdf, other]

    cs.LG cs.DB

    meds_reader: A fast and efficient EHR processing library

    Authors: Ethan Steinberg, Michael Wornow, Suhana Bedi, Jason Alan Fries, Matthew B. A. McDermott, Nigam H. Shah

    Abstract: The growing demand for machine learning in healthcare requires processing increasingly large electronic health record (EHR) datasets, but existing pipelines are not computationally efficient or scalable. In this paper, we introduce meds_reader, an optimized Python package for efficient EHR data processing that is designed to take advantage of many intrinsic properties of EHR data for improved spee…

    Submitted 14 November, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 8 pages

  27. arXiv:2408.08812  [pdf, other]

    cs.LG

    CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk

    Authors: Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu

    Abstract: Transfer learning in reinforcement learning (RL) has become a pivotal strategy for improving data efficiency in new, unseen tasks by utilizing knowledge from previously learned tasks. This approach is especially beneficial in real-world deployment scenarios where computational resources are constrained and agents must adapt rapidly to novel environments. However, current state-of-the-art methods o…

    Submitted 16 August, 2024; originally announced August 2024.

  28. arXiv:2408.01867  [pdf, other]

    cs.RO

    TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation

    Authors: Xingpeng Sun, Yiran Zhang, Xindi Tang, Amrit Singh Bedi, Aniket Bera

    Abstract: While LLMs are proficient at processing text in human conversations, they often encounter difficulties with the nuances of verbal instructions and, thus, remain prone to hallucinate trust in human command. In this work, we present TrustNavGPT, an LLM-based, audio-guided navigation agent that uses affective cues in spoken communication elements such as tone and inflection that convey meaning beyond…

    Submitted 3 August, 2024; originally announced August 2024.

    Journal ref: IROS 2024

  29. arXiv:2406.10918  [pdf, other]

    cs.LG cs.AI cs.CL

    Multi-LLM QA with Embodied Exploration

    Authors: Bhrij Patel, Vishnu Sashank Dorbala, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Large language models (LLMs) have grown in popularity due to their natural language interface and pre-trained knowledge, leading to rapidly increasing success in question-answering (QA) tasks. More recently, multi-agent systems with LLM-based agents (Multi-LLM) have been increasingly utilized for QA. In these scenarios, the models may each answer the question and reach a consensus or each mod…

    Submitted 18 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 Figures, 5 Tables

  30. arXiv:2406.10892   

    cs.LG

    DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

    Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…

    Submitted 30 December, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: This paper is subsumed by a later paper of ours: arXiv:2411.00361

  31. arXiv:2405.20495  [pdf, other]

    cs.CL cs.LG

    Transfer Q Star: Principled Decoding for LLM Alignment

    Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

    Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable frame…

    Submitted 30 May, 2024; originally announced May 2024.

  32. arXiv:2405.13879  [pdf, other]

    cs.GT cs.DC cs.LG econ.TH

    FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

    Authors: Marco Bornstein, Amrit Singh Bedi, Abdirisak Mohamed, Furong Huang

    Abstract: Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat its way ou…

    Submitted 24 February, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024, 20 pages, 7 figures

  33. arXiv:2405.01843  [pdf, ps, other]

    cs.LG cs.AI

    Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: \textbf{M}ulti-layer neural network parametrization for actor/critic, \text…

    Submitted 9 December, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024. This is a revised version of arXiv:2306.10486, where we have gone from finite action space to continuous action space, from average iterate convergence to last iterate convergence and from $\epsilon^{-4}$ to $\epsilon^{-3}$ sample complexity. This version fixes the related work result of (Xu et al., 2020a), based on their result update on arXiv

  34. arXiv:2404.13423  [pdf, other]

    cs.LG

    PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit…

    Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  35. arXiv:2403.11925  [pdf, other]

    cs.LG

    Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

    Authors: Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating…

    Submitted 20 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 26 Pages, 2 Figures

  36. arXiv:2403.09905  [pdf, other]

    cs.RO cs.CV

    Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation

    Authors: Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Embodied Navigation tasks often involve constructing topological graphs of a scene during exploration to facilitate high-level planning and decision-making for execution in continuous environments. Prior literature makes the assumption of static graphs with stationary targets, which does not hold in many real-world environments with moving objects. To address this, we present a novel formulation g…

    Submitted 10 March, 2025; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 18

  37. arXiv:2402.10340  [pdf, other]

    cs.RO cs.AI

    On the Vulnerability of LLM/VLM-Controlled Robotics

    Authors: Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

    Abstract: In this work, we highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. While LLM/VLM-controlled robots show impressive performance across various tasks, their reliability under slight input variations remains underexplored yet critical. These models are highly sensitive to instruction or perceptu…

    Submitted 6 March, 2025; v1 submitted 15 February, 2024; originally announced February 2024.

  38. arXiv:2402.08925  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    MaxMin-RLHF: Alignment with Diverse Human Preferences

    Authors: Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting it…

    Submitted 25 December, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  39. arXiv:2402.03494  [pdf, other]

    cs.AI cs.RO

    Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

    Authors: Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera

    Abstract: While LLMs excel in processing text in these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the as…

    Submitted 10 November, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 30 pages, 7 figures

    Journal ref: Transactions on Machine Learning Research 2024

  40. arXiv:2312.14436  [pdf, other]

    cs.RO cs.LG

    REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

    Abstract: The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is mainly dependent on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward function and underlying human preferences (values, social norms) can lead to catastrophic outcomes in the real world especially in the context of robotics for crit…

    Submitted 19 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

  41. arXiv:2310.15264  [pdf, other]

    cs.CL cs.AI

    Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contami…

    Submitted 23 October, 2023; originally announced October 2023.

  42. arXiv:2310.13681  [pdf, other]

    cs.GT cs.CY cs.DC cs.LG econ.TH

    Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

    Authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang

    Abstract: Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we…

    Submitted 22 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 24 pages, 11 figures

  43. arXiv:2310.10258  [pdf, other]

    math.CV

    Minimal surfaces over harmonic shears

    Authors: Simran Bedi, Sanjay Kumar

    Abstract: Harmonic mappings have long intrigued researchers due to their intrinsic connection with minimal surfaces. In this paper, we investigate shearing of two distinct classes of univalent conformal mappings which are convex in horizontal direction with appropriate dilatations. Subsequently, we present a family of minimal surfaces constructed by lifting the harmonic mappings obtained through shear const…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 6 figures

  44. arXiv:2310.00481  [pdf, other]

    cs.RO

    LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

    Authors: Chak Lam Shek, Xiyang Wu, Wesley A. Suttle, Carl Busart, Erin Zaroukian, Dinesh Manocha, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Navigating robots through unstructured terrains is challenging, primarily due to the dynamic environmental changes. While humans adeptly navigate such terrains by using context from their observations, creating a similar context-aware navigation system for robots is difficult. The essence of the issue lies in the acquisition and interpretation of context information, a task complicated by the inhe…

    Submitted 7 October, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  45. arXiv:2308.02585  [pdf, other]

    cs.LG

    PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

    Abstract: We present a novel unified bilevel optimization-based framework, \textsf{PARL}, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment due to a lack of precise characterization of the dependence of the alignment obj…

    Submitted 30 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  46. arXiv:2306.10486  [pdf, ps, other]

    cs.LG

    On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.07675

    ACM Class: F.2.1

  47. arXiv:2306.06236  [pdf, other]

    cs.MA cs.LG cs.RO

    iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

    Authors: Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  48. arXiv:2306.06192  [pdf, other]

    cs.RO cs.AI cs.LG

    Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation

    Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error. However, real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies due to sample inefficiency of RL. In this work, we introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficienc…

    Submitted 13 March, 2025; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 10 pages, 6 figures, 2 tables

  49. arXiv:2304.14710  [pdf]

    cs.CV cs.AI

    Image-based Indian Sign Language Recognition: A Practical Review using Deep Neural Networks

    Authors: Mallikharjuna Rao K, Harleen Kaur, Sanjam Kaur Bedi, M A Lekhana

    Abstract: People with vocal and hearing disabilities use sign language to express themselves using visual gestures and signs. Although sign language is a solution for communication difficulties faced by deaf people, there are still problems as most of the general population cannot understand this language, creating a communication barrier, especially in places such as banks, airports, supermarkets, etc. [1]…

    Submitted 28 April, 2023; originally announced April 2023.

    Comments: 14 pages, 20 figures, 1 table

  50. arXiv:2304.04736  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Possibilities of AI-Generated Text Detection

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

    Abstract: Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire suppo…

    Submitted 2 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.