Showing 1–50 of 81 results for author: Bedi, A S

  1. arXiv:2505.23729

    cs.CL cs.AI

    Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

    Authors: Mohamad Chehade, Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Dinesh Manocha, Hao Zhu, Amrit Singh Bedi

    Abstract: Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision making follows satisficing strategies, optimizing primary objectives w…

    Submitted 31 May, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML 2025

  2. arXiv:2505.18344

    cs.LG cs.AI stat.ML

    Sample Complexity of Diffusion Model Training Without Empirical Risk Minimizer Access

    Authors: Mudit Gaur, Prashant Trivedi, Sasidhar Kunapuli, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite their empirical success, prior theoretical analyses of the sample complexity suffer from poor scaling with input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimatio…

    Submitted 23 May, 2025; originally announced May 2025.

  3. arXiv:2504.01931

    cs.CL

    Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection

    Authors: Souradip Chakraborty, Mohammadreza Pourreza, Ruoxi Sun, Yiwen Song, Nino Scherrer, Furong Huang, Amrit Singh Bedi, Ahmad Beirami, Jindong Gu, Hamid Palangi, Tomas Pfister

    Abstract: While AI agents have shown remarkable performance at various tasks, they still struggle with complex multi-modal applications, structured generation and strategic planning. Improvements via standard fine-tuning are often impractical, as solving agentic tasks usually relies on black-box API access without control over model parameters. Inference-time methods such as Best-of-N (BON) sampling offer a…

    Submitted 5 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  4. arXiv:2503.18816

    cs.RO cs.AI

    Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm

    Authors: Chak Lam Shek, Amrit Singh Bedi, Anjon Basak, Ellen Novoseller, Nick Waytowich, Priya Narayanan, Dinesh Manocha, Pratap Tokekar

    Abstract: In this work, we present a novel cooperative multi-agent reinforcement learning method called Locality-based Factorized Multi-Agent Actor-Critic (Loc-FACMAC). Existing state-of-the-art algorithms, such as FACMAC, rely on global reward information, which may not accurately reflect the quality of individual robots' actions in decentralized systems. We int…

    Submitted 28 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  5. arXiv:2503.17644

    cs.LG cs.AI

    On The Sample Complexity Bounds In Bilevel Reinforcement Learning

    Authors: Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal

    Abstract: Bilevel reinforcement learning (BRL) has emerged as a powerful framework for aligning generative models, yet its theoretical foundations, especially sample complexity bounds, remain underexplored. In this work, we present the first sample complexity bound for BRL, establishing a rate of $\mathcal{O}(\epsilon^{-3})$ in continuous state-action spaces. Traditional MDP analysis techniques do not extend to BR…

    Submitted 23 May, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: This is an updated version of arXiv:2410.15610

  6. arXiv:2503.12575

    cs.CV cs.AI

    BalancedDPO: Adaptive Multi-Metric Alignment

    Authors: Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO)…

    Submitted 16 March, 2025; originally announced March 2025.

  7. arXiv:2501.03486

    cs.LG cs.AI

    Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment

    Authors: Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia

    Abstract: The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human feedback (RLHF), achieve alignment by fine-tuning model parameters, but these approaches are often computationally expensive and impractical when models are froz…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 27 pages. Accepted at AAAI 2025

  8. arXiv:2412.05232

    cs.CL

    LIAR: Leveraging Inference Time Alignment (Best-of-N) to Jailbreak LLMs in Seconds

    Authors: James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, Mubarak Shah

    Abstract: Traditional jailbreaks have successfully exposed vulnerabilities in LLMs, primarily relying on discrete combinatorial optimization, while more recent methods focus on training LLMs to generate adversarial prompts. However, both approaches are computationally expensive and slow, often requiring significant resources to generate a single successful attack. We hypothesize that the inefficiency of the…

    Submitted 10 February, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  9. arXiv:2411.18688

    cs.CR cs.AI cs.LG

    Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Alvaro Velasquez, Ahmad Beirami, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: With the widespread deployment of Multimodal Large Language Models (MLLMs) for visual-reasoning tasks, improving their safety has become crucial. Recent research indicates that despite training-time safety alignment, these models remain vulnerable to jailbreak attacks. In this work, we first highlight an important safety gap, showing that alignment achieved solely through safety training may be…

    Submitted 31 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to CVPR 2025

  10. arXiv:2411.00361

    cs.LG

    Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Anit Kumar Sahu, Mubarak Shah, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL) that addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks. HPO leverages maximum entropy reinforcement learning combined with token-level Direct Preference Optimization (DPO), eliminating the need for pre-trained re…

    Submitted 1 November, 2024; originally announced November 2024.

  11. arXiv:2410.20263

    cs.RO cs.AI cs.CV

    EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering

    Authors: Kai Cheng, Zhengyuan Li, Xingpeng Sun, Byung-Cheol Min, Amrit Singh Bedi, Aniket Bera

    Abstract: Embodied Question Answering (EQA) is an essential yet challenging task for robotic home assistants. Recent studies have shown that large vision-language models (VLMs) can be effectively utilized for EQA, but existing works either focus on video-based question answering without embodied exploration or rely on closed-form choice sets. In real-world scenarios, a robotic agent must efficiently explore…

    Submitted 26 October, 2024; originally announced October 2024.

  12. arXiv:2410.15610   

    cs.LG

    On The Global Convergence Of Online RLHF With Neural Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal

    Abstract: The importance of Reinforcement Learning from Human Feedback (RLHF) in aligning large language models (LLMs) with human values cannot be overstated. RLHF is a three-stage process that includes supervised fine-tuning (SFT), reward learning, and policy learning. Although there are several offline and online approaches to aligning LLMs, they often suffer from distribution shift issues. These issues a…

    Submitted 23 May, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: The updated version of this paper is arXiv:2503.17644

  13. arXiv:2410.04108

    cs.LG cs.AI

    Towards Scalable General Utility Reinforcement Learning: Occupancy Approximation, Sample Complexity and Global Optimality

    Authors: Anas Barakat, Souradip Chakraborty, Peihong Yu, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Reinforcement learning with general utilities has recently gained attention thanks to its ability to unify several problems, including imitation learning, pure exploration, and safe reinforcement learning. However, prior work for solving this general problem in a unified way has only focused on the tabular setting. This is restrictive when considering larger state-action spaces because of the need…

    Submitted 26 February, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

    Comments: revised version

  14. arXiv:2410.03131

    cs.AI cs.CL cs.LG

    AIME: AI System Optimization via Multiple LLM Evaluators

    Authors: Bhrij Patel, Souradip Chakraborty, Wesley A. Suttle, Mengdi Wang, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Text-based AI system optimization typically involves a feedback loop scheme where a single LLM generates an evaluation in natural language of the current output to improve the next iteration's output. However, in this work, we empirically demonstrate that for a practical and complex task (code generation) with multiple criteria to evaluate, utilizing only one LLM evaluator tends to let errors in g…

    Submitted 28 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: 21 pages, 10 Figures, 4 Tables

  15. arXiv:2410.01871

    cs.GT cs.AI cs.CY econ.GN

    Auction-Based Regulation for Artificial Intelligence

    Authors: Marco Bornstein, Zora Che, Suhas Julapalli, Abdirisak Mohamed, Amrit Singh Bedi, Furong Huang

    Abstract: In an era of "moving fast and breaking things", regulators have moved slowly to pick up the safety, bias, and legal debris left in the wake of broken Artificial Intelligence (AI) deployment. While there is much-warranted discussion about how to address the safety, bias, and legal woes of state-of-the-art AI models, rigorous and realistic mathematical frameworks to regulate AI are lacking. Our pape…

    Submitted 3 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 22 pages, 8 figures, 2 tables

  16. arXiv:2408.08812

    cs.LG

    CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk

    Authors: Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu

    Abstract: Transfer learning in reinforcement learning (RL) has become a pivotal strategy for improving data efficiency in new, unseen tasks by utilizing knowledge from previously learned tasks. This approach is especially beneficial in real-world deployment scenarios where computational resources are constrained and agents must adapt rapidly to novel environments. However, current state-of-the-art methods o…

    Submitted 16 August, 2024; originally announced August 2024.

  17. arXiv:2408.01867

    cs.RO

    TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation

    Authors: Xingpeng Sun, Yiran Zhang, Xindi Tang, Amrit Singh Bedi, Aniket Bera

    Abstract: While LLMs are proficient at processing text in human conversations, they often encounter difficulties with the nuances of verbal instructions and, thus, remain prone to hallucinating trust in human commands. In this work, we present TrustNavGPT, an LLM-based, audio-guided navigation agent that uses affective cues in spoken communication elements such as tone and inflection that convey meaning beyond…

    Submitted 3 August, 2024; originally announced August 2024.

    Journal ref: IROS 2024

  18. arXiv:2406.10918

    cs.LG cs.AI cs.CL

    Multi-LLM QA with Embodied Exploration

    Authors: Bhrij Patel, Vishnu Sashank Dorbala, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Large language models (LLMs) have grown in popularity due to their natural language interface and pre-trained knowledge, leading to rapidly increasing success in question-answering (QA) tasks. More recently, multi-agent systems with LLM-based agents (Multi-LLM) have been increasingly used for QA. In these scenarios, the models may each answer the question and reach a consensus or each mod…

    Submitted 18 October, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 Figures, 5 Tables

  19. arXiv:2406.10892   

    cs.LG

    DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

    Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…

    Submitted 30 December, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: This paper is subsumed by a later paper of ours: arXiv:2411.00361

  20. arXiv:2405.20495

    cs.CL cs.LG

    Transfer Q Star: Principled Decoding for LLM Alignment

    Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

    Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable frame…

    Submitted 30 May, 2024; originally announced May 2024.

  21. arXiv:2405.13879

    cs.GT cs.DC cs.LG econ.TH

    FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

    Authors: Marco Bornstein, Amrit Singh Bedi, Abdirisak Mohamed, Furong Huang

    Abstract: Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat their way ou…

    Submitted 24 February, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024, 20 pages, 7 figures

  22. arXiv:2405.01843

    cs.LG cs.AI

    Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, …

    Submitted 9 December, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024. This is a revised version of arXiv:2306.10486 that moves from a finite to a continuous action space, from average-iterate to last-iterate convergence, and from $\epsilon^{-4}$ to $\epsilon^{-3}$ sample complexity. This version also corrects the related-work statement of the result of (Xu et al., 2020a), following the update of their result on arXiv

  23. arXiv:2404.13423

    cs.LG

    PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit…

    Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  24. arXiv:2403.11925

    cs.LG

    Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

    Authors: Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating…

    Submitted 20 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 26 Pages, 2 Figures

  25. arXiv:2403.09905

    cs.RO cs.CV

    Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation

    Authors: Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Embodied Navigation tasks often involve constructing topological graphs of a scene during exploration to facilitate high-level planning and decision-making for execution in continuous environments. Prior literature makes the assumption of static graphs with stationary targets, which does not hold in many real-world environments with moving objects. To address this, we present a novel formulation g…

    Submitted 10 March, 2025; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 18

  26. arXiv:2402.10340

    cs.RO cs.AI

    On the Vulnerability of LLM/VLM-Controlled Robotics

    Authors: Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

    Abstract: In this work, we highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities. While LLM/VLM-controlled robots show impressive performance across various tasks, their reliability under slight input variations remains underexplored yet critical. These models are highly sensitive to instruction or perceptu…

    Submitted 6 March, 2025; v1 submitted 15 February, 2024; originally announced February 2024.

  27. arXiv:2402.08925

    cs.CL cs.AI cs.LG cs.RO

    MaxMin-RLHF: Alignment with Diverse Human Preferences

    Authors: Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting it…

    Submitted 25 December, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  28. arXiv:2402.03494

    cs.AI cs.RO

    Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

    Authors: Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera

    Abstract: While LLMs excel in processing text in human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the as…

    Submitted 10 November, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 30 pages, 7 figures

    Journal ref: Transactions on Machine Learning Research 2024

  29. arXiv:2312.14436

    cs.RO cs.LG

    REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

    Abstract: The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is mainly dependent on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward function and underlying human preferences (values, social norms) can lead to catastrophic outcomes in the real world, especially in the context of robotics for crit…

    Submitted 19 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

  30. arXiv:2310.15264

    cs.CL cs.AI

    Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contami…

    Submitted 23 October, 2023; originally announced October 2023.

  31. arXiv:2310.13681

    cs.GT cs.CY cs.DC cs.LG econ.TH

    Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

    Authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang

    Abstract: Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we…

    Submitted 22 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 24 pages, 11 figures

  32. arXiv:2310.00481

    cs.RO

    LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

    Authors: Chak Lam Shek, Xiyang Wu, Wesley A. Suttle, Carl Busart, Erin Zaroukian, Dinesh Manocha, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Navigating robots through unstructured terrains is challenging, primarily due to the dynamic environmental changes. While humans adeptly navigate such terrains by using context from their observations, creating a similar context-aware navigation system for robots is difficult. The essence of the issue lies in the acquisition and interpretation of context information, a task complicated by the inhe…

    Submitted 7 October, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  33. arXiv:2308.02585

    cs.LG

    PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

    Abstract: We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment due to a lack of precise characterization of the dependence of the alignment obj…

    Submitted 30 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  34. arXiv:2306.10486

    cs.LG

    On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.07675

    ACM Class: F.2.1

  35. arXiv:2306.06236

    cs.MA cs.LG cs.RO

    iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

    Authors: Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  36. arXiv:2306.06192

    cs.RO cs.AI cs.LG

    Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation

    Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Reinforcement learning (RL) is a promising approach for robotic navigation, allowing robots to learn through trial and error. However, real-world robotic tasks often suffer from sparse rewards, leading to inefficient exploration and suboptimal policies due to sample inefficiency of RL. In this work, we introduce Confidence-Controlled Exploration (CCE), a novel method that improves sample efficienc…

    Submitted 13 March, 2025; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 10 pages, 6 figures, 2 tables

  37. arXiv:2304.04736

    cs.CL cs.AI cs.LG

    On the Possibilities of AI-Generated Text Detection

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

    Abstract: Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire suppo…

    Submitted 2 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  38. arXiv:2303.07622

    cs.RO cs.AI cs.LG

    RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

    Authors: Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on) to adapt an already trained policy to real-time changes in the environment without re-training vi…

    Submitted 17 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  39. arXiv:2301.12083

    cs.LG math.OC stat.ML

    Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

    Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

    Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak…

    Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  40. arXiv:2301.12038

    cs.LG cs.AI stat.ML

    STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

    Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instanc…

    Submitted 18 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  41. arXiv:2210.14026

    cs.DC cs.LG math.OC

    SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

    Authors: Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

    Abstract: The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client. In this wo…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 30 pages, 9 figures

  42. arXiv:2209.06415

    cs.RO

    DMCA: Dense Multi-agent Navigation using Attention and Communication

    Authors: Senthil Hariharan Arul, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In decentralized multi-robot navigation, ensuring safe and efficient movement with limited environmental awareness remains a challenge. While robots traditionally navigate based on local observations, this approach falters in complex environments. A possible solution is to enhance understanding of the world through inter-agent communication, but mere information broadcasting falls short in efficie…

    Submitted 25 June, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

  43. arXiv:2209.05738

    cs.RO cs.MA

    RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments

    Authors: Aakriti Agrawal, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel reinforcement learning-based algorithm for the multi-robot task allocation problem in warehouse environments. We formulate it as a Markov Decision Process and solve it via a novel deep multi-agent reinforcement learning method (called RTAW) with an attention-inspired policy architecture. Hence, our proposed policy network uses global embeddings that are independent of the number of robots…

    Submitted 27 February, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Journal ref: ICRA 2023

  44. arXiv:2209.02865

    cs.RO cs.LG cs.MA

    DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments

    Authors: Aakriti Agrawal, Senthil Hariharan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel reinforcement learning (RL)-based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pick-up and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two-level approach to solve it. At…

    Submitted 6 September, 2022; originally announced September 2022.

    Journal ref: IROS 2022

  45. arXiv:2207.03694

    cs.RO

    HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm

    Authors: Kasun Weerakoon, Souradip Chakraborty, Nare Karapetyan, Adarsh Jagan Sathyamoorthy, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel approach to improve the performance of deep reinforcement learning (DRL) based outdoor robot navigation systems. Most existing DRL methods are based on carefully designed dense reward functions that learn the efficient behavior in an environment. We circumvent this issue by working only with sparse rewards (which are easy to design), and propose a novel adaptive Heavy-Tailed Re…

    Submitted 10 October, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

  46. arXiv:2206.10815

    cs.LG cs.DC math.OC

    FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

    Authors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

    Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considerin…

    Submitted 1 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  47. arXiv:2206.08829

    cs.LG cs.CR cs.DC stat.ML

    FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning

    Authors: Anis Elgabli, Chaouki Ben Issaid, Amrit S. Bedi, Ketan Rajawat, Mehdi Bennis, Vaneet Aggarwal

    Abstract: Newton-type methods are popular in federated learning due to their fast convergence. Still, they suffer from two main issues, namely low communication efficiency and low privacy, due to the requirement of sending Hessian information from clients to the parameter server (PS). In this work, we introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clien…

    Submitted 17 June, 2022; originally announced June 2022.

  48. arXiv:2206.05850

    cs.LG cs.AI eess.SY

    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

    Authors: Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: We consider the problem of the constrained Markov decision process (CMDP) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state-of-the-art convergence results for the objective value fu…

    Submitted 16 May, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: The latest version fixes the error in the proof of Lemma 4 in the AAAI 2023 version.

  49. arXiv:2206.05652

    cs.LG cs.RO eess.SY

    Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

    Abstract: In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse reward is common in continuous control robotics tasks such as manipulation and navigation, and makes the learning problem hard due to non-trivial estimation of value functions over the state space. This demands either rewa…

    Submitted 12 June, 2022; originally announced June 2022.

  50. arXiv:2206.01162

    cs.LG math.OC stat.ML

    Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

    Abstract: Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when the transition model is Gaussian or Lipschitz, and demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assump…

    Submitted 4 May, 2023; v1 submitted 2 June, 2022; originally announced June 2022.