Showing 1–7 of 7 results for author: Satija, H

Searching in archive cs.
  1. arXiv:2509.05396  [pdf, ps, other]

    cs.CL cs.AI cs.MA

    Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate

    Authors: Andrea Wynn, Harsh Satija, Gillian Hadfield

    Abstract: While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. The prior work has exclusively focused on debates within homogeneous groups of agents, whereas we explore how diversity in model capabilities influences the dynamics and outcomes of multi-agent interactions. Through a series of exp…

    Submitted 5 September, 2025; originally announced September 2025.

    Comments: ICML MAS Workshop 2025

  2. arXiv:2109.00157  [pdf, other]

    cs.LG cs.AI

    A Survey of Exploration Methods in Reinforcement Learning

    Authors: Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

    Abstract: Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process as the lack of enough information could hinder effective learning. In this article, we provide a survey of modern…

    Submitted 2 September, 2021; v1 submitted 31 August, 2021; originally announced September 2021.

  3. arXiv:2106.00099  [pdf, other]

    cs.LG

    Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

    Authors: Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

    Abstract: We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning (RL) setting. We consider the scenario where: (i) we have a dataset collected under a known baseline policy, (ii) multiple reward signals are received from the environment inducing as many objectives to optimize. We present an SPI formulation for this RL setting that takes into account the…

    Submitted 29 October, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  4. arXiv:2012.13658  [pdf, other]

    cs.LG

    Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

    Authors: Susan Amin, Maziar Gomrokchi, Hossein Aboutalebi, Harsh Satija, Doina Precup

    Abstract: A major challenge in reinforcement learning is the design of exploration strategies, especially for environments with sparse reward structures and continuous state and action spaces. Intuitively, if the reinforcement signal is very scarce, the agent should rely on some form of short-term memory in order to cover its environment efficiently. We propose a new exploration method, based on two intuiti…

    Submitted 11 June, 2021; v1 submitted 25 December, 2020; originally announced December 2020.

    Comments: To be published in ICML, 2021

  5. arXiv:2008.11811  [pdf, other]

    cs.LG math.OC stat.ML

    Constrained Markov Decision Processes via Backward Value Functions

    Authors: Harsh Satija, Philip Amortila, Joelle Pineau

    Abstract: Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety or resources). In standard RL, the agent is incentivized to explore any behavior as long as it maximizes rewards, but in the real world, undesired behavior can d…

    Submitted 26 August, 2020; originally announced August 2020.

  6. arXiv:1806.02315  [pdf, other]

    cs.LG stat.ML

    Randomized Value Functions via Multiplicative Normalizing Flows

    Authors: Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

    Abstract: Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling…

    Submitted 28 June, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Journal ref: UAI 2019: Conference on Uncertainty in Artificial Intelligence 2019

  7. arXiv:1804.10689  [pdf, other]

    cs.LG cs.AI stat.ML

    Decoupling Dynamics and Reward for Transfer Learning

    Authors: Amy Zhang, Harsh Satija, Joelle Pineau

    Abstract: Current reinforcement learning (RL) methods can successfully learn single tasks but often generalize poorly to modest perturbations in task domain or training procedure. In this work, we present a decoupled learning strategy for RL that creates a shared representation space where knowledge can be robustly transferred. We separate learning the task representation, the forward dynamics, the inverse…

    Submitted 8 May, 2018; v1 submitted 27 April, 2018; originally announced April 2018.