Showing 1–10 of 10 results for author: Tarbouriech, J

Searching in archive cs.
  1. arXiv:2311.13294  [pdf, other]

    cs.LG cs.AI

    Probabilistic Inference in Reinforcement Learning Done Right

    Authors: Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue

    Abstract: A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statist…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  2. arXiv:2111.12045  [pdf, other]

    cs.LG

    Adaptive Multi-Goal Exploration

    Authors: Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

    Abstract: We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching states to adaptively target goals that are neither too difficult nor too easy. We show how AdaGoal can be used to tackle the objective of learning an $ε$-optimal goal-conditioned policy for the (initially unknown) set…

    Submitted 24 February, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: AISTATS 2022

  3. arXiv:2110.14457  [pdf, other]

    cs.LG

    Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching

    Authors: Pierre-Alexandre Kamienny, Jean Tarbouriech, Sylvain Lamprier, Alessandro Lazaric, Ludovic Denoyer

    Abstract: Learning meaningful behaviors in the absence of reward is a difficult problem in reinforcement learning. A desirable and challenging unsupervised objective is to learn a set of diverse skills that provide a thorough coverage of the state space while being directed, i.e., reliably reaching distinct regions of the environment. In this paper, we build on the mutual information framework for skill dis…

    Submitted 30 April, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: ICLR 2022

  4. arXiv:2104.11186  [pdf, other]

    cs.LG

    Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

    Authors: Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

    Abstract: We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state. We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to induce an optimistic SSP problem whose associated value iteration schem…

    Submitted 10 December, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: NeurIPS 2021

  5. arXiv:2012.14755  [pdf, other]

    cs.LG stat.ML

    Improved Sample Complexity for Incremental Autonomous Exploration in MDPs

    Authors: Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

    Abstract: We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of $ε$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps (in expectation) from a reference state $s_0$. In this paper, we intro…

    Submitted 29 December, 2020; originally announced December 2020.

    Comments: NeurIPS 2020

  6. arXiv:2007.06437  [pdf, other]

    cs.LG stat.ML

    A Provably Efficient Sample Collection Strategy for Reinforcement Learning

    Authors: Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

    Abstract: One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample complexity, state-space coverage or model estimation, we need to strike a different exploration-exploitation trade-off. In this paper, we propose to tackle the explora…

    Submitted 18 November, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: NeurIPS 2021

  7. arXiv:2003.03297  [pdf, other]

    stat.ML cs.LG

    Active Model Estimation in Markov Decision Processes

    Authors: Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

    Abstract: We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP). Efficient exploration in this problem requires the agent to identify the regions in which estimating the model is more difficult and then exploit this knowledge to collect more samples there. In this paper, we formalize this problem, introduce the first a…

    Submitted 22 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

  8. arXiv:2002.03839  [pdf, other]

    cs.LG stat.ML

    Adversarial Attacks on Linear Contextual Bandits

    Authors: Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, Matteo Pirotta

    Abstract: Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. In many of these domains, malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a…

    Submitted 23 October, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  9. arXiv:1912.03517  [pdf, other]

    stat.ML cs.LG

    No-Regret Exploration in Goal-Oriented Reinforcement Learning

    Authors: Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

    Abstract: Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost. Despite the popularity of this setting, the exploration-exploitation dilemma has been sparsely studied in general SSP pro…

    Submitted 17 August, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

    Journal ref: International Conference on Machine Learning (ICML 2020)

  10. arXiv:1902.11199  [pdf, other]

    stat.ML cs.LG

    Active Exploration in Markov Decision Processes

    Authors: Jean Tarbouriech, Alessandro Lazaric

    Abstract: We introduce the active exploration problem in Markov decision processes (MDPs). Each state of the MDP is characterized by a random value and the learner should gather samples to estimate the mean value of each state as accurately as possible. Similarly to active exploration in multi-armed bandit (MAB), states may have different levels of noise, so that the higher the noise, the more samples are n…

    Submitted 28 February, 2019; originally announced February 2019.