Showing 1–7 of 7 results for author: Domingues, O D

  1. arXiv:2103.01312  [pdf, other]

    stat.ML cs.LG

    UCB Momentum Q-learning: Correcting the bias without forgetting

    Authors: Pierre Menard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

    Abstract: We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic Markov decision processes. UCBMQ is based on Q-learning where we add a momentum term and rely on the principle of optimism in the face of uncertainty to deal with exploration. Our new technical ingredient of UCBMQ is the use of momentum to correct the…

    Submitted 18 March, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

  2. arXiv:2010.03531  [pdf, ps, other]

    cs.LG stat.ML

    Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

    Authors: Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

    Abstract: In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a novel lower bound of $\Omega((H^3SA/\varepsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best poli…

    Submitted 7 October, 2020; originally announced October 2020.

  3. arXiv:2007.13442  [pdf, other]

    cs.LG stat.ML

    Fast active learning for pure exploration in reinforcement learning

    Authors: Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko

    Abstract: Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on exploring efficiently. The exploration remains a challenge while it has been addressed with many hand-tuned heuristics with different levels of generality on one sid…

    Submitted 10 October, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

  4. arXiv:2007.05078  [pdf, other]

    cs.LG stat.ML

    A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

    Authors: Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

    Abstract: In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which qu…

    Submitted 23 March, 2022; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Update following the publication in AISTATS 2021. Fixed typos and lemma about runtime

  5. arXiv:2006.06294  [pdf, other]

    cs.LG stat.ML

    Adaptive Reward-Free Exploration

    Authors: Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

    Abstract: Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our work, we instead give a more natural adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be…

    Submitted 7 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

  6. arXiv:2006.05879  [pdf, other]

    cs.LG stat.ML

    Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

    Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

    Abstract: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optima…

    Submitted 10 June, 2020; originally announced June 2020.

  7. arXiv:2004.05599  [pdf, other]

    cs.LG stat.ML

    Kernel-Based Reinforcement Learning: A Finite-Time Analysis

    Authors: Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ epi…

    Submitted 23 March, 2022; v1 submitted 12 April, 2020; originally announced April 2020.

    Comments: Update following the publication in ICML 2021, including fixed typos