Adaptive Reward-Free Exploration

Kaufmann, Emilie; Ménard, Pierre; Domingues, Omar Darwiche; Jonsson, Anders; Leurent, Edouard; Valko, Michal

arXiv:2006.06294 (cs)

[Submitted on 11 Jun 2020 (v1), last revised 7 Oct 2020 (this version, v2)]

Abstract: Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our work, we instead give a more natural adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm (RF-UCRL) can be seen as a variant of an algorithm of Fiechter from 1994, originally proposed for a different objective that we call best-policy identification. We prove that RF-UCRL needs on the order of $(SAH^4/\varepsilon^2)(\log(1/\delta) + S)$ episodes to output, with probability $1-\delta$, an $\varepsilon$-approximation of the optimal policy for any reward function. This bound improves over existing sample-complexity bounds in both the small $\varepsilon$ and the small $\delta$ regimes. We further investigate the relative complexities of reward-free exploration and best-policy identification.
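
To make the scaling in the stated bound concrete, the following minimal sketch evaluates the expression $(SAH^4/\varepsilon^2)(\log(1/\delta) + S)$ for given problem sizes. It is not the RF-UCRL algorithm and does not give an exact episode count: the abstract states the bound only up to its order, so constants and any lower-order factors are omitted. The function name `rf_ucrl_episode_order` and the reading of S, A and H as the number of states, the number of actions and the horizon are assumptions made for illustration.

```python
import math


def rf_ucrl_episode_order(S, A, H, epsilon, delta):
    """Evaluate (S*A*H^4 / epsilon^2) * (log(1/delta) + S).

    Hypothetical helper: it returns only the order-of-magnitude expression
    quoted in the abstract, with all constants dropped.
    S, A, H are assumed to be the number of states, actions, and the horizon.
    """
    if S <= 0 or A <= 0 or H <= 0:
        raise ValueError("S, A and H must be positive")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return (S * A * H**4 / epsilon**2) * (math.log(1.0 / delta) + S)


if __name__ == "__main__":
    base = rf_ucrl_episode_order(S=10, A=4, H=5, epsilon=0.1, delta=0.1)
    finer = rf_ucrl_episode_order(S=10, A=4, H=5, epsilon=0.05, delta=0.1)
    longer = rf_ucrl_episode_order(S=10, A=4, H=10, epsilon=0.1, delta=0.1)
    print("ratio when epsilon is halved:", finer / base)
    print("ratio when H is doubled:", longer / base)
```

With the other parameters held fixed, halving $\varepsilon$ multiplies the expression by 4 and doubling $H$ multiplies it by 16, while $\delta$ enters only through the additive $\log(1/\delta)$ term.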

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.06294 [cs.LG]
	(or arXiv:2006.06294v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.06294

Submission history

From: Edouard Leurent
[v1] Thu, 11 Jun 2020 09:58:03 UTC (1,449 KB)
[v2] Wed, 7 Oct 2020 16:23:09 UTC (1,178 KB)
