Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning

Whitney, William F.; Bloesch, Michael; Springenberg, Jost Tobias; Abdolmaleki, Abbas; Cho, Kyunghyun; Riedmiller, Martin

arXiv:2101.09458 (cs)

[Submitted on 23 Jan 2021 (v1), last revised 1 Jul 2021 (this version, v2)]

Abstract: Despite the close connection between exploration and sample efficiency, most state-of-the-art reinforcement learning algorithms include no consideration for exploration beyond maximizing the entropy of the policy. In this work, we address this seemingly missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. This causes BBE to be actively detrimental to policy learning in many control tasks. We show that by decoupling the task policy from the exploration policy, directed exploration can be highly effective for sample-efficient continuous control. Our method, Decoupled Exploration and Exploitation Policies (DEEP), can be combined with any off-policy RL algorithm without modification. When used in conjunction with soft actor-critic, DEEP incurs no performance penalty in densely rewarding environments. In sparse-reward environments, DEEP gives a several-fold improvement in data efficiency due to better exploration.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2101.09458 [cs.LG]
	(or arXiv:2101.09458v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2101.09458

Submission history

From: William Whitney
[v1] Sat, 23 Jan 2021 08:51:04 UTC (184 KB)
[v2] Thu, 1 Jul 2021 16:03:55 UTC (2,970 KB)
