Stratified Experience Replay: Correcting Multiplicity Bias in Off-Policy Reinforcement Learning

Daley, Brett; Hickert, Cameron; Amato, Christopher

arXiv:2102.11319 (cs)

[Submitted on 22 Feb 2021]

Abstract: Deep Reinforcement Learning (RL) methods rely on experience replay to approximate the minibatched supervised learning setting; however, unlike supervised learning where access to lots of training data is crucial to generalization, replay-based deep RL appears to struggle in the presence of extraneous data. Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large. This suggests that outdated experiences somehow impact the performance of deep RL, which should not be the case for off-policy methods like DQN. Consequently, we re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation. We show that -- despite conventional wisdom -- sampling from the uniform distribution does not yield uncorrelated training samples and therefore biases gradients during training. Our theory prescribes a special non-uniform distribution to cancel this effect, and we propose a stratified sampling scheme to efficiently implement it.
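
The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one plausible reading: transitions are grouped into strata keyed by their (state, action) pair, a stratum is drawn uniformly, and then a transition is drawn uniformly within it. The class name `StratifiedReplaySketch`, the choice of (state, action) as the stratum key, the requirement that states be hashable, and the absence of any eviction policy are all assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


class StratifiedReplaySketch:
    """Hypothetical replay buffer with one stratum per (state, action) pair."""

    def __init__(self, seed=None):
        # Maps (state, action) -> list of transitions observed for that pair.
        self._strata = defaultdict(list)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Assumption: states are hashable (e.g. ints or tuples).
        transition = (state, action, reward, next_state, done)
        self._strata[(state, action)].append(transition)

    def sample(self, batch_size):
        # Step 1: pick a stratum uniformly, ignoring how many
        # transitions it holds (this is what removes the multiplicity).
        # Step 2: pick a transition uniformly inside that stratum.
        keys = list(self._strata)
        batch = []
        for _ in range(batch_size):
            key = self._rng.choice(keys)
            batch.append(self._rng.choice(self._strata[key]))
        return batch


if __name__ == "__main__":
    buf = StratifiedReplaySketch(seed=0)
    # One pair is visited 99 times, another only once.
    for _ in range(99):
        buf.add(0, 0, 0.0, 0, False)
    buf.add(1, 0, 1.0, 1, False)
    batch = buf.sample(2000)
    rare = sum(1 for t in batch if t[0] == 1)
    print(f"fraction of samples from the rare pair: {rare / len(batch):.2f}")
```

Under uniform sampling over the raw buffer, the rarely visited pair would make up about 1% of each batch; in this sketch it makes up roughly half, because each stratum is equally likely regardless of how often its pair was visited.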

Comments:	AAMAS 2021 Extended Abstract, 3 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2102.11319 [cs.LG]
	(or arXiv:2102.11319v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.11319

Submission history

From: Brett Daley
[v1] Mon, 22 Feb 2021 19:29:18 UTC (956 KB)
