Reward Estimation for Variance Reduction in Deep Reinforcement Learning

Romoff, Joshua; Piché, Alexandre; Henderson, Peter; Francois-Lavet, Vincent; Pineau, Joelle

arXiv:1805.03359v1 (cs)

[Submitted on 9 May 2018 (this version), latest version 7 Nov 2018 (v2)]

Abstract: In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high variance. Variance reduction methods such as advantage estimation and control-variate estimation have therefore been investigated in other works. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward signal. This results in theoretical reductions in variance in the tabular case, as well as empirical improvements in both the function approximation and tabular settings in environments where rewards are stochastic. For the experiments, we use a modified version of Advantage Actor Critic (A2C) on variations of Atari games.
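The idea in the abstract, training the value function against a learned reward estimate rather than the raw noisy reward, can be illustrated in the tabular case. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no update rules, so the per-state running-average estimator `r_hat`, the step sizes `alpha` and `beta`, and the function name `td0_with_reward_estimator` are all hypothetical, and plain TD(0) stands in for the modified A2C used in the paper.

```python
def td0_with_reward_estimator(episodes, n_states, alpha=0.1, beta=0.1, gamma=0.9):
    """Tabular TD(0) whose target uses a learned reward estimate.

    episodes: iterable of lists of (state, reward, next_state, done) tuples.
    All names and step sizes here are illustrative assumptions.
    """
    r_hat = [0.0] * n_states  # hypothetical per-state reward estimate
    v = [0.0] * n_states      # state-value estimates
    for episode in episodes:
        for s, r, s_next, done in episode:
            # Move the reward estimate toward the sampled (possibly noisy) reward.
            r_hat[s] += beta * (r - r_hat[s])
            # Bootstrap from the next state unless the episode has ended.
            bootstrap = 0.0 if done else gamma * v[s_next]
            # The TD target uses the smoothed estimate, not the raw sample r.
            v[s] += alpha * (r_hat[s] + bootstrap - v[s])
    return v, r_hat


# Usage: a two-step chain with fixed rewards 1.0 and 2.0.
chain = [[(0, 1.0, 1, False), (1, 2.0, 1, True)]] * 2000
values, reward_estimates = td0_with_reward_estimator(chain, n_states=2)
```

With noisy rewards, the running average `r_hat[s]` fluctuates less than individual samples, so the value update sees a smoother target; the cost in this sketch is a lag while `r_hat` catches up to the mean reward.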

Comments:	Accepted to the International Conference on Learning Representations (ICLR) 2018 Workshop Track
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1805.03359 [cs.LG]
	(or arXiv:1805.03359v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.03359

Submission history

From: Peter Henderson
[v1] Wed, 9 May 2018 03:11:29 UTC (6,878 KB)
[v2] Wed, 7 Nov 2018 20:36:53 UTC (7,957 KB)
