Per-decision Multi-step Temporal Difference Learning with Control Variates

De Asis, Kristopher; Sutton, Richard S.

arXiv:1807.01830 (cs)

[Submitted on 5 Jul 2018]

Abstract: Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods such that intermediate algorithms can outperform either extreme. These methods address a bias-variance trade-off between relying on current estimates, which could be poor, and incorporating longer sampled reward sequences into the updates. Especially in the off-policy setting, where the agent aims to learn about a policy different from the one generating its behaviour, the variance in the updates can cause learning to diverge as the number of sampled rewards used in the estimates increases. In this paper, we introduce per-decision control variates for multi-step TD algorithms, and compare them to existing methods. Our results show that including the control variates can greatly improve performance on both on-policy and off-policy multi-step temporal difference learning tasks.
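To make the idea of a per-decision control variate concrete, here is a minimal sketch of an off-policy n-step return for state values. It assumes the textbook recursive form G_{t:h} = rho_t (R_{t+1} + gamma G_{t+1:h}) + (1 - rho_t) V(S_t), with G_{h:h} = V(S_h), as given in Sutton and Barto's "Reinforcement Learning: An Introduction" (2nd ed., Section 7.4). It is not necessarily the exact algorithm evaluated in the paper, and the function name and argument layout are hypothetical choices made for illustration.

```python
from typing import Sequence


def n_step_return_with_cv(
    rewards: Sequence[float],
    rhos: Sequence[float],
    values: Sequence[float],
    gamma: float,
) -> float:
    """Off-policy n-step return with a per-decision control variate.

    Assumed layout (illustrative, not taken from the paper):
      rewards[k] = R_{t+k+1}                         for k = 0..n-1
      rhos[k]    = pi(A_{t+k}|S_{t+k}) / b(A_{t+k}|S_{t+k})  for k = 0..n-1
      values[k]  = current estimate V(S_{t+k})       for k = 0..n
    """
    n = len(rewards)
    if len(rhos) != n or len(values) != n + 1:
        raise ValueError("expected len(rhos) == n and len(values) == n + 1")

    # Bootstrap from the value estimate at the end of the horizon.
    g = values[n]
    # Work backwards, applying one importance ratio per decision.
    for k in reversed(range(n)):
        # The (1 - rho) * V(S) term is the control variate: it has zero
        # expectation under the behaviour policy because E_b[rho] = 1.
        g = rhos[k] * (rewards[k] + gamma * g) + (1.0 - rhos[k]) * values[k]
    return g
```

With every ratio equal to 1 (the on-policy case) this reduces to the ordinary n-step return, and with every ratio equal to 0 it returns V(S_t), so the update target falls back to the current estimate instead of collapsing to zero. For example, with rewards `[1.0, 2.0]`, values `[0.3, 0.7, 4.0]` and `gamma = 0.5`, ratios `[1.0, 1.0]` give 3.0 and ratios `[0.0, 0.0]` give 0.3.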

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1807.01830 [cs.LG]
	(or arXiv:1807.01830v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1807.01830
Journal reference:	(2018). In Conference on Uncertainty in Artificial Intelligence. http://auai.org/uai2018/proceedings/papers/282.pdf

Submission history

From: Kristopher De Asis
[v1] Thu, 5 Jul 2018 02:34:40 UTC (457 KB)
