Importance Sampling Placement in Off-Policy Temporal-Difference Methods

Graves, Eric; Ghiassian, Sina

arXiv:2203.10172 (cs)

[Submitted on 18 Mar 2022 (v1), last revised 16 Jun 2022 (this version, v2)]

Abstract: A central challenge in applying many off-policy reinforcement learning algorithms to real-world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a policy different from the one being executed. To account for this difference, importance sampling ratios are often used, but they can increase the variance of the algorithms and reduce the rate of learning. Several variations of importance sampling have been proposed to reduce variance, with per-decision importance sampling being the most popular. However, the update rules for most off-policy algorithms in the literature depart from per-decision importance sampling in a subtle way: they correct the entire TD error instead of just the TD target. In this work, we show how this slight change can be interpreted as a control variate for the TD target, reducing variance and improving performance. Experiments over a wide range of algorithms show that this subtle modification improves performance.
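
The distinction between correcting the TD target and correcting the whole TD error can be illustrated with a small sketch. It assumes one-step off-policy TD(0) on state values; the function names, the two-action toy problem, and every number below are hypothetical choices made for illustration and are not taken from the paper. The sketch computes the exact mean and variance of both update forms under the behavior policy.

```python
import numpy as np

def update_target_corrected(rho, reward, gamma, v_next, v):
    # Importance ratio applied to the TD target only: rho * (R + gamma * V(S')) - V(S)
    return rho * (reward + gamma * v_next) - v

def update_error_corrected(rho, reward, gamma, v_next, v):
    # Importance ratio applied to the whole TD error: rho * (R + gamma * V(S') - V(S))
    return rho * (reward + gamma * v_next - v)

def mean_and_variance(values, probs):
    # Exact mean and variance of a discrete random variable.
    mean = np.sum(probs * values)
    var = np.sum(probs * (values - mean) ** 2)
    return mean, var

# Hypothetical toy problem: one state, two actions, both leading to termination.
behavior = np.array([0.5, 0.5])   # behavior policy b(a|s), assumed
target = np.array([0.9, 0.1])     # target policy pi(a|s), assumed
rewards = np.array([1.0, 0.0])    # reward for each action, assumed
v_next = np.array([0.0, 0.0])     # value of the (terminal) next state
gamma = 0.9
v = 0.9                           # current estimate V(s), assumed close to the true value

rho = target / behavior           # per-action importance sampling ratios

mean_t, var_t = mean_and_variance(
    update_target_corrected(rho, rewards, gamma, v_next, v), behavior)
mean_e, var_e = mean_and_variance(
    update_error_corrected(rho, rewards, gamma, v_next, v), behavior)

print(f"target-corrected: mean={mean_t:.4f}, variance={var_t:.4f}")
print(f"error-corrected:  mean={mean_e:.4f}, variance={var_e:.4f}")
```

Both forms have the same expected update here, because the ratio has expectation 1 under the behavior policy and the two forms differ only by the zero-mean term -(rho - 1) * v. In this toy case the error-corrected form has lower variance because v is close to the expected corrected target; with a poorly chosen v the ordering can reverse, so the sketch should not be read as a general guarantee.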

Comments:	5 pages, 2 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2203.10172 [cs.LG]
	(or arXiv:2203.10172v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2203.10172

Submission history

From: Eric Graves
[v1] Fri, 18 Mar 2022 21:54:09 UTC (821 KB)
[v2] Thu, 16 Jun 2022 19:54:42 UTC (821 KB)
