A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Parmas, Paavo; Sugiyama, Masashi


arXiv:1910.06419 (cs)

[Submitted on 14 Oct 2019]


Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight given into their nature. We use a first principles approach to explain LR and RP, and show a connection between the two via the divergence theorem. The theory motivated us to derive optimal importance sampling schemes to reduce LR gradient variance. Our newly derived distributions have analytic probability densities and can be directly sampled from. The improvement for Gaussian target distributions was modest, but for other distributions such as a Beta distribution, our method could lead to arbitrarily large improvements, and was crucial to obtain competitive performance in evolution strategies experiments.
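For readers unfamiliar with the two estimators named in the abstract, the following is a minimal sketch, not the authors' code. It assumes a one-dimensional Gaussian sampling distribution N(mu, sigma^2) and a hypothetical test function f(x) = x^2, chosen only because the true gradient of E[f(x)] with respect to mu is known to be 2*mu. The function names, the seed, and the sample size are illustrative assumptions. The sketch does not implement the paper's importance sampling scheme.

```python
import numpy as np


def f(x):
    # Hypothetical test function (an assumption for illustration only).
    return x ** 2


def lr_gradient_samples(mu, sigma, n, rng):
    # Likelihood ratio (score function) estimator:
    # d/dmu E[f(x)] = E[f(x) * d/dmu log p(x; mu, sigma)],
    # where for a Gaussian the score is (x - mu) / sigma^2.
    x = rng.normal(mu, sigma, size=n)
    score = (x - mu) / sigma ** 2
    return f(x) * score


def rp_gradient_samples(mu, sigma, n, rng):
    # Reparameterization estimator: write x = mu + sigma * eps with
    # eps ~ N(0, 1), then differentiate through the sample path.
    # Here df/dx = 2x and dx/dmu = 1.
    eps = rng.standard_normal(n)
    x = mu + sigma * eps
    return 2.0 * x


rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000

lr = lr_gradient_samples(mu, sigma, n, rng)
rp = rp_gradient_samples(mu, sigma, n, rng)

print("true gradient:", 2.0 * mu)
print("LR estimate:", lr.mean(), "per-sample variance:", lr.var())
print("RP estimate:", rp.mean(), "per-sample variance:", rp.var())
```

Both estimators are unbiased for the same gradient in this setting. For this particular smooth f the LR samples have much larger variance than the RP samples, which is the kind of gap that variance reduction for LR, such as the importance sampling the abstract describes, aims to narrow.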

Comments:	8 pages + 19 pages appendix. Preliminary work
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.06419 [cs.LG]
	(or arXiv:1910.06419v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.06419

Submission history

From: Paavo Parmas
[v1] Mon, 14 Oct 2019 20:59:13 UTC (4,485 KB)
