RMSProp and equilibrated adaptive learning rates for non-convex optimization

Dauphin, Yann N.; de Vries, Harm; Chung, Junyoung; Bengio, Yoshua

arXiv:1502.04390v1 (cs)

[Submitted on 15 Feb 2015 (this version), latest version 29 Aug 2015 (v2)]

Abstract: Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us design better suited adaptive learning rate schemes, i.e., diagonal preconditioners. We show that the optimal preconditioner is based on taking the absolute value of the Hessian's eigenvalues, which is not what Newton and classical preconditioners like Jacobi's do. In this paper, we propose a novel adaptive learning rate scheme based on the equilibration preconditioner and show that RMSProp approximates it, which may explain some of its success in the presence of saddle points. Whereas RMSProp is a biased estimator of the equilibration preconditioner, the proposed stochastic estimator, ESGD, is unbiased and only adds a small percentage to computing time. We find that both schemes yield very similar step directions but that ESGD sometimes surpasses RMSProp in terms of convergence speed, always clearly improving over plain stochastic gradient descent.
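
For readers who want a concrete picture of the idea in the abstract, here is a minimal sketch. It is not the authors' code. It assumes the usual definition of the equilibration preconditioner as the vector of Hessian row norms, D_i = ||H_i,:||_2, and estimates it from Hessian-vector products with Gaussian vectors v ~ N(0, I), using the identity E[(Hv)_i^2] = sum_j H_ij^2. The toy matrix, the function names, the learning rate, and the damping constant are all hypothetical choices made for illustration.

```python
import math
import random


def matvec(H, v):
    """Dense matrix-vector product; stands in for a Hessian-vector product."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def exact_row_norms(H):
    """Exact equilibration diagonal under the assumed definition D_i = ||H_i,:||_2."""
    return [math.sqrt(sum(h * h for h in row)) for row in H]


def estimate_row_norms(hvp, dim, num_samples, rng):
    """Monte Carlo estimate sqrt(mean((Hv)^2)) with v ~ N(0, I).

    Only Hessian-vector products are needed, never the full Hessian.
    """
    acc = [0.0] * dim
    for _ in range(num_samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        hv = hvp(v)
        acc = [a + c * c for a, c in zip(acc, hv)]
    return [math.sqrt(a / num_samples) for a in acc]


def preconditioned_step(x, grad, D, lr=0.1, damping=1e-4):
    """One diagonally preconditioned gradient step: x - lr * grad / (D + damping)."""
    return [xi - lr * gi / (di + damping) for xi, gi, di in zip(x, grad, D)]


# Hypothetical toy problem: f(x) = 0.5 * x^T H x with an indefinite H (a saddle).
H = [[2.0, 0.5, 0.0],
     [0.5, -1.0, 0.3],
     [0.0, 0.3, 0.01]]

rng = random.Random(0)
D_exact = exact_row_norms(H)
D_est = estimate_row_norms(lambda v: matvec(H, v), 3, 20000, rng)

x = [1.0, 1.0, 1.0]
grad = matvec(H, x)  # gradient of the quadratic at x
x_new = preconditioned_step(x, grad, D_est)
```

Because the row norms are positive even where H has negative curvature, the scaling never flips the sign of a gradient component, which is the property the abstract contrasts with Newton-style preconditioning. RMSProp, by comparison, keeps a running average of squared gradients rather than squared Hessian-vector products.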

Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA)
Cite as:	arXiv:1502.04390 [cs.LG]
	(or arXiv:1502.04390v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1502.04390

Submission history

From: Yann Dauphin
[v1] Sun, 15 Feb 2015 23:41:33 UTC (433 KB)
[v2] Sat, 29 Aug 2015 23:04:39 UTC (456 KB)
