A Unified Analysis of Stochastic Momentum Methods for Deep Learning

Yan, Yan; Yang, Tianbao; Li, Zhe; Lin, Qihang; Yang, Yi

arXiv:1808.10396 (cs)

[Submitted on 30 Aug 2018]

Abstract: Stochastic momentum methods have been widely adopted for training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction remains under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and two well-known stochastic momentum methods: the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies these three variants. We then derive convergence rates for the norm of the gradient in non-convex optimization, and analyze generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
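The abstract says SG, SHB and SNAG fit a single framework but does not write it down. The following is a minimal sketch, assuming the stochastic unified momentum (SUM) recursion as we understand it from the authors' earlier report (arXiv:1604.03257); the notation and the initialization ys_0 = x_0 are our assumptions and may differ from the paper. With step size alpha, momentum beta and an interpolation parameter s, setting s = 0 gives SHB, s = 1 gives SNAG, and s = 1/(1 - beta) reduces to plain SG with step size alpha/(1 - beta). The noisy quadratic objective and all helper names (`sum_update_path`, `sg_path`, `shb_path`, `make_noisy_quadratic_grad`) are hypothetical and exist only for illustration.

```python
import random
from typing import Callable, List

Vector = List[float]
Grad = Callable[[Vector], Vector]


def make_noisy_quadratic_grad(curv: Vector, sigma: float, seed: int) -> Grad:
    """Hypothetical test problem: f(x) = 0.5 * sum(c_i * x_i^2) plus Gaussian gradient noise."""
    rng = random.Random(seed)  # seeded so different runs see the same noise sequence
    def grad(x: Vector) -> Vector:
        return [c * xi + sigma * rng.gauss(0.0, 1.0) for c, xi in zip(curv, x)]
    return grad


def sum_update_path(grad: Grad, x0: Vector, alpha: float, beta: float,
                    s: float, steps: int) -> List[Vector]:
    """Iterates of the ASSUMED SUM recursion (paraphrased, not quoted from the paper):
        y_{t+1}  = x_t - alpha * g_t
        ys_{t+1} = x_t - s * alpha * g_t
        x_{t+1}  = y_{t+1} + beta * (ys_{t+1} - ys_t),   ys_0 = x_0
    """
    x = list(x0)
    ys_prev = list(x0)  # assumed initialization ys_0 = x_0
    path = [list(x)]
    for _ in range(steps):
        g = grad(x)  # one stochastic gradient per iteration
        y = [xi - alpha * gi for xi, gi in zip(x, g)]
        ys = [xi - s * alpha * gi for xi, gi in zip(x, g)]
        x = [yi + beta * (a - b) for yi, a, b in zip(y, ys, ys_prev)]
        ys_prev = ys
        path.append(list(x))
    return path


def sg_path(grad: Grad, x0: Vector, lr: float, steps: int) -> List[Vector]:
    """Plain stochastic gradient: x_{t+1} = x_t - lr * g_t."""
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        path.append(list(x))
    return path


def shb_path(grad: Grad, x0: Vector, alpha: float, beta: float, steps: int) -> List[Vector]:
    """Classical heavy ball: x_{t+1} = x_t - alpha * g_t + beta * (x_t - x_{t-1}), x_{-1} = x_0."""
    x_prev, x = list(x0), list(x0)
    path = [list(x)]
    for _ in range(steps):
        g = grad(x)
        x_new = [xi - alpha * gi + beta * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_new
        path.append(list(x))
    return path


if __name__ == "__main__":
    curv, x0, alpha, beta = [1.0, 4.0], [1.0, -2.0], 0.01, 0.9
    for name, s in [("SHB", 0.0), ("SNAG", 1.0), ("SG", 1.0 / (1.0 - beta))]:
        grad = make_noisy_quadratic_grad(curv, sigma=0.1, seed=0)
        print(name, sum_update_path(grad, x0, alpha, beta, s, steps=200)[-1])
```

Feeding the same seeded noise to the s = 1/(1 - beta) run and to plain SG with step size alpha/(1 - beta) should give matching trajectories up to floating-point rounding, which is a quick way to sanity-check the reduction under these assumptions.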

Comments:	Previous Technical Report: arXiv:1604.03257
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1808.10396 [cs.LG]
	(or arXiv:1808.10396v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1808.10396
Journal reference:	In IJCAI, pp. 2955-2961. 2018

Submission history

From: Yan Yan
[v1] Thu, 30 Aug 2018 17:00:03 UTC (449 KB)