On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Li, Jian; Luo, Xuanyuan; Qiao, Mingda

arXiv:1902.00621 (cs)

[Submitted on 2 Feb 2019 (v1), last revised 29 Feb 2020 (this version, v4)]

Abstract: Generalization error (also known as the out-of-sample error) measures how well a hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, a problem that has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional $\ell_2$ regularization term. We obtain new generalization bounds for the continuous Langevin dynamics in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noise level of the process is not small, and do not become vacuous even when T tends to infinity.
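
For readers unfamiliar with SGLD, the following is a minimal sketch of the update in one common textbook form: a mini-batch gradient step plus isotropic Gaussian noise. It is an illustration only and not necessarily the parametrization analyzed in the paper. The function names, the step size `step_size`, the inverse temperature `inv_temp`, and the toy least-squares objective are all assumptions made for this example; the toy objective is convex and was chosen only for brevity, since the update itself does not depend on convexity.

```python
import numpy as np

def sgld_step(w, grad_fn, batch, step_size, inv_temp, rng):
    """One SGLD update: mini-batch gradient step plus Gaussian noise.

    Assumed form: w <- w - step_size * g + sqrt(2 * step_size / inv_temp) * xi,
    where xi is standard normal noise with the same shape as w.
    """
    g = grad_fn(w, batch)
    noise = rng.standard_normal(w.shape)
    return w - step_size * g + np.sqrt(2.0 * step_size / inv_temp) * noise

def squared_loss_grad(w, batch):
    """Gradient of the mean squared loss 0.5 * ||Xw - y||^2 / n on a mini-batch."""
    X, y = batch
    return X.T @ (X @ w - y) / len(y)

def run_sgld(X, y, n_steps=500, batch_size=16, step_size=0.05,
             inv_temp=1e4, seed=0):
    """Run SGLD on a toy least-squares problem (hypothetical defaults)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Sample a mini-batch of indices without replacement.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w = sgld_step(w, squared_loss_grad, (X[idx], y[idx]),
                      step_size, inv_temp, rng)
    return w
```

Lowering `inv_temp` in this sketch increases the injected noise, which loosely corresponds to the regime the abstract describes as a noise level that "is not small".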

Comments:	Published in ICLR 2020
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1902.00621 [cs.LG]
	(or arXiv:1902.00621v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1902.00621

Submission history

From: Xuanyuan Luo
[v1] Sat, 2 Feb 2019 02:15:25 UTC (84 KB)
[v2] Fri, 24 May 2019 10:17:13 UTC (210 KB)
[v3] Mon, 10 Feb 2020 02:30:33 UTC (898 KB)
[v4] Sat, 29 Feb 2020 03:36:22 UTC (898 KB)
