Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

Kong, Lingkai; Tao, Molei

arXiv:2002.06189 (cs)

[Submitted on 14 Feb 2020 (v1), last revised 2 Nov 2020 (this version, v2)]

Abstract: This article suggests that deterministic Gradient Descent (GD), which does not use any stochastic gradient approximation, can still exhibit stochastic behaviors. In particular, it shows that if the objective function exhibits multiscale behavior, then in a large learning rate regime that resolves only the macroscopic but not the microscopic details of the objective, the deterministic GD dynamics can become chaotic and converge not to a local minimizer but to a statistical distribution. A sufficient condition is also established for approximating this long-time statistical limit by a rescaled Gibbs distribution. Both theoretical and numerical demonstrations are provided, and the theoretical part relies on the construction of a stochastic map that uses bounded noise (as opposed to discretized diffusions).
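To make the abstract's central claim concrete, here is a minimal Python sketch (NumPy assumed). Everything specific in it is our own illustrative assumption, not the paper's construction: the objective f(x) = x^2/2 + eps*sin(x/eps), the values eps = 1e-4 and h = 0.1, the helper names grad_f and run_gd, and the temperature T = h*sigma2/2. That temperature comes from a heuristic that treats the microscopic gradient cos(x/eps) as bounded noise of variance sigma2 = 1/2 and matches one GD step to an Euler-Maruyama step of an overdamped Langevin equation. With h much smaller than eps, GD settles into a local minimizer. With h much larger than eps, the same deterministic iteration keeps fluctuating, and its long-run spread is compared with the Gibbs prediction N(0, T) and with a hypothetical surrogate map driven by bounded i.i.d. noise.

```python
import math

import numpy as np

# Illustrative multiscale objective (our assumption, not the paper's example):
#   f(x) = f0(x) + eps * sin(x / eps), with macroscopic part f0(x) = x**2 / 2.
EPS = 1e-4


def grad_f(x: float, eps: float = EPS) -> float:
    """Exact gradient: f0'(x) = x plus the bounded, fast-oscillating term cos(x / eps)."""
    return x + math.cos(x / eps)


def run_gd(x0: float, h: float, n_steps: int) -> np.ndarray:
    """Plain deterministic gradient descent x_{k+1} = x_k - h * f'(x_k); returns the trajectory."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = xs[k] - h * grad_f(xs[k])
    return xs


# Small learning rate (h << eps): GD resolves the microscale and stops at a local minimizer.
small = run_gd(x0=0.5, h=1e-5, n_steps=5_000)
print(f"small h: final |f'(x)| = {abs(grad_f(small[-1])):.2e}")

# Large learning rate (h >> eps): only the macroscale f0 is resolved.
h = 0.1
n_steps = 100_000
burn_in = 1_000
samples = run_gd(x0=0.5, h=h, n_steps=n_steps)[burn_in:]

# Heuristic rescaled-Gibbs prediction (assumption): treat cos(x / eps) as noise with
# variance sigma2 = 1/2 (cosine of a uniformly distributed phase). Matching
# x_{k+1} = x_k - h * f0'(x_k) - h * zeta_k to an Euler-Maruyama step of an overdamped
# Langevin SDE gives a density proportional to exp(-f0(x) / T) with T = h * sigma2 / 2,
# i.e. N(0, T) for f0(x) = x**2 / 2.
sigma2 = 0.5
T = h * sigma2 / 2

# Surrogate stochastic map with bounded i.i.d. noise (hypothetical stand-in for the
# deterministic microscale gradient): zeta_k = cos(2 * pi * U_k), U_k ~ Uniform(0, 1).
rng = np.random.default_rng(0)
zeta = np.cos(2 * np.pi * rng.random(n_steps))
surrogate = np.empty(n_steps + 1)
surrogate[0] = 0.5
for k in range(n_steps):
    surrogate[k + 1] = surrogate[k] - h * (surrogate[k] + zeta[k])
surrogate = surrogate[burn_in:]

print(f"deterministic GD:  mean = {samples.mean():+.4f}, var = {samples.var():.4f}")
print(f"bounded-noise map: mean = {surrogate.mean():+.4f}, var = {surrogate.var():.4f}")
print(f"rescaled Gibbs N(0, T): var = T = {T:.4f}")
```

The burn-in length, step counts, and seed are arbitrary choices. The intended qualitative picture is that the large-h deterministic iterates never settle, and their variance is of the same order as both the bounded-noise surrogate and the heuristic T. The paper's own sufficient condition for the Gibbs approximation, and its exact constants, are not reproduced by this sketch.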

Comments:	NeurIPS 2020. v1->v2: Weakened the conditions needed for the theory. Added connections to neural networks. Corrected a typo.
Subjects:	Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2002.06189 [cs.LG]
	(or arXiv:2002.06189v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2002.06189

Submission history

From: Lingkai Kong
[v1] Fri, 14 Feb 2020 18:59:20 UTC (2,322 KB)
[v2] Mon, 2 Nov 2020 16:37:14 UTC (2,557 KB)