A Precise Characterization of SGD Stability Using Loss Surface Geometry

Dexter, Gregory; Ocejo, Borja; Keerthi, Sathiya; Gupta, Aman; Acharya, Ayan; Khanna, Rajiv

arXiv:2401.12332 (cs)

[Submitted on 22 Jan 2024]

Abstract: Stochastic Gradient Descent (SGD) is a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it induces. Several studies have investigated the linear stability property of SGD in the vicinity of a stationary point as a predictive proxy for sharpness and generalization error in overparameterized neural networks (Wu et al., 2022; Jastrzebski et al., 2019; Cohen et al., 2021). In this paper, we delve deeper into the relationship between linear stability and sharpness. More specifically, we delineate the necessary and sufficient conditions for linear stability, contingent on the hyperparameters of SGD and the sharpness at the optimum. To this end, we introduce a novel coherence measure of the loss Hessian that encapsulates the geometric properties of the loss function relevant to the linear stability of SGD. It enables us to provide a simplified sufficient condition for identifying linear instability at an optimum. Notably, compared to previous works, our analysis relies on significantly milder assumptions and applies to a broader class of loss functions than previously known, encompassing not only mean-squared error but also cross-entropy loss.

Comments:	To appear at ICLR 2024
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2401.12332 [cs.LG]
	(or arXiv:2401.12332v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.12332

Submission history

From: Gregory Dexter
[v1] Mon, 22 Jan 2024 19:46:30 UTC (547 KB)
