Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent

Ziyin, Liu; Wang, Mingze; Li, Hongchao; Wu, Lei

arXiv:2402.07193 (cs)

[Submitted on 11 Feb 2024 (v1), last revised 7 Nov 2024 (this version, v3)]

Abstract: Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries, a broad subclass of continuous symmetries present in the model architecture or loss function, interplay with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a "Noether flow") of the parameters $\theta$ along the degenerate direction to a unique initialization-independent fixed point $\theta^*$. These points are referred to as the noise equilibria because, at these points, noise contributions from different directions are balanced and aligned. Then, we show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks, and that it has practical implications for understanding techniques like representation normalization and warmup.
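
The drift along a degenerate direction can be seen in a toy case that is an assumption of this note, not an experiment taken from the paper: a two-parameter model f(x) = u * w * x, which is invariant under the rescaling u -> a*u, w -> w/a. For the per-sample loss 0.5 * (u*w*x - y)^2 with g = (u*w*x - y) * x, one SGD step changes the imbalance C = u^2 - w^2 by exactly -lr^2 * g^2 * C, so C shrinks whenever the per-sample gradient is nonzero. The sketch below (hypothetical function name `run_sgd`, arbitrary hyperparameters) compares data with and without label noise.

```python
import random


def run_sgd(noise_std, steps=20000, lr=0.02, seed=0):
    """Online SGD (batch size 1) on the toy model f(x) = u * w * x.

    Returns the imbalance C = u^2 - w^2 before and after training.
    All names and hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    u, w = 2.0, 0.25                      # deliberately unbalanced start
    c_init = u * u - w * w
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)
        # Target slope is 1; noise_std controls the label noise.
        y = x + noise_std * rng.gauss(0.0, 1.0)
        # Derivative of 0.5 * (u*w*x - y)^2 with respect to the product u*w.
        g = (u * w * x - y) * x
        # Simultaneous update; chain rule gives dL/du = g * w, dL/dw = g * u.
        u, w = u - lr * g * w, w - lr * g * u
    return c_init, u * u - w * w


if __name__ == "__main__":
    c0, c_noisy = run_sgd(noise_std=1.0)
    _, c_clean = run_sgd(noise_std=0.0)
    print(f"initial imbalance       : {c0:.4f}")
    print(f"final, with label noise : {c_noisy:.6f}")
    print(f"final, noise-free data  : {c_clean:.4f}")
```

With label noise the per-sample gradient never vanishes, so the imbalance keeps contracting toward the balanced point u^2 = w^2 regardless of the starting values. On noise-free data the gradient dies out once u * w reaches the target, and the imbalance freezes close to its initial value. The fixed point C = 0 is specific to this unregularized toy model and should not be read as the general form of the noise equilibria described in the abstract.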

Comments:	NeurIPS camera ready
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2402.07193 [cs.LG]
	(or arXiv:2402.07193v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.07193

Submission history

From: Liu Ziyin
[v1] Sun, 11 Feb 2024 13:00:04 UTC (1,106 KB)
[v2] Mon, 3 Jun 2024 17:49:41 UTC (1,404 KB)
[v3] Thu, 7 Nov 2024 02:39:30 UTC (1,899 KB)
