Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

Kothapalli, Vignesh; Pang, Tianyu; Deng, Shenyang; Liu, Zongmin; Yang, Yaoqing

arXiv:2406.04657 (cs)

[Submitted on 7 Jun 2024 (v1), last revised 2 Oct 2024 (this version, v2)]

Abstract: Training strategies for modern deep neural networks (NNs) tend to induce a heavy-tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. In particular, understanding the conditions that lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple, rich setting to model the emergence of HT ESD. Specifically, we present a theory-informed analysis for 'crafting' heavy tails in the ESD of two-layer NNs without any gradient noise. This is the first work to analyze a noise-free setting and incorporate optimizer-dependent (GD/Adam) large learning rates into the HT ESD analysis. Our results highlight the role of learning rates in shaping the Bulk+Spike and HT forms of the ESDs in the early phase of training, which can facilitate generalization in the two-layer NN. These observations shed light on the behavior of large-scale NNs, albeit in a much simpler setting. Finally, we present a novel perspective on the ESD evolution dynamics by analyzing the singular vectors of weight matrices and optimizer updates.
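For readers unfamiliar with the terminology, the following is a minimal sketch of how an ESD and a heavy-tail estimate might be computed for a single weight matrix. It is not taken from the paper: the function names `esd` and `hill_tail_index`, the normalization by the number of rows, and the use of the Hill estimator are all assumptions made for illustration. The Hill estimator here targets the tail index of the survival function, which differs by one from the density exponent often reported in the heavy-tail literature.

```python
import numpy as np


def esd(weight: np.ndarray) -> np.ndarray:
    """Eigenvalues of W^T W / n for an (n x m) matrix W, sorted ascending.

    The 1/n normalization is an assumption; other conventions exist.
    """
    n = weight.shape[0]
    # Nonzero eigenvalues of W^T W are the squared singular values of W.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return np.sort(singular_values**2 / n)


def hill_tail_index(values: np.ndarray, k: int) -> float:
    """Hill estimate of the tail index from the k largest positive values."""
    top = np.sort(values)[::-1][: k + 1]  # k + 1 largest, descending
    log_ratios = np.log(top[:k] / top[k])  # compare against the (k+1)-th largest
    return 1.0 / float(np.mean(log_ratios))


# Illustrative input: an i.i.d. Gaussian matrix, whose ESD is not heavy-tailed.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 256))
eigs = esd(w)
print("smallest / largest eigenvalue:", eigs.min(), eigs.max())
print("Hill estimate on the top 25 eigenvalues:", hill_tail_index(eigs, k=25))
```

A smaller Hill estimate indicates a heavier tail. The choice of `k` is a tuning parameter, and the estimate can be sensitive to it, so any single number from this sketch should be read with caution.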

Comments:	34 pages, 32 figures, 4 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2406.04657 [cs.LG]
	(or arXiv:2406.04657v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.04657

Submission history

From: Vignesh Kothapalli
[v1] Fri, 7 Jun 2024 05:51:57 UTC (4,907 KB)
[v2] Wed, 2 Oct 2024 08:10:29 UTC (7,369 KB)
