Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks

Tucat, Matteo; Mukherjee, Anirbit; Sen, Procheta; Sun, Mingfei; Rivasplata, Omar


arXiv:2404.08624 (cs)

[Submitted on 12 Apr 2024 (v1), last revised 8 Apr 2025 (this version, v2)]



Abstract: We present and analyze a novel regularized form of the gradient clipping algorithm, proving that it converges to global minima of the loss surface of deep neural networks under the squared loss, provided that the layers are of sufficient width. The algorithm presented here, dubbed $\delta$-GClip, introduces a modification to gradient clipping that leads to a first-of-its-kind example of a step-size schedule for gradient descent that provably minimizes training losses of deep neural nets. We also present empirical evidence that our theoretically founded $\delta$-GClip algorithm is competitive with state-of-the-art deep learning heuristics on various neural architectures, including modern transformer-based architectures. The modification we make to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Łojasiewicz inequality that was recently proven to hold for sufficiently wide neural networks of any depth within a neighbourhood of the initialization.
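The abstract does not spell out the update rule, so the following is only an illustrative sketch of what a regularized clipping step-size schedule could look like. It assumes the step size takes the form eta * min(1, max(delta, gamma / ||grad||)), where the floor delta keeps the step from shrinking to zero when gradients are large. The function names, the default values of eta, gamma and delta, and the toy quadratic loss are all assumptions made for illustration, not details taken from this page.

```python
import math


def delta_gclip_step_size(grad_norm, eta=0.1, gamma=1.0, delta=0.01):
    """Assumed rule: eta * min(1, max(delta, gamma / ||grad||)).

    Standard clipping corresponds to delta = 0; the floor delta > 0 is
    the assumed "regularization" that bounds the step size from below.
    """
    if grad_norm == 0.0:
        # gamma / 0 is treated as +infinity, so the min(1, .) picks 1.
        return eta
    return eta * min(1.0, max(delta, gamma / grad_norm))


def delta_gclip_descent(grad_fn, w, steps=200, eta=0.1, gamma=1.0, delta=0.01):
    """Gradient descent using the assumed step-size schedule above."""
    for _ in range(steps):
        g = grad_fn(w)
        norm = math.sqrt(sum(gi * gi for gi in g))
        h = delta_gclip_step_size(norm, eta, gamma, delta)
        w = [wi - h * gi for wi, gi in zip(w, g)]
    return w


# Toy problem (hypothetical): L(w) = 0.5 * ||w||^2, whose gradient is w.
w_final = delta_gclip_descent(lambda w: list(w), [10.0, -10.0])
final_norm = math.sqrt(sum(wi * wi for wi in w_final))
```

On this toy quadratic the iterate first moves at a roughly constant rate while the gradient norm exceeds gamma, then contracts geometrically once the clipping is inactive. This says nothing about the neural network setting analyzed in the paper; it only illustrates the mechanics of the assumed schedule.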

Comments:	20 pages
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2404.08624 [cs.LG]
	(or arXiv:2404.08624v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.08624

Submission history

From: Anirbit Mukherjee
[v1] Fri, 12 Apr 2024 17:37:42 UTC (1,929 KB)
[v2] Tue, 8 Apr 2025 12:19:22 UTC (1,266 KB)
