Rethinking Gauss-Newton for learning over-parameterized models

Arbel, Michael; Menegaux, Romain; Wolinski, Pierre

arXiv:2302.02904 (cs)

[Submitted on 6 Feb 2023 (v1), last revised 12 Dec 2023 (this version, v3)]

Abstract: This work studies the global convergence and implicit bias of the Gauss-Newton (GN) method when optimizing over-parameterized one-hidden-layer networks in the mean-field regime. We first establish a global convergence result for GN in the continuous-time limit, which exhibits a faster convergence rate than gradient descent (GD) due to improved conditioning. We then perform an empirical study on a synthetic regression task to investigate the implicit bias of GN. While GN is consistently faster than GD at finding a global optimum, the learned model generalizes well on test data when training starts from random initial weights with a small variance and uses a small step size to slow down convergence. Specifically, our study shows that such a setting results in a hidden learning phenomenon, where the dynamics are able to recover features with good generalization properties even though the model has sub-optimal training and test performance due to an under-optimized linear layer. This study exhibits a trade-off between the convergence speed of GN and the generalization ability of the learned solution.
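
The comparison between GN and GD described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the tanh activation, the one-dimensional input, the sine target, the damping value, the step sizes, and all function names are hypothetical. The GN update is written in a damped function-space form, theta <- theta - lr * J^T (J J^T + damping * I)^{-1} r, a common variant when there are more parameters than data points; it may differ from the exact algorithm the authors analyze.

```python
import math
import random


def predict(w, v, x):
    """Mean-field one-hidden-layer net: f(x) = (1/M) * sum_i v_i * tanh(w_i * x)."""
    return sum(vi * math.tanh(wi * x) for wi, vi in zip(w, v)) / len(w)


def jacobian(w, v, xs):
    """One row per data point; columns are df/dw_1..w_M followed by df/dv_1..v_M."""
    m = len(w)
    rows = []
    for x in xs:
        t = [math.tanh(wi * x) for wi in w]
        dw = [vi * (1.0 - ti * ti) * x / m for vi, ti in zip(v, t)]
        dv = [ti / m for ti in t]
        rows.append(dw + dv)
    return rows


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def residuals(w, v, xs, ys):
    return [predict(w, v, x) - y for x, y in zip(xs, ys)]


def loss(w, v, xs, ys):
    """Mean squared error with the conventional factor 1/2."""
    return 0.5 * sum(r * r for r in residuals(w, v, xs, ys)) / len(xs)


def step(w, v, xs, ys, lr, method, damping=1e-3):
    """One update of all weights; method is "gd" or "gn" (damped Gauss-Newton)."""
    m, n = len(w), len(xs)
    jac = jacobian(w, v, xs)
    r = residuals(w, v, xs, ys)
    if method == "gn":
        # Precondition the residual with the damped Gram matrix J J^T + damping * I.
        gram = [[sum(a * b for a, b in zip(jac[i], jac[j])) + (damping if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
        coef = solve(gram, r)
    else:
        # Plain gradient of the loss in function space: r / n.
        coef = [ri / n for ri in r]
    # Map back to parameter space with J^T.
    delta = [sum(jac[i][p] * coef[i] for i in range(n)) for p in range(2 * m)]
    theta = [t - lr * d for t, d in zip(w + v, delta)]
    return theta[:m], theta[m:]


def main():
    random.seed(0)
    m, n = 20, 5  # 2 * m = 40 parameters for 5 data points: over-parameterized
    xs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) for x in xs]  # hypothetical synthetic regression target
    w0 = [random.gauss(0.0, 1.0) for _ in range(m)]
    v0 = [random.gauss(0.0, 1.0) for _ in range(m)]
    for method in ("gd", "gn"):
        w, v = list(w0), list(v0)
        for _ in range(30):
            w, v = step(w, v, xs, ys, 0.5, method)
        print(method, "final training loss:", loss(w, v, xs, ys))


if __name__ == "__main__":
    main()
```

Running the script prints the final training loss for each method from the same initialization. How the two compare depends on the arbitrary damping, step size, and initialization chosen here, so the sketch only shows the mechanics of the two updates and should not be read as a reproduction of the paper's experiments.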

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2302.02904 [cs.LG]
	(or arXiv:2302.02904v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.02904

Submission history

From: Michael Arbel
[v1] Mon, 6 Feb 2023 16:18:48 UTC (98 KB)
[v2] Mon, 5 Jun 2023 10:01:48 UTC (375 KB)
[v3] Tue, 12 Dec 2023 08:40:56 UTC (379 KB)
