Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks

Montanari, Andrea; Urbani, Pierfrancesco

arXiv:2502.21269 (stat)

[Submitted on 28 Feb 2025 (v1), last revised 2 Sep 2025 (this version, v2)]

Abstract: Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width, the training dynamics exhibits a separation of timescales which implies: $(i)$ The emergence of a slow timescale associated with the growth in Gaussian/Rademacher complexity of the network; $(ii)$ Inductive bias towards small complexity if the initialization has small enough complexity; $(iii)$ A dynamical decoupling between feature learning and overfitting regimes; $(iv)$ A non-monotone behavior of the test error, with an associated 'feature unlearning' regime at large times.

Comments:	85 pages; 62 pdf figures
Subjects:	Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
Cite as:	arXiv:2502.21269 [stat.ML]
	(or arXiv:2502.21269v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2502.21269

Submission history

From: Andrea Montanari
[v1] Fri, 28 Feb 2025 17:45:26 UTC (6,551 KB)
[v2] Tue, 2 Sep 2025 20:56:26 UTC (6,066 KB)
