Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation

Barbier, Jean; Camilli, Francesco; Nguyen, Minh-Toan; Pastore, Mauro; Skerk, Rudy

arXiv:2501.18530 (stat)

[Submitted on 30 Jan 2025 (v1), last revised 1 Apr 2025 (this version, v2)]

Abstract: We consider a teacher-student model of supervised learning with a fully-trained two-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We provide an effective theory for approximating the Bayes-optimal generalisation error of the network for any activation function in the regime of sample size $n$ scaling quadratically with the input dimension, i.e., around the interpolation threshold where the number of trainable parameters $kd+k$ and of data $n$ are comparable. Our analysis tackles generic weight distributions. We uncover a discontinuous phase transition separating a "universal" phase from a "specialisation" phase. In the first, the generalisation error is independent of the weight distribution and decays slowly with the sampling rate $n/d^2$, with the student learning only some non-linear combinations of the teacher weights. In the latter, the error is weight distribution-dependent and decays faster due to the alignment of the student towards the teacher network. We thus unveil the existence of a highly predictive solution near interpolation, which is however potentially hard to find by practical algorithms.
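As an illustration of the scaling regime described in the abstract, the following is a minimal sketch of how teacher-generated data could be set up, not the authors' code. The function name `make_teacher_dataset`, the ratios `gamma` (for $k/d$) and `alpha` (for $n/d^2$), the Gaussian inputs and weights, the `tanh` activation, the $1/\sqrt{d}$ and $1/\sqrt{k}$ normalisations, and the noiseless output are all assumptions made for the example; the abstract itself allows any activation and generic weight distributions.

```python
import numpy as np

def make_teacher_dataset(d=30, gamma=0.5, alpha=1.0, activation=np.tanh, seed=0):
    """Sample data from a random two-layer teacher (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(gamma * d)))      # width proportional to input dimension
    n = max(1, int(round(alpha * d**2)))   # sample size quadratic in input dimension
    W = rng.standard_normal((k, d))        # teacher inner weights (Gaussian: an assumption)
    v = rng.standard_normal(k)             # teacher readout weights (Gaussian: an assumption)
    X = rng.standard_normal((n, d))        # inputs (Gaussian: an assumption)
    # Noiseless labels with assumed 1/sqrt(d) and 1/sqrt(k) normalisations.
    y = activation(X @ W.T / np.sqrt(d)) @ v / np.sqrt(k)
    return X, y, W, v

X, y, W, v = make_teacher_dataset()
k, d = W.shape
n = X.shape[0]
n_params = k * d + k                       # trainable parameters, as counted in the abstract
print(f"n = {n}, parameters = {n_params}, sampling rate n/d^2 = {n / d**2:.2f}")
```

With these default values the number of samples and the number of trainable parameters are of the same order, which is the "near interpolation" regime the abstract refers to.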

Comments:	v2: 9 pages + appendix, 10 figures, 3 tables; added discussion on Gaussian inner weights (Fig. 2, 5 + Appendix H); added discussion on algorithmic complexity of specialisation (Appendix I and figures therein)
Subjects:	Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Machine Learning (cs.LG)
Cite as:	arXiv:2501.18530 [stat.ML]
	(or arXiv:2501.18530v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2501.18530

Submission history

From: Mauro Pastore
[v1] Thu, 30 Jan 2025 17:56:52 UTC (1,065 KB)
[v2] Tue, 1 Apr 2025 16:32:05 UTC (6,894 KB)
