-
Asymptotics of feature learning in two-layer networks after one gradient-step
Authors:
Hugo Cui,
Luca Pesce,
Yatin Dandi,
Florent Krzakala,
Yue M. Lu,
Lenka Zdeborová,
Bruno Loureiro
Abstract:
In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error of the sRF in the high-dimensional limit where the number of samples, the width, and the input dimension grow at a proportional rate. The resulting characterization for sRFs also captures closely the learning curves of the original network model. This enables us to understand how adapting to the data is crucial for the network to efficiently learn non-linear functions in the direction of the gradient -- where at initialization it can only express linear functions in this regime.
Submitted 4 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation
Authors:
Luca Pesce,
Florent Krzakala,
Bruno Loureiro,
Ludovic Stephan
Abstract:
In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we ask ourselves the question: "when is a single Gaussian enough to characterize the error?". Our formula allows us to give sharp answers to this question, both in the positive and negative directions. More precisely, we show that the sufficient conditions for Gaussian universality (or lack thereof) crucially depend on the alignment between the target weights and the means and covariances of the mixture clusters, which we precisely quantify. In the particular case of least-squares interpolation, we prove a strong universality property of the training error, and show it follows a simple, closed-form expression. Finally, we apply our results to real datasets, clarifying some recent discussion in the literature about Gaussian universality of the errors in this context.
Submitted 17 February, 2023;
originally announced February 2023.
-
Innate Dynamics and Identity Crisis of a Metal Surface Unveiled by Machine Learning of Atomic Environments
Authors:
Matteo Cioni,
Daniela Polino,
Daniele Rapetti,
Luca Pesce,
Massimo Delle Piane,
Giovanni M. Pavan
Abstract:
Metals are traditionally considered hard matter. However, it is well known that their atomic lattices may become dynamic and undergo reconfigurations even well below the melting temperature. The innate atomic dynamics of metals is directly related to their bulk and surface properties. Understanding their complex structural dynamics is thus important for many applications but is not easy. Here we report deep-potential molecular dynamics simulations that resolve, at atomic resolution, the complex dynamics of various types of copper (Cu) surfaces, used as an example, near the Hüttig ($\sim1/3$ of melting) temperature. The development of a deep neural network potential trained on DFT calculations provides a dynamically accurate force field that we use to simulate large atomistic models of different Cu surface types. A combination of high-dimensional structural descriptors and unsupervised machine learning allows us to identify and track all the atomic environments (AEs) emerging in the surfaces at finite temperatures. We can directly observe how AEs that are non-native in a specific (ideal) surface, but that are instead typical of other surface types, continuously emerge and disappear in that surface in relevant regimes, in dynamic equilibrium with the native ones. Our analyses allow us to estimate the lifetimes of all the AEs populating these Cu surfaces and to reconstruct their dynamic interconversion networks. This reveals the elusive identity of these metal surfaces, which preserve their identity only in part and partly transform into something else under relevant conditions. This also suggests a concept of "statistical identity" for metal surfaces, which is key for understanding their behaviors and properties.
Submitted 21 February, 2023; v1 submitted 29 July, 2022;
originally announced July 2022.
-
Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap
Authors:
Luca Pesce,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $ρ$, as well as the ratio $α$ between the number of samples and the dimension, are fixed while the dimension diverges. We identify the information-theoretic threshold below which obtaining a positive correlation with the true cluster means is statistically impossible. Additionally, we investigate the performance of the approximate message passing (AMP) algorithm analyzed via its state evolution, which is conjectured to be optimal among polynomial-time algorithms for this task. We identify in particular the existence of a statistical-to-computational gap between the algorithm, which requires a signal-to-noise ratio $λ_{\text{alg}} \ge k / \sqrt{α}$ to perform better than random, and the information-theoretic threshold at $λ_{\text{it}} \approx \sqrt{-k ρ \log ρ} / \sqrt{α}$. Finally, we discuss the case of sub-extensive sparsity $ρ$ by comparing the performance of AMP with other sparsity-enhancing algorithms, such as sparse-PCA and diagonal thresholding.
Submitted 1 December, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.