-
The maximum-average subtensor problem: equilibrium and out-of-equilibrium properties
Authors:
Vittorio Erba,
Nathan Malo Kupferschmid,
Rodrigo Pérez Ortiz,
Lenka Zdeborová
Abstract:
In this paper we introduce and study the Maximum-Average Subtensor ($p$-MAS) problem, in which one wants to find a subtensor of size $k$ of a given random tensor of size $N$, both of order $p$, with maximum sum of entries. We are motivated by recent work on the matrix case of the problem in which several equilibrium and non-equilibrium properties have been characterized analytically in the asymptotic regime $1 \ll k \ll N$, and a puzzling phenomenon was observed involving the coexistence of a clustered equilibrium phase and an efficient algorithm which produces submatrices in this phase. Here we extend previous results on equilibrium and algorithmic properties for the matrix case to the tensor case. We show that the tensor case has an equilibrium phase diagram similar to that of the matrix case, and an overall similar phenomenology for the considered algorithms. Additionally, we consider out-of-equilibrium landscape properties using Overlap Gap Properties and Franz-Parisi analysis, and discuss the implications, or lack thereof, for average-case algorithmic hardness.
Submitted 18 June, 2025;
originally announced June 2025.
-
On the existence of consistent adversarial attacks in high-dimensional linear classification
Authors:
Matteo Vilucchio,
Lenka Zdeborová,
Bruno Loureiro
Abstract:
What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce a new error metric that precisely captures this distinction, quantifying model vulnerability to consistent adversarial attacks -- perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of this metric in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.
Submitted 14 June, 2025;
originally announced June 2025.
-
Sequential Dynamics in Ising Spin Glasses
Authors:
Yatin Dandi,
David Gamarnik,
Francisco Pernice,
Lenka Zdeborová
Abstract:
We present the first exact asymptotic characterization of sequential dynamics for a broad class of local update algorithms on the Sherrington-Kirkpatrick (SK) model with Ising spins. Focusing on dynamics implemented via systematic scan -- encompassing Glauber updates at any temperature -- we analyze the regime where the number of spin updates scales linearly with system size. Our main result provides a description of the spin-field trajectories as the unique solution to a system of integro-difference equations derived via Dynamical Mean Field Theory (DMFT) applied to a novel block approximation. This framework captures the time evolution of macroscopic observables such as energy and overlap, and is numerically tractable. Our equations serve as a discrete-spin sequential-update analogue of the celebrated Cugliandolo-Kurchan equations for spherical spin glasses, resolving a long-standing gap in the theory of Ising spin glass dynamics. Beyond their intrinsic theoretical interest, our results establish a foundation for analyzing a wide variety of asynchronous dynamics on the hypercube and offer new avenues for studying algorithmic limitations of local heuristics in disordered systems.
Submitted 11 June, 2025;
originally announced June 2025.
-
Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model
Authors:
Hugo Tabanelli,
Pierre Mergny,
Lenka Zdeborova,
Florent Krzakala
Abstract:
We study the recovery of multiple high-dimensional signals from two noisy, correlated modalities: a spiked matrix and a spiked tensor sharing a common low-rank structure. This setting generalizes classical spiked matrix and tensor models, unveiling intricate interactions between inference channels and surprising algorithmic behaviors. Notably, while the spiked tensor model is typically intractable at low signal-to-noise ratios, its correlation with the matrix enables efficient recovery via Bayesian Approximate Message Passing, inducing staircase-like phase transitions reminiscent of neural network phenomena. In contrast, empirical risk minimization for joint learning fails: the tensor component obstructs effective matrix recovery, and joint optimization significantly degrades performance, highlighting the limitations of naive multi-modal learning. We show that a simple Sequential Curriculum Learning strategy -- first recovering the matrix, then leveraging it to guide tensor recovery -- resolves this bottleneck and achieves optimal weak recovery thresholds. This strategy, implementable with spectral methods, emphasizes the critical role of structural correlation and learning order in multi-modal high-dimensional inference.
Submitted 3 June, 2025;
originally announced June 2025.
-
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Authors:
Luca Arnaboldi,
Bruno Loureiro,
Ludovic Stephan,
Florent Krzakala,
Lenka Zdeborova
Abstract:
We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expression for the population loss in terms of a pair of sufficient statistics capturing semantic and positional alignment, and characterize the induced high-dimensional SGD dynamics for these coordinates. Our analysis reveals two distinct training phases: escape from uninformative initialization and alignment with the target subspace, and demonstrates how the sequence length and positional encoding influence convergence speed and learning trajectories. These results provide a rigorous and interpretable foundation for understanding how sequential structure in data can be beneficial for learning with attention-based models.
Submitted 3 June, 2025;
originally announced June 2025.
-
Bayes optimal learning of attention-indexed models
Authors:
Fabrizio Boncoraglio,
Emanuele Troiani,
Vittorio Erba,
Lenka Zdeborová
Abstract:
We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in modern attention architectures.
Submitted 2 June, 2025;
originally announced June 2025.
-
Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions
Authors:
Yizhou Xu,
Florent Krzakala,
Lenka Zdeborová
Abstract:
The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a constant number of hidden units. In this limit, we simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization. This opens a path to analyze training of the RBM using methods that are established for multi-index models, such as Approximate Message Passing (AMP) and its state evolution, and the analysis of Gradient Descent (GD) via the dynamical mean-field theory. We then give rigorous asymptotics of the training dynamics of RBM on data generated by the spiked covariance model as a prototype of a structure suitable for unsupervised learning. We show in particular that RBM reaches the optimal computational weak recovery threshold, aligning with the BBP transition, in the spiked covariance model.
Submitted 23 May, 2025;
originally announced May 2025.
-
The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
Authors:
Vittorio Erba,
Emanuele Troiani,
Lenka Zdeborová,
Florent Krzakala
Abstract:
We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive sharp asymptotics for both training and test errors by mapping the $\ell_2$-regularized learning problem to a convex matrix sensing task with nuclear norm penalization. This reveals that capacity control in such networks emerges from a low-rank structure in the learned feature maps. Our results characterize the global minima of the loss and yield precise generalization thresholds, showing how the width of the target function governs learnability. This analysis bridges and extends ideas from spin-glass methods, matrix factorization, and convex optimization and emphasizes the deep link between low-rank matrix sensing and learning in quadratic neural networks.
Submitted 23 May, 2025;
originally announced May 2025.
-
Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications
Authors:
Yizhou Xu,
Antoine Maillard,
Lenka Zdeborová,
Florent Krzakala
Abstract:
In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We consider this model in the high-dimensional limit: while previous works on this model primarily focused on the recovery of low-rank matrices, we consider in this work more general classes of structured signal matrices with potentially large rank, e.g. a product of two matrices of sizes proportional to the dimension. We provide rigorous asymptotic equations characterizing the Bayes-optimal learning performance from a number of samples which is proportional to the number of entries in the matrix. Our proof is composed of three key ingredients: $(i)$ we prove universality properties to handle structured sensing matrices, related to the "Gaussian equivalence" phenomenon in statistical learning, $(ii)$ we provide a sharp characterization of Bayes-optimal learning in generalized linear models with Gaussian data and structured matrix priors, generalizing previously studied settings, and $(iii)$ we leverage previous works on the problem of matrix denoising. The generality of our results allows for a variety of applications: notably, we mathematically establish predictions obtained via non-rigorous methods from statistical physics in [ETB+24] regarding Bilinear Sequence Regression, a benchmark model for learning from sequences of tokens, and in [MTM+24] on Bayes-optimal learning in neural networks with quadratic activation function and width proportional to the dimension.
Submitted 18 March, 2025;
originally announced March 2025.
-
Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model
Authors:
O. Duranthon,
L. Zdeborová
Abstract:
Graph neural networks (GNNs) are designed to process data associated with graphs. They are finding an increasing range of applications; however, as with other modern machine learning techniques, their theoretical understanding is limited. GNNs can encounter difficulties in gathering information from nodes that are far apart by iterated aggregation steps. This situation is partly caused by so-called oversmoothing, and overcoming it is one of the practically motivated challenges. We consider the situation where information is aggregated by multiple steps of convolution, leading to graph convolutional networks (GCNs). We analyze the generalization performance of a basic GCN, trained for node classification on data generated by the contextual stochastic block model. We predict its asymptotic performance by deriving the free energy of the problem, using the replica method, in the high-dimensional limit. Calling depth the number of convolutional steps, we show the importance of going to large depth to approach Bayes-optimality. We detail how the architecture of the GCN has to scale with the depth to avoid oversmoothing. The resulting large-depth limit can be close to Bayes-optimality and leads to a continuous GCN. Technically, we tackle this continuous limit via an approach that resembles dynamical mean-field theory (DMFT) with constraints at the initial and final times. An expansion around large regularization allows us to solve the corresponding equations for the performance of the deep GCN. This promising tool may contribute to the analysis of further deep neural networks.
Submitted 3 March, 2025;
originally announced March 2025.
-
Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds
Authors:
Emanuele Troiani,
Hugo Cui,
Yatin Dandi,
Florent Krzakala,
Lenka Zdeborová
Abstract:
In this manuscript, we study the learning of deep attention neural networks, defined as the composition of multiple self-attention layers, with tied and low-rank weights. We first establish a mapping of such models to sequence multi-index models, a generalization of the widely studied multi-index model to sequential covariates, for which we establish a number of general results. In the context of Bayesian-optimal learning, in the limit of large dimension $D$ and commensurably large number of samples $N$, we derive a sharp asymptotic characterization of the optimal performance as well as the performance of the best-known polynomial-time algorithm for this setting -- namely approximate message-passing -- and characterize sharp thresholds on the minimal sample complexity required for better-than-random prediction performance. Our analysis uncovers, in particular, how the different layers are learned sequentially. Finally, we discuss how this sequential learning can also be observed in a realistic setup.
Submitted 2 February, 2025;
originally announced February 2025.
-
Dynamical Cavity Method for Hypergraphs and its Application to Quenches in the k-XOR-SAT Problem
Authors:
Aude Maier,
Freya Behrens,
Lenka Zdeborová
Abstract:
The dynamical cavity method and its backtracking version provide a powerful approach to studying the properties of dynamical processes on large random graphs. This paper extends these methods to hypergraphs, enabling the analysis of interactions involving more than two variables. We apply them to analyse the $k$-XOR-satisfiability ($k$-XOR-SAT) problem, an important model in theoretical computer science which is closely related to the diluted $p$-spin model from statistical physics. In particular, we examine whether the quench dynamics -- a deterministic, locally greedy process -- can find solutions with only a few violated constraints on $d$-regular $k$-uniform hypergraphs. Our results demonstrate that the methods accurately characterize the attractors of the dynamics. This enables us to compute the energy reached by typical trajectories of the dynamical process in different parameter regimes. We show that these predictions are accurate, including cases where a classical mean-field approach fails.
Submitted 19 December, 2024;
originally announced December 2024.
-
Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens
Authors:
Vittorio Erba,
Emanuele Troiani,
Luca Biggio,
Antoine Maillard,
Lenka Zdeborová
Abstract:
Current progress in artificial intelligence is centered around so-called large language models that consist of neural networks processing long sequences of high-dimensional vectors called tokens. Statistical physics provides powerful tools to study the functioning of learning with neural networks and has played a recognized role in the development of modern machine learning. The statistical physics approach relies on simplified and analytically tractable models of data. However, simple tractable models for long sequences of high-dimensional tokens are largely underexplored. Inspired by the crucial role models such as the single-layer teacher-student perceptron (aka generalized linear regression) played in the theory of fully connected neural networks, in this paper, we introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens. We note that modern architectures naturally subsume the BSR model due to the skip connections. Building on recent methodological progress, we compute the Bayes-optimal generalization error for the model in the limit of long sequences of high-dimensional tokens, and provide a message-passing algorithm that matches this performance. We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression. We also unveil surprising properties of the gradient descent algorithms in the BSR model.
Submitted 21 May, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Building Conformal Prediction Intervals with Approximate Message Passing
Authors:
Lucas Clarté,
Lenka Zdeborová
Abstract:
Conformal prediction has emerged as a powerful tool for building prediction intervals that are valid in a distribution-free way. However, its evaluation may be computationally costly, especially in the high-dimensional setting where the dimensionality and sample sizes are both large and of comparable magnitudes. To address this challenge in the context of generalized linear regression, we propose a novel algorithm based on Approximate Message Passing (AMP) to accelerate the computation of prediction intervals using full conformal prediction, by approximating the computation of conformity scores. Our work bridges a gap between modern uncertainty quantification techniques and tools for high-dimensional problems involving the AMP algorithm. We evaluate our method on both synthetic and real data, and show that it produces prediction intervals that are close to those of the baseline methods, while being orders of magnitude faster. Additionally, in the high-dimensional limit and under assumptions on the data distribution, the conformity scores computed by AMP converge to those computed exactly, which allows theoretical study and benchmarking of conformal methods in high dimensions.
Submitted 21 October, 2024;
originally announced October 2024.
-
The phase diagram of compressed sensing with $\ell_0$-norm regularization
Authors:
Damien Barbier,
Carlo Lucibello,
Luca Saglietti,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Noiseless compressive sensing is a two-step setting that allows for undersampling a sparse signal and then reconstructing it without loss of information. The LASSO algorithm, based on $\ell_1$ regularization, provides an efficient and robust way to address this problem, but it fails in the regime of very high compression rate. Here we present two algorithms based instead on $\ell_0$-norm regularization that outperform the LASSO in terms of compression rate in the Gaussian design setting for the measurement matrix. These algorithms are based on Approximate Survey Propagation, an algorithmic family within the Approximate Message Passing class. In the large system limit, they can be rigorously tracked through State Evolution equations, and it is possible to exactly predict the range of compression rates for which perfect signal reconstruction is possible. We also provide a statistical physics analysis of the $\ell_0$-norm noiseless compressive sensing model. We show the existence of both a replica symmetric state and a 1-step replica symmetry broken (1RSB) state for sufficiently low $\ell_0$-norm regularization. The recovery limits of our algorithms are linked to the behavior of the 1RSB solution.
Submitted 22 August, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Bayes-optimal learning of an extensive-width neural network from quadratically many samples
Authors:
Antoine Maillard,
Emanuele Troiani,
Simon Martin,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We consider the problem of learning a target function corresponding to a single hidden layer neural network, with a quadratic activation function after the first layer, and random weights. We consider the asymptotic limit where the input dimension and the network width are proportionally large. Recent work [Cui & al '23] established that linear regression provides Bayes-optimal test error to learn such a function when the number of available samples is only linear in the dimension. That work stressed the open challenge of theoretically analyzing the optimal test error in the more interesting regime where the number of samples is quadratic in the dimension. In this paper, we solve this challenge for quadratic activations and derive a closed-form expression for the Bayes-optimal test error. We also provide an algorithm, that we call GAMP-RIE, which combines approximate message passing with rotationally invariant matrix denoising, and that asymptotically achieves the optimal performance. Technically, our result is enabled by establishing a link with recent works on optimal denoising of extensive-rank matrices and on the ellipsoid fitting problem. We further show empirically that, in the absence of noise, randomly-initialized gradient descent seems to sample the space of weights, leading to zero training loss, and averaging over initialization leads to a test error equal to the Bayes-optimal one.
Submitted 7 August, 2024;
originally announced August 2024.
-
Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions
Authors:
Christian Keup,
Lenka Zdeborová
Abstract:
This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The analysis holds for a broad range of priors and noise channels, which can differ across modalities. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods, which are both observed to suffer from a sub-optimal recovery threshold.
Submitted 3 July, 2024;
originally announced July 2024.
-
Counting and Hardness-of-Finding Fixed Points in Cellular Automata on Random Graphs
Authors:
Cédric Koller,
Freya Behrens,
Lenka Zdeborová
Abstract:
We study the fixed points of outer-totalistic cellular automata on sparse random regular graphs. These can be seen as constraint satisfaction problems, where each variable must adhere to the same local constraint, which depends solely on its state and the total number of its neighbors in each possible state. Examples of this setting include classical problems such as independent sets or assortative/disassortative partitions. We analyse the existence and number of fixed points in the large system limit using the cavity method, under both the replica symmetric (RS) and one-step replica symmetry breaking (1RSB) assumptions. This method allows us to characterize the structure of the space of solutions, in particular, whether the solutions are clustered and whether the clusters contain frozen variables. This last property is conjectured to be linked to the typical algorithmic hardness of the problem. We bring experimental evidence for this claim by studying the performance of the belief-propagation reinforcement algorithm, a message-passing-based solver for these constraint satisfaction problems.
Submitted 3 June, 2024;
originally announced June 2024.
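As a rough illustration of what a fixed point of an outer-totalistic rule is, the following minimal sketch counts fixed points by brute force. Everything in it is an assumption made for illustration: the graph is a small 2-regular cycle rather than a sparse random regular graph, the function names are hypothetical, and the rule ("a node is 1 exactly when none of its neighbours is 1") is one simple choice whose fixed points are the maximal independent sets. It does not reproduce the cavity-method analysis of the paper.

```python
from itertools import product

def cycle_graph(n):
    """Adjacency lists of a cycle on n nodes (a 2-regular graph, illustration only)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def independent_set_rule(state, ones_among_neighbors):
    """Outer-totalistic rule: the new state depends only on the node's own state
    and on how many neighbours are in state 1. This particular rule ignores the
    own state and outputs 1 exactly when no neighbour is 1."""
    return 1 if ones_among_neighbors == 0 else 0

def is_fixed_point(graph, config, rule):
    """A configuration is a fixed point if applying the rule changes no node."""
    return all(
        rule(config[i], sum(config[j] for j in graph[i])) == config[i]
        for i in graph
    )

def count_fixed_points(graph, rule):
    """Exhaustive count over all 2^n binary configurations (small n only)."""
    n = len(graph)
    return sum(is_fixed_point(graph, c, rule) for c in product((0, 1), repeat=n))

graph = cycle_graph(6)
n_fixed = count_fixed_points(graph, independent_set_rule)
print(n_fixed)
```

Exhaustive enumeration is only feasible for a handful of nodes, which is why asymptotic tools are needed for large graphs.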
-
Fundamental computational limits of weak learnability in high-dimensional multi-index models
Authors:
Emanuele Troiani,
Yatin Dandi,
Leonardo Defilippis,
Lenka Zdeborová,
Bruno Loureiro,
Florent Krzakala
Abstract:
Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n\!=\!αd$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $α\!>\!0$; (ii) if the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace whose directions can be learned only above a certain sample complexity $α\!>\!α_c$, where $α_{c}$ marks a computational phase transition. In a limited but interesting set of really hard directions -- akin to the parity problem -- $α_c$ is found to diverge. Finally, (iii) we show that interactions between different directions can result in an intricate hierarchical learning phenomenon, where directions can be learned sequentially when coupled to easier ones. We discuss in detail the grand staircase picture associated to these functions (and contrast it with the original staircase one). Our theory builds on the optimality of approximate message-passing among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent, which we discuss in this context.
Submitted 2 April, 2025; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Integer Traffic Assignment Problem: Algorithms and Insights on Random Graphs
Authors:
Rayan Harfouche,
Giovanni Piccioli,
Lenka Zdeborová
Abstract:
Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for commuters in a city represented by a graph, aiming to minimize congestion while adhering to integer flow constraints on paths. This restriction makes ITAP an NP-hard problem. While conventional TAP prioritizes repulsive interactions to minimize congestion, this work also explores the case of attractive interactions, related to minimizing the number of occupied edges. We present and evaluate multiple algorithms to address ITAP, including a message passing algorithm, a greedy approach, simulated annealing, and relaxation of ITAP to TAP. Inspired by studies of random ensembles in the large-size limit in statistical physics, comparisons between these algorithms are conducted on large sparse random regular graphs with a random set of origin-destination pairs. Our results indicate that while the simplest greedy algorithm performs competitively in the repulsive scenario, in the attractive case the message-passing-based algorithm and simulated annealing demonstrate superiority. We then investigate the relationship between TAP and ITAP in the repulsive case. We find that, as the number of paths increases, the solution of TAP converges toward that of ITAP, and we investigate the speed of this convergence. Depending on the number of paths, our analysis leads us to identify two scaling regimes: in one the average flow per edge is of order one, and in another the number of paths scales quadratically with the size of the graph, in which case the continuous relaxation solves the integer problem closely.
Submitted 17 May, 2024;
originally announced May 2024.
-
Quenches in the Sherrington-Kirkpatrick model
Authors:
Vittorio Erba,
Freya Behrens,
Florent Krzakala,
Lenka Zdeborová
Abstract:
The Sherrington-Kirkpatrick (SK) model is a prototype of a complex non-convex energy landscape. Dynamical processes evolving on such landscapes and locally aiming to reach minima are generally poorly understood. Here, we study quenches, i.e. dynamics that locally aim to decrease energy. We analyse the energy at convergence for two distinct algorithmic classes, single-spin flip and synchronous dynamics, focusing on greedy and reluctant strategies. We provide precise numerical analysis of the finite size effects and conclude that, perhaps counter-intuitively, the reluctant algorithm is compatible with converging to the ground state energy density, while the greedy strategy is not. Inspired by the single-spin reluctant and greedy algorithms, we investigate two synchronous time algorithms, the sync-greedy and sync-reluctant algorithms. These synchronous processes can be analysed using dynamical mean field theory (DMFT), and a new backtracking version of DMFT. Notably, this is the first time the backtracking DMFT is applied to study dynamical convergence properties in fully connected disordered models. The analysis suggests that the sync-greedy algorithm can also achieve energies compatible with the ground state, and that it undergoes a dynamical phase transition.
Submitted 17 July, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
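To make the notion of a single-spin-flip quench concrete, here is a minimal sketch under stated assumptions. The abstract does not define the two strategies, so the sketch assumes the usual reading: "greedy" flips the spin whose flip lowers the energy the most, and "reluctant" flips the spin whose flip lowers the energy the least. The coupling normalisation (Gaussian entries of variance 1/N), the small system size, and all function names are illustrative choices, not the paper's numerical setup.

```python
import random

def sk_couplings(n, rng):
    """Symmetric Gaussian couplings with variance 1/n and zero diagonal."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0) / n ** 0.5
    return J

def local_fields(J, s):
    """h_i = sum_j J_ij s_j for every spin i."""
    n = len(s)
    return [sum(J[i][j] * s[j] for j in range(n)) for i in range(n)]

def energy(J, s):
    """H(s) = -sum_{i<j} J_ij s_i s_j, written as -(1/2) sum_i s_i h_i."""
    h = local_fields(J, s)
    return -0.5 * sum(s[i] * h[i] for i in range(len(s)))

def quench(J, s, strategy):
    """Flip one spin at a time until no single flip lowers the energy.
    Flipping spin i changes the energy by 2 * s_i * h_i."""
    s = list(s)
    while True:
        h = local_fields(J, s)
        unstable = [i for i in range(len(s)) if s[i] * h[i] < 0]
        if not unstable:
            return s
        if strategy == "greedy":
            # most negative s_i h_i: largest energy decrease
            i = min(unstable, key=lambda a: s[a] * h[a])
        elif strategy == "reluctant":
            # s_i h_i closest to zero: smallest energy decrease
            i = max(unstable, key=lambda a: s[a] * h[a])
        else:
            raise ValueError("strategy must be 'greedy' or 'reluctant'")
        s[i] = -s[i]

rng = random.Random(0)
n = 40
J = sk_couplings(n, rng)
s0 = [rng.choice((-1, 1)) for _ in range(n)]
s_greedy = quench(J, s0, "greedy")
s_reluctant = quench(J, s0, "reluctant")
print(energy(J, s_greedy) / n, energy(J, s_reluctant) / n)
```

Both runs stop at a configuration that is stable under single spin flips; at this size the printed energy densities say nothing about the large-size behaviour discussed in the abstract.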
-
Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression
Authors:
Lucas Clarté,
Adrien Vandenbroucque,
Guillaume Dalle,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, taking the limit where the number of samples $n$ and dimension $d$ of the covariates grow at a comparable fixed rate $α\!=\! n/d$. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when $α$ is large enough do they provide consistent and reliable error estimations (we give convergence rates); iii) in the over-parametrized regime $α\!<\!1$ relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization.
Submitted 1 November, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Asymptotics of feature learning in two-layer networks after one gradient-step
Authors:
Hugo Cui,
Luca Pesce,
Yatin Dandi,
Florent Krzakala,
Yue M. Lu,
Lenka Zdeborová,
Bruno Loureiro
Abstract:
In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error of the sRF in the high-dimensional limit where the number of samples, the width, and the input dimension grow at a proportional rate. The resulting characterization for sRFs also captures closely the learning curves of the original network model. This enables us to understand how adapting to the data is crucial for the network to efficiently learn non-linear functions in the direction of the gradient -- where at initialization it can only express linear functions in this regime.
Submitted 4 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Asymptotic generalization error of a single-layer graph convolutional network
Authors:
O. Duranthon,
L. Zdeborová
Abstract:
While graph convolutional networks show great practical promise, the theoretical understanding of their generalization properties as a function of the number of samples is still in its infancy compared to the more broadly studied case of supervised fully connected neural networks. In this article, we predict the performance of a single-layer graph convolutional network (GCN) trained on data produced by attributed stochastic block models (SBMs) in the high-dimensional limit. Previously, only ridge regression on the contextual SBM (CSBM) had been considered, in Shi et al. 2022; we generalize the analysis to arbitrary convex loss and regularization for the CSBM and add the analysis for another data model, the neural-prior SBM. We also study the high signal-to-noise ratio limit, detail the convergence rates of the GCN and show that, while consistent, it does not reach the Bayes-optimal rate for any of the considered cases.
Submitted 21 November, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Dynamical Phase Transitions in Graph Cellular Automata
Authors:
Freya Behrens,
Barbora Hudcová,
Lenka Zdeborová
Abstract:
Discrete dynamical systems can exhibit complex behaviour from the iterative application of straightforward local rules. A famous example is the class of cellular automata, whose global dynamics are notoriously challenging to analyze. To address this, we relax the regular connectivity grid of cellular automata to a random graph, which gives the class of graph cellular automata. Using the dynamical cavity method (DCM) and its backtracking version (BDCM), we show that this relaxation allows us to derive asymptotically exact analytical results on the global dynamics of these systems on sparse random graphs. Concretely, we showcase the results on a specific subclass of graph cellular automata with ``conforming non-conformist'' update rules, which exhibit dynamics akin to opinion formation. Such rules update a node's state according to the majority within its own neighbourhood. In cases where the majority leads only by a small margin over the minority, nodes may exhibit non-conformist behaviour. Instead of following the majority, they either maintain their own state, switch it, or follow the minority. For configurations with different initial biases towards one state we identify sharp dynamical phase transitions in terms of the convergence speed and attractor types. From the perspective of opinion dynamics this answers when consensus will emerge and when two opinions coexist almost indefinitely.
Submitted 24 October, 2023;
originally announced October 2023.
-
On the Atypical Solutions of the Symmetric Binary Perceptron
Authors:
Damien Barbier,
Ahmed El Alaoui,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We study the random binary symmetric perceptron problem, focusing on the behavior of rare high-margin solutions. While most solutions are isolated, we demonstrate that these rare solutions are part of clusters of extensive entropy, heuristically corresponding to non-trivial fixed points of an approximate message-passing algorithm. We enumerate these clusters via a local entropy, defined as a Franz-Parisi potential, which we rigorously evaluate using the first and second moment methods in the limit of a small constraint density $α$ (corresponding to vanishing margin $κ$) under a certain assumption on the concentration of the entropy. This examination unveils several intriguing phenomena: i) We demonstrate that these clusters have an entropic barrier in the sense that the entropy as a function of the distance from the reference high-margin solution is non-monotone when $κ\le 1.429 \sqrt{-α/\logα}$, while it is monotone otherwise, and that they have an energetic barrier in the sense that there are no solutions at an intermediate distance from the reference solution when $κ\le 1.239 \sqrt{-α/ \logα}$. The critical scaling of the margin $κ$ in $\sqrt{-α/\logα}$ corresponds to the one obtained from the earlier work of Gamarnik et al. (2022) for the overlap-gap property, a phenomenon known to present a barrier to certain efficient algorithms. ii) We establish using the replica method that the complexity (the logarithm of the number of clusters of such solutions) versus entropy (the logarithm of the number of solutions in the clusters) curves are partly non-concave and correspond to very large values of the Parisi parameter, with the equilibrium being reached when the Parisi parameter diverges.
Submitted 28 June, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective
Authors:
Davide Ghio,
Yatin Dandi,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Recent years have witnessed the development of powerful generative models based on flows, diffusion or autoregressive neural networks, achieving remarkable success in generating data from examples with applications in a broad range of areas. A theoretical analysis of the performance and understanding of the limitations of these methods remain, however, challenging. In this paper, we undertake a step in this direction by analysing the efficiency of sampling by these methods on a class of problems with a known probability distribution and comparing it with the sampling performance of more traditional methods such as Monte Carlo Markov chains and Langevin dynamics. We focus on a class of probability distributions widely studied in the statistical physics of disordered systems that relate to spin glasses, statistical inference and constraint satisfaction problems.
We leverage the fact that sampling via flow-based, diffusion-based or autoregressive network methods can be equivalently mapped to the analysis of a Bayes optimal denoising of a modified probability measure. Our findings demonstrate that these methods encounter difficulties in sampling stemming from the presence of a first-order phase transition along the algorithm's denoising path. Our conclusions go both ways: we identify regions of parameters where these methods are unable to sample efficiently, while that is possible using standard Monte Carlo or Langevin approaches. We also identify regions where the opposite happens: standard approaches are inefficient while the discussed generative methods work well.
Submitted 27 August, 2023;
originally announced August 2023.
-
High-dimensional Asymptotics of Denoising Autoencoders
Authors:
Hugo Cui,
Lenka Zdeborová
Abstract:
We address the problem of denoising data from a Gaussian mixture using a two-layer non-linear autoencoder with tied weights and a skip connection. We consider the high-dimensional limit where the number of training samples and the input dimension jointly tend to infinity while the number of hidden units remains bounded. We provide closed-form expressions for the denoising mean-squared test error. Building on this result, we quantitatively characterize the advantage of the considered architecture over the autoencoder without the skip connection that relates closely to principal component analysis. We further show that our results accurately capture the learning curves on a range of real data sets.
Submitted 18 May, 2023;
originally announced May 2023.
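The architecture named in the abstract (two layers, tied weights, a skip connection) can be sketched as a forward pass. This is only a hedged illustration: the parametrisation below, output = b * x + W^T sigma(W x) with a scalar skip strength b and tanh as the non-linearity, is an assumption, as are the function and argument names; the paper's exact scaling conventions are not given in the abstract.

```python
import math

def dae_forward(x, W, b, activation=math.tanh):
    """Forward pass of a two-layer autoencoder with tied weights and a skip connection.

    x : input vector of length d
    W : list of p weight rows, each of length d (shared by encoder and decoder)
    b : scalar strength of the skip connection (assumed trainable)
    """
    d = len(x)
    # Encoder: p hidden units, hidden_k = activation(<w_k, x>)
    hidden = [activation(sum(w_k[i] * x[i] for i in range(d))) for w_k in W]
    # Decoder reuses the same weights (transposed) and adds the skip term b * x
    return [
        b * x[i] + sum(W[k][i] * hidden[k] for k in range(len(W)))
        for i in range(d)
    ]

x = [0.5, 2.0]
out_skip_only = dae_forward(x, W=[[0.0, 0.0]], b=0.1)
out_one_unit = dae_forward(x, W=[[1.0, 0.0]], b=0.1)
print(out_skip_only, out_one_unit)
```

Setting b to zero recovers the autoencoder without the skip connection, the baseline the abstract compares against.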
-
Compressed sensing with l0-norm: statistical physics analysis and algorithms for signal recovery
Authors:
D. Barbier,
C. Lucibello,
L. Saglietti,
F. Krzakala,
L. Zdeborova
Abstract:
Noiseless compressive sensing is a protocol that enables undersampling and later recovery of a signal without loss of information. This compression is possible because the signal is usually sufficiently sparse in a given basis. Currently, the algorithm offering the best tradeoff between compression rate, robustness, and speed for compressive sensing is the LASSO (l1-norm bias) algorithm. However, many studies have pointed out the possibility that the implementation of lp-norm biases, with p smaller than one, could give better performance while sacrificing convexity. In this work, we focus specifically on the extreme case of the l0-based reconstruction, a task that is complicated by the discontinuity of the loss. In the first part of the paper, we describe via statistical physics methods, and in particular the replica method, how the solutions to this optimization problem are arranged in a clustered structure. We observe two distinct regimes: one at low compression rate where the signal can be recovered exactly, and one at high compression rate where the signal cannot be recovered accurately. In the second part, we present two message-passing algorithms based on our first results for the l0-norm optimization problem. The proposed algorithms are able to recover the signal at compression rates higher than the ones achieved by LASSO while being computationally efficient.
Submitted 24 April, 2023;
originally announced April 2023.
-
Bayes-optimal inference for spreading processes on random networks
Authors:
D. Ghio,
A. L. M. Aragon,
I. Biazzo,
L. Zdeborova
Abstract:
We consider a class of spreading processes on networks, which generalize commonly used epidemic models such as the SIR model or the SIS model with a bounded number of re-infections. We analyse the related problem of inference of the dynamics based on its partial observations. We analyse these inference problems on random networks via a message-passing inference algorithm derived from the Belief Propagation (BP) equations. We investigate whether said algorithm solves the problems in a Bayes-optimal way, i.e. no other algorithm can reach a better performance. For this, we leverage the so-called Nishimori conditions that must be satisfied by a Bayes-optimal algorithm. We also probe for phase transitions by considering the convergence time and by initializing the algorithm in both a random and an informed way and comparing the resulting fixed points. We present the corresponding phase diagrams. We find large regions of parameters where even for moderate system sizes the BP algorithm converges and satisfies closely the Nishimori conditions, and the problem is thus conjectured to be solved optimally in those regions. In other limited areas of the space of parameters, the Nishimori conditions are no longer satisfied and the BP algorithm struggles to converge. No sign of a phase transition is detected, however, and we attribute this failure of optimality to finite-size effects. The article is accompanied by a Python implementation of the algorithm that is easy to use or adapt.
Submitted 30 March, 2023;
originally announced March 2023.
-
Backtracking Dynamical Cavity Method
Authors:
Freya Behrens,
Barbora Hudcová,
Lenka Zdeborová
Abstract:
The cavity method is one of the cornerstones of the statistical physics of disordered systems such as spin glasses and other complex systems. It is able to analytically and asymptotically exactly describe the equilibrium properties of a broad range of models. Exact solutions for dynamical, out-of-equilibrium properties of disordered systems are traditionally much harder to obtain. Even very basic questions such as the limiting energy of a fast quench are so far open. The dynamical cavity method partly fills this gap by considering short trajectories and leveraging the static cavity method. However, being limited to a couple of steps forward from the initialization, it typically does not capture dynamical properties related to attractors of the dynamics. We introduce the backtracking dynamical cavity method, which, instead of analysing the trajectory forward from initialization, analyses trajectories that are found by tracking them backward from attractors. We illustrate that this rather elementary twist on the dynamical cavity method leads to new insight into some of the very basic questions about the dynamics of complex disordered systems. This method is as versatile as the cavity method itself and we hence anticipate that our paper will open many avenues for future research on dynamical, out-of-equilibrium properties in complex systems.
Submitted 8 September, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Neural-prior stochastic block model
Authors:
O. Duranthon,
L. Zdeborová
Abstract:
The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works in signal processing using deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the opposite. We define the corresponding model; we call it the neural-prior SBM. We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing. We analyze the performance of the algorithm as well as the Bayes-optimal performance. We identify detectability and exact recovery phase transitions, as well as an algorithmically hard region. The proposed model and algorithm can be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performances to the performance of simple graph neural networks.
Submitted 6 September, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Statistical mechanics of the maximum-average submatrix problem
Authors:
Vittorio Erba,
Florent Krzakala,
Rodrigo Pérez,
Lenka Zdeborová
Abstract:
We study the maximum-average submatrix problem, in which given an $N \times N$ matrix $J$ one needs to find the $k \times k$ submatrix with the largest average of entries. We study the problem for random matrices $J$ whose entries are i.i.d. random variables by mapping it to a variant of the Sherrington-Kirkpatrick spin-glass model at fixed magnetization. We characterize analytically the phase diagram of the model as a function of the submatrix average and the size of the submatrix $k$ in the limit $N\to\infty$. We consider submatrices of size $k = m N$ with $0 < m < 1$. We find a rich phase diagram, including dynamical, static one-step replica symmetry breaking and full-step replica symmetry breaking. In the limit of $m \to 0$, we find a simpler phase diagram featuring a frozen 1-RSB phase, where the Gibbs measure is composed of exponentially many pure states each with zero entropy. We discover an interesting phenomenon, reminiscent of the phenomenology of the binary perceptron: there exist efficient algorithms that provably work in the frozen 1-RSB phase.
Submitted 21 September, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
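For readers who want the problem statement in executable form, the sketch below solves a tiny instance by exhaustive search. It is an illustration under assumptions, not one of the efficient algorithms mentioned in the abstract: it assumes the row set and the column set of the submatrix are chosen independently (the abstract does not say whether they must coincide), uses i.i.d. standard Gaussian entries as one example of a random matrix, and all function names are hypothetical.

```python
import random
from itertools import combinations

def submatrix_average(J, rows, cols):
    """Average of the entries J[i][j] with i in rows and j in cols."""
    return sum(J[i][j] for i in rows for j in cols) / (len(rows) * len(cols))

def max_average_submatrix(J, k):
    """Exhaustive search over all k x k submatrices (row and column sets are
    chosen independently, an assumption). Cost grows like C(N, k)^2."""
    n = len(J)
    best = None
    for rows in combinations(range(n), k):
        for cols in combinations(range(n), k):
            avg = submatrix_average(J, rows, cols)
            if best is None or avg > best[0]:
                best = (avg, rows, cols)
    return best

rng = random.Random(0)
N, k = 6, 2
J = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
best_avg, best_rows, best_cols = max_average_submatrix(J, k)
print(best_avg, best_rows, best_cols)
```

The combinatorial cost of this search is what makes the existence of efficient algorithms in the large-N regime a non-trivial question.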
-
Bayes-optimal Learning of Deep Random Networks of Extensive-width
Authors:
Hugo Cui,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We further compute closed-form expressions for the test errors of ridge regression, kernel and random features regression. We find, in particular, that optimally regularized ridge regression, as well as kernel regression, achieve Bayes-optimal performances, while the logistic loss yields a near-optimal test error for classification. We further show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples.
Submitted 21 June, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
Disordered Systems Insights on Computational Hardness
Authors:
David Gamarnik,
Cristopher Moore,
Lenka Zdeborová
Abstract:
In this review article, we discuss connections between the physics of disordered systems, phase transitions in inference problems, and computational hardness. We introduce two models representing the behavior of glassy systems, the spiked tensor model and the generalized linear model. We discuss the random (non-planted) versions of these problems as prototypical optimization problems, as well as the planted versions (with a hidden solution) as prototypical problems in statistical inference and learning. Based on ideas from physics, many of these problems have transitions where they are believed to jump from easy (solvable in polynomial time) to hard (requiring exponential time). We discuss several emerging ideas in theoretical computer science and statistics that provide rigorous evidence for hardness by proving that large classes of algorithms fail in the conjectured hard regime. This includes the overlap gap property, a particular mathematization of clustering or dynamical symmetry-breaking, which can be used to show that many algorithms that are local or robust to changes in their input fail. We also discuss the sum-of-squares hierarchy, which places bounds on proofs or algorithms that use low-degree polynomials such as standard spectral methods and semidefinite relaxations, including the Sherrington-Kirkpatrick model. Throughout the manuscript, we present connections to the physics of disordered systems and associated replica symmetry breaking properties.
Submitted 18 October, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Planted matching problems on random hypergraphs
Authors:
Urte Adomaityte,
Anshul Toshniwal,
Gabriele Sicuro,
Lenka Zdeborová
Abstract:
We consider the problem of inferring a matching hidden in a weighted random $k$-hypergraph. We assume that the hyperedges' weights are random and distributed according to two different densities, conditioned on whether or not they belong to the hidden matching. We show that, for $k>2$ and in the large graph size limit, an algorithmic first order transition in the signal strength separates a regime in which a complete recovery of the hidden matching is feasible from a regime in which partial recovery is possible. This is in contrast to the $k=2$ case where the transition is known to be continuous. Finally, we consider the case of graphs presenting a mixture of edges and $3$-hyperedges, interpolating between the $k=2$ and the $k=3$ cases, and we study how the transition changes from continuous to first order by tuning the relative amount of edges and hyperedges.
Submitted 7 September, 2022;
originally announced September 2022.
-
The planted XY model: thermodynamics and inference
Authors:
Siyu Chen,
Guanhao Huang,
Giovanni Piccioli,
Lenka Zdeborová
Abstract:
In this paper we study a fully connected planted spin glass named the planted XY model. Motivation for studying this system comes both from the spin glass field and the one of statistical inference where it models the angular synchronization problem. We derive the replica symmetric (RS) phase diagram in the temperature, ferromagnetic bias plane using the approximate message passing (AMP) algorithm and its state evolution (SE). While the RS predictions are exact on the Nishimori line (i.e. when the temperature is matched to the ferromagnetic bias), they become inaccurate when the parameters are mismatched, giving rise to a spin glass phase where AMP is not able to converge. To overcome the defects of the RS approximation we carry out a one-step replica symmetry breaking (1RSB) analysis based on the approximate survey propagation (ASP) algorithm. Exploiting the state evolution of ASP, we count the number of metastable states in the measure, derive the 1RSB free entropy and find the behavior of the Parisi parameter throughout the spin glass phase.
Submitted 11 January, 2024; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Low-rank Matrix Estimation with Inhomogeneous Noise
Authors:
Alice Guionnet,
Justin Ko,
Florent Krzakala,
Lenka Zdeborová
Abstract:
We study low-rank matrix estimation for a generic inhomogeneous output channel through which the matrix is observed. This generalizes the commonly considered spiked matrix model with homogeneous noise to include for instance the dense degree-corrected stochastic block model. We adapt techniques used to study multispecies spin glasses to derive and rigorously prove an expression for the free energy of the problem in the large size limit, providing a framework to study the signal detection thresholds. We discuss an application of this framework to the degree corrected stochastic block models.
Submitted 11 August, 2022;
originally announced August 2022.
-
Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap
Authors:
Luca Pesce,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $ρ$, as well as the ratio $α$ between the number of samples and the dimension are fixed, while the dimension diverges. We identify the information-theoretic threshold below which obtaining a positive correlation with the true cluster means is statistically impossible. Additionally, we investigate the performance of the approximate message passing (AMP) algorithm analyzed via its state evolution, which is conjectured to be optimal among polynomial algorithms for this task. We identify in particular the existence of a statistical-to-computational gap between the algorithm, which requires a signal-to-noise ratio $λ_{\text{alg}} \ge k / \sqrt{α}$ to perform better than random, and the information-theoretic threshold at $λ_{\text{it}} \approx \sqrt{-k ρ \log ρ} / \sqrt{α}$. Finally, we discuss the case of sub-extensive sparsity $ρ$ by comparing the performance of the AMP with other sparsity-enhancing algorithms, such as sparse-PCA and diagonal thresholding.
Submitted 1 December, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Gaussian Universality of Perceptrons with Random Labels
Authors:
Federica Gerace,
Florent Krzakala,
Bruno Loureiro,
Ludovic Stephan,
Lenka Zdeborová
Abstract:
While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance. In the limit of vanishing regularization, we further demonstrate that the training loss is independent of the data covariance. On the theoretical side, we prove this universality for an arbitrary mixture of homogeneous Gaussian clouds. Empirically, we show that the universality holds also for a broad range of real datasets.
Submitted 2 March, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Learning curves for the multi-class teacher-student perceptron
Authors:
Elisabetta Cornacchia,
Francesca Mignacco,
Rodrigo Veiga,
Cédric Gerbelot,
Bruno Loureiro,
Lenka Zdeborová
Abstract:
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we show that a first-order phase transition arises in the Bayes-optimal performance.
Submitted 22 March, 2022;
originally announced March 2022.
-
Optimal denoising of rotationally invariant rectangular matrices
Authors:
Emanuele Troiani,
Vittorio Erba,
Florent Krzakala,
Antoine Maillard,
Lenka Zdeborová
Abstract:
In this manuscript we consider denoising of large rectangular matrices: given a noisy observation of a signal matrix, what is the best way of recovering the signal matrix itself? For Gaussian noise and rotationally-invariant signal priors, we completely characterize the optimal denoiser and its performance in the high-dimensional limit, in which the size of the signal matrix goes to infinity with fixed aspect ratio, and under the Bayes optimal setting, that is when the statistician knows how the signal and the observations were generated. Our results generalise previous works that considered only symmetric matrices to the more general case of non-symmetric and rectangular ones. We explore analytically and numerically a particular choice of factorized signal prior that models cross-covariance matrices and the matrix factorization problem. As a byproduct of our analysis, we provide an explicit asymptotic evaluation of the rectangular Harish-Chandra-Itzykson-Zuber integral in a special case.
Submitted 15 March, 2022;
originally announced March 2022.
-
(Dis)assortative Partitions on Random Regular Graphs
Authors:
Freya Behrens,
Gabriel Arpino,
Yaroslav Kivva,
Lenka Zdeborová
Abstract:
We study the problem of assortative and disassortative partitions on random $d$-regular graphs. Nodes in the graph are partitioned into two non-empty groups. In the assortative partition every node requires at least $H$ of their neighbors to be in their own group. In the disassortative partition they require fewer than $H$ neighbors to be in their own group. Using the cavity method based on analysis of the Belief Propagation algorithm we establish for which combinations of parameters $(d,H)$ these partitions exist with high probability and for which they do not. For $H>\lceil \frac{d}{2} \rceil $ we establish that the structure of solutions to the assortative partition problems corresponds to the so-called frozen-1RSB. This entails a conjecture of algorithmic hardness of finding these partitions efficiently. For $H \le \lceil \frac{d}{2} \rceil $ we argue that the assortative partition problem is algorithmically easy on average for all $d$. Further we provide arguments about asymptotic equivalence between the assortative partition problem and the disassortative one, going through a close relation to the problem of single-spin-flip-stable states in spin glasses. In the context of spin glasses, our results on algorithmic hardness imply a conjecture that gapped single-spin-flip-stable states are hard to find, which may be a universal reason behind the observation that physical dynamics in glassy systems display convergence to marginal stability.
Submitted 2 May, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Theoretical characterization of uncertainty in high-dimensional linear classification
Authors:
Lucas Clarté,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Being able to reliably assess not only the \emph{accuracy} but also the \emph{uncertainty} of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampling is computationally challenging in high-dimensional problems and theoretical results on heuristic uncertainty estimators in high dimensions are thus scarce. In this manuscript, we characterise uncertainty for learning from a limited number of samples of high-dimensional Gaussian input data and labels generated by the probit model. In this setting, the Bayesian uncertainty (i.e. the posterior marginals) can be asymptotically obtained by the approximate message passing algorithm, bypassing the canonical but costly Monte Carlo sampling of the posterior. We then provide a closed-form formula for the joint statistics between the logistic classifier, the uncertainty of the statistically optimal Bayesian classifier and the ground-truth probit uncertainty. The formula allows us to investigate calibration of the logistic classifier learning from a limited number of samples. We discuss how over-confidence can be mitigated by appropriately regularising.
Submitted 14 November, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
Authors:
Rodrigo Veiga,
Ludovic Stephan,
Bruno Loureiro,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
Submitted 14 June, 2023; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Aligning random graphs with a sub-tree similarity message-passing algorithm
Authors:
Giovanni Piccioli,
Guilhem Semerjian,
Gabriele Sicuro,
Lenka Zdeborová
Abstract:
The problem of aligning Erdős-Rényi random graphs is a noisy, average-case version of the graph isomorphism problem, in which a pair of correlated random graphs is observed through a random permutation of their vertices. We study a polynomial time message-passing algorithm devised to solve the inference problem of partially recovering the hidden permutation, in the sparse regime with constant average degrees. We perform extensive numerical simulations to determine the range of parameters in which this algorithm achieves partial recovery. We also introduce a generalized ensemble of correlated random graphs with prescribed degree distributions, and extend the algorithm to this case.
Submitted 4 May, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising
Authors:
Antoine Maillard,
Florent Krzakala,
Marc Mézard,
Lenka Zdeborová
Abstract:
Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise function of their matrix product. In the limit where the dimensions of the matrices tend to infinity, but their ratios remain fixed, we expect to be able to derive closed form expressions for the optimal mean squared error on the estimation of the two factors. However, this remains a very involved mathematical and algorithmic problem. A related, but simpler, problem is extensive-rank matrix denoising, where one aims to reconstruct a matrix with extensive but usually small rank from noisy measurements. In this paper, we approach both these problems using high-temperature expansions at fixed order parameters. This allows us to clarify how previous attempts at solving these problems failed at finding an asymptotically exact solution. We provide a systematic way to derive the corrections to these existing approximations, taking into account the structure of correlations particular to the problem. Finally, we illustrate our approach in detail on the case of extensive-rank matrix denoising. We compare our results with known optimal rotationally-invariant estimators, and show how exact asymptotic calculations of the minimal error can be performed using extensive-rank matrix integrals.
Submitted 8 June, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
Large Deviations of Semi-supervised Learning in the Stochastic Block Model
Authors:
Hugo Cui,
Luca Saglietti,
Lenka Zdeborová
Abstract:
In community detection on graphs, the semi-supervised learning problem entails inferring the ground-truth membership of each node in a graph, given the connectivity structure and a limited number of revealed node labels. Different subsets of revealed labels can in principle lead to higher or lower information gains and induce different reconstruction accuracies. In the framework of the dense stochastic block model, we employ statistical physics methods to derive a large deviation analysis for this problem, in the high-dimensional limit. This analysis allows the characterization of the fluctuations around the typical behaviour, capturing the effect of correlated label choices and yielding an estimate of their informativeness and their rareness among subsets of the same size. We find theoretical evidence of a non-monotonic relationship between reconstruction accuracy and the free energy associated to the posterior measure of the inference problem. We further discuss possible implications for active learning applications in community detection.
Submitted 2 August, 2021;
originally announced August 2021.
-
Probing transfer learning with a model of synthetic correlated datasets
Authors:
Federica Gerace,
Luca Saglietti,
Stefano Sarao Mannelli,
Andrew Saxe,
Lenka Zdeborová
Abstract:
Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets. This setup allows for an analytic characterization of the generalization performance obtained when transferring the learned feature map from the source to the target task. Focusing on the problem of training two-layer networks in a binary classification setting, we show that our model can capture a range of salient features of transfer learning with real data. Moreover, by exploiting parametric control over the correlation between the two data-sets, we systematically investigate under which conditions the transfer of features is beneficial for generalization.
Submitted 2 February, 2022; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions
Authors:
Bruno Loureiro,
Gabriele Sicuro,
Cédric Gerbelot,
Alessandro Pacco,
Florent Krzakala,
Lenka Zdeborová
Abstract:
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K>2$. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.
Submitted 14 December, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.