
Showing 1–50 of 161 results for author: Krzakala, F

Searching in archive cond-mat.
  1. arXiv:2506.02664  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model

    Authors: Hugo Tabanelli, Pierre Mergny, Lenka Zdeborova, Florent Krzakala

    Abstract: We study the recovery of multiple high-dimensional signals from two noisy, correlated modalities: a spiked matrix and a spiked tensor sharing a common low-rank structure. This setting generalizes classical spiked matrix and tensor models, unveiling intricate interactions between inference channels and surprising algorithmic behaviors. Notably, while the spiked tensor model is typically intractable…

    Submitted 3 June, 2025; originally announced June 2025.

  2. arXiv:2506.02651  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks

    Authors: Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborova

    Abstract: We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expressi…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2505.18046  [pdf, ps, other]

    cs.LG cond-mat.dis-nn stat.ML

    Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions

    Authors: Yizhou Xu, Florent Krzakala, Lenka Zdeborová

    Abstract: The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a con…

    Submitted 23 May, 2025; originally announced May 2025.

  4. arXiv:2505.17958  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG

    The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks

    Authors: Vittorio Erba, Emanuele Troiani, Lenka Zdeborová, Florent Krzakala

    Abstract: We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive sharp asymptotics for both training and test errors by mapping the $\ell_2$-regularized learning problem to a convex matrix sensing task with nuclear norm penalization. This reveals that capacity control in suc…

    Submitted 23 May, 2025; originally announced May 2025.

  5. arXiv:2503.14121  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG math.PR

    Fundamental Limits of Matrix Sensing: Exact Asymptotics, Universality, and Applications

    Authors: Yizhou Xu, Antoine Maillard, Lenka Zdeborová, Florent Krzakala

    Abstract: In the matrix sensing problem, one wishes to reconstruct a matrix from (possibly noisy) observations of its linear projections along given directions. We consider this model in the high-dimensional limit: while previous works on this model primarily focused on the recovery of low-rank matrices, we consider in this work more general classes of structured signal matrices with potentially large rank,…

    Submitted 18 March, 2025; originally announced March 2025.

  6. arXiv:2502.02545  [pdf, ps, other]

    cs.LG cond-mat.dis-nn

    Optimal Spectral Transitions in High-Dimensional Multi-Index Models

    Authors: Leonardo Defilippis, Yatin Dandi, Pierre Mergny, Florent Krzakala, Bruno Loureiro

    Abstract: We consider the problem of how many samples from a Gaussian multi-index model are required to weakly reconstruct the relevant index subspace. Despite its increasing popularity as a testbed for investigating the computational complexity of neural networks, results beyond the single-index setting remain elusive. In this work, we introduce spectral algorithms based on the linearization of a message p…

    Submitted 10 June, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  7. arXiv:2502.00901  [pdf, other]

    cs.LG cond-mat.dis-nn

    Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds

    Authors: Emanuele Troiani, Hugo Cui, Yatin Dandi, Florent Krzakala, Lenka Zdeborová

    Abstract: In this manuscript, we study the learning of deep attention neural networks, defined as the composition of multiple self-attention layers, with tied and low-rank weights. We first establish a mapping of such models to sequence multi-index models, a generalization of the widely studied multi-index model to sequential covariates, for which we establish a number of general results. In the context of…

    Submitted 2 February, 2025; originally announced February 2025.

  8. arXiv:2409.12965  [pdf, other]

    cs.ET cond-mat.dis-nn cs.LG physics.app-ph physics.optics

    Streamlined optical training of large-scale modern deep learning architectures with direct feedback alignment

    Authors: Ziao Wang, Kilian Müller, Matthew Filipovich, Julien Launay, Ruben Ohana, Gustave Pariente, Safa Mokaadi, Charles Brossollet, Fabien Moreau, Alessandro Cappelli, Iacopo Poli, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan

    Abstract: Modern deep learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a…

    Submitted 2 April, 2025; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: 20 pages, 4 figures; additional experiments conducted

  9. arXiv:2408.08319  [pdf, other]

    cs.IT cond-mat.dis-nn

    The phase diagram of compressed sensing with $\ell_0$-norm regularization

    Authors: Damien Barbier, Carlo Lucibello, Luca Saglietti, Florent Krzakala, Lenka Zdeborová

    Abstract: Noiseless compressive sensing is a two-step setting that allows for undersampling a sparse signal and then reconstructing it without loss of information. The LASSO algorithm, based on $\ell_1$ regularization, provides an efficient and robust way to address this problem, but it fails in the regime of very high compression rate. Here we present two algorithms based on $\ell_0$-norm regularization instea…

    Submitted 22 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2304.12127

  10. arXiv:2408.03733  [pdf, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG math.PR

    Bayes-optimal learning of an extensive-width neural network from quadratically many samples

    Authors: Antoine Maillard, Emanuele Troiani, Simon Martin, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of learning a target function corresponding to a single hidden layer neural network, with a quadratic activation function after the first layer, and random weights. We consider the asymptotic limit where the input dimension and the network width are proportionally large. Recent work [Cui & al '23] established that linear regression provides Bayes-optimal test error to learn…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 47 pages

    Journal ref: Advances in Neural Information Processing Systems 37 (NeurIPS 2024)

  11. arXiv:2405.15480  [pdf, other]

    cs.LG cond-mat.dis-nn cs.CC

    Fundamental computational limits of weak learnability in high-dimensional multi-index models

    Authors: Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

    Abstract: Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical boundaries of efficient learnability in this hypothesis class, focusing on the minimum sample complexity required for weakly recovering their low-dim…

    Submitted 2 April, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  12. Quenches in the Sherrington-Kirkpatrick model

    Authors: Vittorio Erba, Freya Behrens, Florent Krzakala, Lenka Zdeborová

    Abstract: The Sherrington-Kirkpatrick (SK) model is a prototype of a complex non-convex energy landscape. Dynamical processes evolving on such landscapes and locally aiming to reach minima are generally poorly understood. Here, we study quenches, i.e. dynamics that locally aim to decrease energy. We analyse the energy at convergence for two distinct algorithmic classes, single-spin flip and synchronous dyna…

    Submitted 17 July, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Journal ref: J. Stat. Mech. (2024) 083302

  13. arXiv:2403.03695  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Spectral Phase Transition and Optimal PCA in Block-Structured Spiked models

    Authors: Pierre Mergny, Justin Ko, Florent Krzakala

    Abstract: We discuss the inhomogeneous spiked Wigner model, a theoretical framework recently introduced to study structured noise in various learning scenarios, through the prism of random matrix theory, with a specific focus on its spectral properties. Our primary objective is to find an optimal spectral method and to extend the celebrated Baik-Ben Arous-Péché (BBP) phase transition criterion -- well-known in the ho…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 26 pages, 2 figures

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35470-35491, 2024

  14. arXiv:2402.13622  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression

    Authors: Lucas Clarté, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, ta…

    Submitted 1 November, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:787-819, 2024

  15. arXiv:2402.05674  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

    Authors: Kasimir Tanner, Matteo Vilucchio, Bruno Loureiro, Florent Krzakala

    Abstract: This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n / d$. We introduce a tractable mathematical model where the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observ…

    Submitted 27 December, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  16. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9662-9695, 2024

  17. arXiv:2310.02850  [pdf, other]

    math.PR cond-mat.dis-nn

    On the Atypical Solutions of the Symmetric Binary Perceptron

    Authors: Damien Barbier, Ahmed El Alaoui, Florent Krzakala, Lenka Zdeborová

    Abstract: We study the random binary symmetric perceptron problem, focusing on the behavior of rare high-margin solutions. While most solutions are isolated, we demonstrate that these rare solutions are part of clusters of extensive entropy, heuristically corresponding to non-trivial fixed points of an approximate message-passing algorithm. We enumerate these clusters via a local entropy, defined as a Franz…

    Submitted 28 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 26 pages, 6 figures

    Journal ref: Journal of Physics A: Mathematical and Theoretical 57.19 (2024): 195202

  18. arXiv:2308.14085  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG

    Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective

    Authors: Davide Ghio, Yatin Dandi, Florent Krzakala, Lenka Zdeborová

    Abstract: Recent years witnessed the development of powerful generative models based on flows, diffusion or autoregressive neural networks, achieving remarkable success in generating data from examples with applications in a broad range of areas. A theoretical analysis of the performance and understanding of the limitations of these methods remain, however, challenging. In this paper, we undertake a step in…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: 39 pages, 12 figures

    Journal ref: Proceedings of the National Academy of Sciences 121.27 (2024): e2311810121

  19. arXiv:2304.12127  [pdf, other]

    cs.IT cond-mat.dis-nn cond-mat.stat-mech

    Compressed sensing with l0-norm: statistical physics analysis and algorithms for signal recovery

    Authors: D. Barbier, C. Lucibello, L. Saglietti, F. Krzakala, L. Zdeborova

    Abstract: Noiseless compressive sensing is a protocol that enables undersampling and later recovery of a signal without loss of information. This compression is possible because the signal is usually sufficiently sparse in a given basis. Currently, the algorithm offering the best tradeoff between compression rate, robustness, and speed for compressive sensing is the LASSO (l1-norm bias) algorithm. However,…

    Submitted 24 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of IEEE Information Theory Workshop (ITW), pp. 323-328. IEEE, 2023

  20. arXiv:2303.05237  [pdf, other]

    cond-mat.dis-nn cs.IT

    Statistical mechanics of the maximum-average submatrix problem

    Authors: Vittorio Erba, Florent Krzakala, Rodrigo Pérez, Lenka Zdeborová

    Abstract: We study the maximum-average submatrix problem, in which given an $N \times N$ matrix $J$ one needs to find the $k \times k$ submatrix with the largest average of entries. We study the problem for random matrices $J$ whose entries are i.i.d. random variables by mapping it to a variant of the Sherrington-Kirkpatrick spin-glass model at fixed magnetization. We characterize analytically the phase dia…

    Submitted 21 September, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Journal ref: J. Stat. Mech. (2024) 013403

  21. arXiv:2302.08923  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG stat.ML

    Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

    Authors: Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we a…

    Submitted 17 February, 2023; originally announced February 2023.

  22. arXiv:2302.06665  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR

    Optimal Algorithms for the Inhomogeneous Spiked Wigner Model

    Authors: Aleksandr Pak, Justin Ko, Florent Krzakala

    Abstract: In this paper, we study a spiked Wigner problem with an inhomogeneous noise profile. Our aim in this problem is to recover the signal passed through an inhomogeneous low-rank matrix channel. While the information-theoretic performances are well-known, we focus on the algorithmic problem. We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem and show that its rigoro…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 17 pages, 5 figures

  23. arXiv:2302.05882  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

    Authors: Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro

    Abstract: This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying an…

    Submitted 12 February, 2023; originally announced February 2023.

  24. arXiv:2302.00375  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Bayes-optimal Learning of Deep Random Networks of Extensive-width

    Authors: Hugo Cui, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We furt…

    Submitted 21 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:6468-6521, 2023

  25. arXiv:2208.05918  [pdf, other]

    math.PR cond-mat.dis-nn math.ST

    Low-rank Matrix Estimation with Inhomogeneous Noise

    Authors: Alice Guionnet, Justin Ko, Florent Krzakala, Lenka Zdeborová

    Abstract: We study low-rank matrix estimation for a generic inhomogeneous output channel through which the matrix is observed. This generalizes the commonly considered spiked matrix model with homogeneous noise to include for instance the dense degree-corrected stochastic block model. We adapt techniques used to study multispecies spin glasses to derive and rigorously prove an expression for the free energy…

    Submitted 11 August, 2022; originally announced August 2022.

    Comments: 6 figures

    Journal ref: Information and Inference: A Journal of the IMA 14, no. 2 (2025): iaaf010

  26. arXiv:2205.13527  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

    Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $\rho$, as well as the r…

    Submitted 1 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: NeurIPS camera-ready version

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 27087--27099

  27. arXiv:2205.13303  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Gaussian Universality of Perceptrons with Random Labels

    Authors: Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová

    Abstract: While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this study, we redeem this line of work in the case of generalized linear classification, a.k.a. the perceptron model, with random labels. We argue that t…

    Submitted 2 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

    Journal ref: Physical Review E 109.3 (2024): 034305

  28. arXiv:2203.07752  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IT math.PR

    Optimal denoising of rotationally invariant rectangular matrices

    Authors: Emanuele Troiani, Vittorio Erba, Florent Krzakala, Antoine Maillard, Lenka Zdeborová

    Abstract: In this manuscript we consider denoising of large rectangular matrices: given a noisy observation of a signal matrix, what is the best way of recovering the signal matrix itself? For Gaussian noise and rotationally-invariant signal priors, we completely characterize the optimal denoiser and its performance in the high-dimensional limit, in which the size of the signal matrix goes to infinity with…

    Submitted 15 March, 2022; originally announced March 2022.

    Journal ref: Proceedings of Mathematical and Scientific Machine Learning (MSML), PMLR 190:97-112, 2022

  29. arXiv:2202.03295  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Theoretical characterization of uncertainty in high-dimensional linear classification

    Authors: Lucas Clarté, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Being able to reliably assess not only the \emph{accuracy} but also the \emph{uncertainty} of models' predictions is an important endeavour in modern machine learning. Even if the model generating the data and labels is known, computing the intrinsic uncertainty after learning the model from a limited number of samples amounts to sampling the corresponding posterior probability measure. Such sampl…

    Submitted 14 November, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Journal ref: Mach. Learn.: Sci. Technol. 4 025029 (2023)

  30. arXiv:2202.00293  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

    Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connect…

    Submitted 14 June, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 20 pages

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 23244--23255

  31. arXiv:2201.13383  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension

    Authors: Bruno Loureiro, Cédric Gerbelot, Maria Refinetti, Gabriele Sicuro, Florent Krzakala

    Abstract: From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice. Understanding the statistical fluctuations engendered by the different sources of randomness in prediction is therefore key to understanding robust generalisation. In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble o…

    Submitted 31 January, 2022; originally announced January 2022.

    Comments: 17 pages + Appendix

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML). PMLR 162:14283-14314, 2022

  32. arXiv:2110.08775  [pdf, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.IT math.PR

    Perturbative construction of mean-field equations in extensive-rank matrix factorization and denoising

    Authors: Antoine Maillard, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Factorization of matrices where the rank of the two factors diverges linearly with their sizes has many applications in diverse areas such as unsupervised representation learning, dictionary learning or sparse coding. We consider a setting where the two factors are generated from known component-wise independent prior distributions, and the statistician observes a (possibly noisy) component-wise f…

    Submitted 8 June, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 30 pages (main text), 25 pages of references and appendices. v2: Adding clarifications and a new result to derive the optimal denoising estimator from the asymptotic free energy. v3: corrections to match the published version

    Journal ref: J. Stat. Mech. (2022) 083301

  33. arXiv:2106.03791  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

    Authors: Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová

    Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM…

    Submitted 14 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: 12 pages + 34 pages of Appendix, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 34 (2021): 10144-10157

  34. arXiv:2105.15004  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime

    Authors: Hugo Cui, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design. Exponents for the decay of the excess generalization error of KRR have been reported in various works under the assumption of power-law decay of eigenvalues of the features co-variance. These decays were, however, provided for sizeably different setups, namely in the noiseless case with constant regularization…

    Submitted 15 December, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: 22 pages, 10 figures, 2 tables

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) vol 34 p10131--10143. J. Stat. Mech. (2022) 114004

  35. arXiv:2105.07416  [pdf, other]

    q-bio.NC cond-mat.stat-mech stat.ML

    Bayesian reconstruction of memories stored in neural networks from their connectivity

    Authors: Sebastian Goldt, Florent Krzakala, Lenka Zdeborová, Nicolas Brunel

    Abstract: The advent of comprehensive synaptic wiring diagrams of large neural circuits has created the field of connectomics and given rise to a number of open research questions. One such question is whether it is possible to reconstruct the information stored in a recurrent network of neurons, given its synaptic connectivity matrix. Here, we address this question by determining when solving such an infer…

    Submitted 29 August, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Code available at https://github.com/sgoldt/reconstructing_memories

    Journal ref: PLOS Computational Biology 19(1): e1010813 2023

  36. arXiv:2102.11742  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

    Authors: Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová

    Abstract: A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well-captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also lear…

    Submitted 10 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: The accompanying code for this paper is available at https://github.com/mariaref/rfvs2lnn_GMM_online

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  37. arXiv:2102.08127  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Authors: Bruno Loureiro, Cédric Gerbelot, Hugo Cui, Sebastian Goldt, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalis…

    Submitted 14 December, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: v3: NeurIPS camera-ready

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), vol 34 p10137--18151. J. Stat. Mech. (2022) 114001

  38. arXiv:2012.04524  [pdf, other]

    cs.IT cond-mat.dis-nn

    Construction of optimal spectral methods in phase retrieval

    Authors: Antoine Maillard, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider the phase retrieval problem, in which the observer wishes to recover a $n$-dimensional real or complex signal $\mathbf{X}^\star$ from the (possibly noisy) observation of $|\mathbf{\Phi} \mathbf{X}^\star|$, in which $\mathbf{\Phi}$ is a matrix of size $m \times n$. We consider a \emph{high-dimensional} setting where $n,m \to \infty$ with $m/n = \mathcal{O}(1)$, and a large class of (possibly corr…

    Submitted 14 October, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 14 pages + references and appendix. v2: Version updated to match the one accepted at MSML 2021. v3: Adding a reference to a previous work mentioning marginal stability and its connection to Bayes-optimality

    Journal ref: Proceedings of Machine Learning Research vol 145:1-28, 2021 2nd Annual Conference on Mathematical and Scientific Machine Learning (MSML 21)

  39. arXiv:2009.09422  [pdf, other]

    q-bio.PE cond-mat.stat-mech cs.AI cs.LG

    Epidemic mitigation by statistical inference from contact tracing data

    Authors: Antoine Baker, Indaco Biazzo, Alfredo Braunstein, Giovanni Catania, Luca Dall'Asta, Alessandro Ingrosso, Florent Krzakala, Fabio Mazza, Marc Mézard, Anna Paola Muntoni, Maria Refinetti, Stefano Sarao Mannelli, Lenka Zdeborová

    Abstract: Contact-tracing is an essential tool in order to mitigate the impact of a pandemic such as COVID-19. In order to achieve efficient and scalable contact-tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing th…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: 21 pages, 7 figures

    ACM Class: G.3; G.4; I.2.11; J.3

    Journal ref: PNAS 2021 Vol. 118 No. 32 e2106548118

  40. arXiv:2006.14709  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    The Gaussian equivalence of generative models for learning with shallow neural networks

    Authors: Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová

    Abstract: Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trai…

    Submitted 21 May, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: The accompanying code for this paper is available at https://github.com/sgoldt/gaussian-equiv-2layer

    Journal ref: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, PMLR 145:426-471 (2021)

  41. arXiv:2006.06997  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability of finding good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimensio…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 3265--3276, 2020

  42. arXiv:2006.06581  [pdf, other]

    stat.ML cond-mat.dis-nn cs.IT cs.LG math.PR

    Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or : How to Prove Kabashima's Replica Formula)

    Authors: Cedric Gerbelot, Alia Abbara, Florent Krzakala

    Abstract: There has been a recent surge of interest in the study of asymptotic reconstruction performance in various cases of generalized linear estimation problems in the teacher-student setting, especially for the case of i.i.d standard normal matrices. Here, we go beyond these matrices, and prove an analytical formula for the reconstruction performance of convex generalized linear models with rotationall…

    Submitted 10 November, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 53 pages, 4 figures

    Journal ref: IEEE Transactions on Information Theory, vol. 69, no. 3, pp. 1824-1852, March 2023

  43. arXiv:2006.06560  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

    Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $\alpha=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we…

    Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

  44. arXiv:2006.06098  [pdf, other]

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

    Authors: Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD c…

    Submitted 9 November, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + appendix, 4 figures

    Journal ref: J. Stat. Mech. 2021 124008 & NeurIPS 2020

  45. arXiv:2006.05228  [pdf, other]

    math.ST cond-mat.dis-nn cs.IT cs.LG math.PR

    Phase retrieval in high dimensions: Statistical and computational phase transitions

    Authors: Antoine Maillard, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: We consider the phase retrieval problem of reconstructing a $n$-dimensional real or complex signal $\mathbf{X}^{\star}$ from $m$ (possibly noisy) observations $Y_\mu = | \sum_{i=1}^n \Phi_{\mu i} X^{\star}_i/\sqrt{n}|$, for a large class of correlated real and complex random sensing matrices $\mathbf{\Phi}$, in a high-dimensional setting where $m,n\to\infty$ while $\alpha = m/n = \Theta(1)$. First, we derive sharp asymptoti…

    Submitted 23 October, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 12 pages (main text and references), 26 pages of supplementary material. v2 matches the final version accepted at NeurIPS 2020

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 11071--11082, 2020

  46. arXiv:2004.01571  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG eess.SP math.ST stat.CO

    Tree-AMP: Compositional Inference with Tree Approximate Message Passing

    Authors: Antoine Baker, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: We introduce Tree-AMP, standing for Tree Approximate Message Passing, a python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms previously derived for a variety of machine learning tasks such as generalized linear models, inference in multi-layer networks, matrix factori…

    Submitted 11 December, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: Source code available at https://github.com/sphinxteam/tramp and documentation at https://sphinxteam.github.io/tramp.docs

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-89

  47. arXiv:2003.01054  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Double Trouble in Double Descent : Bias and Variance(s) in the Lazy Regime

    Authors: Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

    Abstract: Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime o…

    Submitted 3 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 29 pages, 12 figures

  48. arXiv:2002.11544  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    The role of regularization in classification of high-dimensional noisy Gaussian mixture

    Authors: Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and th…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 8 pages + appendix, 6 figures

    Journal ref: International Conference on Machine Learning, ICML 2020

  49. arXiv:2002.04372  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech

    Asymptotic errors for convex penalized linear regression beyond Gaussian matrices

    Authors: Cédric Gerbelot, Alia Abbara, Florent Krzakala

    Abstract: We consider the problem of learning a coefficient vector $x_{0}$ in $\mathbb{R}^{N}$ from noisy linear observations $y=Fx_{0}+w$ in $\mathbb{R}^{M}$ in the high dimensional limit $M,N$ to infinity with $\alpha=M/N$ fixed. We provide a rigorous derivation of an explicit formula -- first conjectured using heuristic methods from statistical physics -- for the asymptotic mean squared error obtained by penalized convex regre…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 31 pages, 2 figures

  50. arXiv:1912.02729  [pdf, ps, other]

    cond-mat.dis-nn cond-mat.stat-mech cs.LG stat.ML

    Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

    Authors: Alia Abbara, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

    Abstract: Statistical learning theory provides bounds of the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher…

    Submitted 15 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: 15 + 10 pages, v2 revised and accepted at MSML

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:27-54, 2020