Showing 1–12 of 12 results for author: Arous, G B

  1. arXiv:2412.14650  [pdf, other]

    math.PR cs.LG stat.ML

    Permutation recovery of spikes in noisy high-dimensional tensor estimation

    Authors: Gérard Ben Arous, Cédric Gerbelot, Vanessa Piccolo

    Abstract: We study the dynamics of gradient flow in high dimensions for the multi-spiked tensor problem, where the goal is to estimate $r$ unknown signal vectors (spikes) from noisy Gaussian tensor observations. Specifically, we analyze the maximum likelihood estimation procedure, which involves optimizing a highly nonconvex random function. We determine the sample complexity required for gradient flow to e…

    Submitted 20 December, 2024; v1 submitted 19 December, 2024; originally announced December 2024.

    Comments: 29 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2408.06401

    MSC Class: 68Q87; 62F30; 60H30

  2. arXiv:2410.18162  [pdf, other]

    stat.ML cs.LG math.PR math.ST

    Stochastic gradient descent in high dimensions for multi-spiked tensor PCA

    Authors: Gérard Ben Arous, Cédric Gerbelot, Vanessa Piccolo

    Abstract: We study the dynamics in high dimensions of online stochastic gradient descent for the multi-spiked tensor model. This multi-index model arises from the tensor principal component analysis (PCA) problem with multiple spikes, where the goal is to estimate $r$ unknown signal vectors within the $N$-dimensional unit sphere through maximum likelihood estimation from noisy observations of a $p$-tensor.…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: 58 pages, 10 figures. This is part of our manuscript arXiv:2408.06401

    MSC Class: 68Q87; 62F30; 60G42

  3. arXiv:2408.06401  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Langevin dynamics for high-dimensional optimization: the case of multi-spiked tensor PCA

    Authors: Gérard Ben Arous, Cédric Gerbelot, Vanessa Piccolo

    Abstract: We study nonconvex optimization in high dimensions through Langevin dynamics, focusing on the multi-spiked tensor PCA problem. This tensor estimation problem involves recovering $r$ hidden signal vectors (spikes) from noisy Gaussian tensor observations using maximum likelihood estimation. We study the number of samples required for Langevin dynamics to efficiently recover the spikes and determine…

    Submitted 19 December, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 65 pages

    MSC Class: 68Q87; 62F30; 60G44; 60H30

  4. arXiv:2310.03010  [pdf, other]

    cs.LG math.PR stat.ML

    Spectral alignment of stochastic gradient descent for high-dimensional classification tasks

    Authors: Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

    Abstract: We rigorously study the relation between the training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, both the SGD trajectory and emergent outlier eigenspaces of the Hessian and gradient matrices align with…

    Submitted 15 May, 2025; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Final version. 53 pages, 12 figures

  5. arXiv:2206.04030  [pdf, other]

    stat.ML cs.LG math.PR math.ST

    High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

    Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

    Abstract: We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ball…

    Submitted 17 August, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: 43 pages, 11 figures

  6. arXiv:2110.10210  [pdf, other]

    math.PR cs.LG stat.ML

    Long Random Matrices and Tensor Unfolding

    Authors: Gérard Ben Arous, Daniel Zhengyu Huang, Jiaoyang Huang

    Abstract: In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and sing…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 29 pages, 4 figures

  7. arXiv:2006.10689  [pdf, ps, other]

    math.PR cs.DS cs.LG math.OC math.ST

    Free Energy Wells and Overlap Gap Property in Sparse PCA

    Authors: Gérard Ben Arous, Alexander S. Wein, Ilias Zadik

    Abstract: We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspir…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 63 pages. Accepted for presentation at the Conference on Learning Theory (COLT) 2020

  8. arXiv:2003.10409  [pdf, other]

    stat.ML cs.LG math.PR math.ST

    Online stochastic gradient descent on non-convex losses from high-dimensional inference

    Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

    Abstract: Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random…

    Submitted 10 May, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: final version to appear at Jour. Mach. Learn. Res.

    Journal ref: J. Mach. Learn. Res., Vol. 22, No. 106, 1-51 (2021)

  9. arXiv:1912.02143  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR

    Landscape Complexity for the Empirical Risk of Generalized Linear Models

    Authors: Antoine Maillard, Gérard Ben Arous, Giulio Biroli

    Abstract: We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows to analyze the critical points of high dimensional non-Gaussian random functions. Under a technical hypothesis…

    Submitted 18 January, 2023; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: 18 pages and 18 pages appendix. Update to match the published version (v2). Corrections of remaining small typos (v3). Simplification of a technical argument in Appendix A (v4) and clarification of a technical hypothesis (v5)

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:287-327, 2020

  10. arXiv:1803.06969  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

  11. arXiv:1412.6615  [pdf, other]

    stat.ML cs.LG

    Explorations on high dimensional landscapes

    Authors: Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann LeCun

    Abstract: Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low dimensional picture in which this band is wide. Our simulations agree with…

    Submitted 6 April, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 11 pages, 8 figures, workshop contribution at ICLR 2015

  12. arXiv:1412.0233  [pdf, other]

    cs.LG

    The Loss Surfaces of Multilayer Networks

    Authors: Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

    Abstract: We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network t…

    Submitted 21 January, 2015; v1 submitted 30 November, 2014; originally announced December 2014.