
Showing 1–12 of 12 results for author: Pesce, L

  1. arXiv:2502.13961  [pdf, ps, other]

    stat.ML cs.LG

    The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

    Authors: Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics a…

    Submitted 11 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

  2. arXiv:2412.07379  [pdf, other]

    physics.ins-det

    Germanium target sensed by phonon-mediated kinetic inductance detectors

    Authors: D. Delicato, D. Angelone, L. Bandiera, M. Calvo, M. Cappelli, U. Chowdhury, G. Del Castello, M. Folcarelli, M. del Gallo Roccagiovine, V. Guidi, G. L. Pesce, M. Romagnoni, A. Cruciani, A. Mazzolari, A. Monfardini, M. Vignati

    Abstract: Cryogenic phonon detectors are adopted in experiments searching for dark matter interactions or coherent elastic neutrino-nucleus scattering, thanks to the low energy threshold they can achieve. The phonon-mediated sensing of particle interactions in passive silicon absorbers has been demonstrated with Kinetic Inductance Detectors (KIDs). Targets with neutron number larger than silicon, however, f…

    Submitted 15 April, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: 3 Figures, 2 Tables, 5 pages

  3. arXiv:2410.18938  [pdf, other]

    stat.ML cs.LG math.ST

    A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities

    Authors: Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro

    Abstract: A key property of neural networks is their capacity to adapt to data during training. Yet, our current mathematical understanding of feature learning and its relationship to generalization remains limited. In this work, we provide a random matrix analysis of how fully-connected two-layer neural networks adapt to the target function after a single, but aggressive, gradient descent step. We rigoro…

    Submitted 24 October, 2024; originally announced October 2024.

  4. arXiv:2406.02157  [pdf, other]

    stat.ML cs.LG

    Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gr…

    Submitted 4 June, 2024; originally announced June 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1730-1762, 2024

  5. arXiv:2405.15459  [pdf, other]

    stat.ML cs.LG

    Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan

    Abstract: Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is, target functions with low-dimensi…

    Submitted 10 February, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9662-9695, 2024

  7. arXiv:2402.03220  [pdf, other]

    stat.ML cs.LG

    The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

    Authors: Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2024

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9991-10016, 2024

  8. arXiv:2306.16255  [pdf, other]

    math.OC cs.IT math.ST

    Theory and applications of the Sum-Of-Squares technique

    Authors: Francis Bach, Elisabetta Cornacchia, Luca Pesce, Giovanni Piccioli

    Abstract: The Sum-of-Squares (SOS) approximation method is a technique used in optimization problems to derive lower bounds on the optimal value of an objective function. By representing the objective function as a sum of squares in a feature space, the SOS method transforms non-convex global optimization problems into solvable semidefinite programs. This note presents an overview of the SOS method. We star…

    Submitted 11 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: These are notes from the lecture by Francis Bach given at the summer school "Statistical Physics & Machine Learning", which took place at the Les Houches School of Physics in France from 4th to 29th July 2022. The school was organized by Florent Krzakala and Lenka Zdeborová from EPFL. 19 pages, 4 figures

  9. arXiv:2305.18270  [pdf, other]

    stat.ML cs.LG

    How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

    Authors: Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: For high-dimensional Gaussian data, we investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to an improvement in the approximation capacity from initialization. First, we compare the influence of batch size to that of multiple steps. For a single step, a batch of size…

    Submitted 3 June, 2025; v1 submitted 29 May, 2023; originally announced May 2023.

    Journal ref: Journal of Machine Learning Research 25 (2024) 1-65

  10. arXiv:2302.08923  [pdf, other]

    math.ST cond-mat.dis-nn cs.LG stat.ML

    Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

    Authors: Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan

    Abstract: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we a…

    Submitted 17 February, 2023; originally announced February 2023.

  11. arXiv:2207.14622  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    Innate Dynamics and Identity Crisis of a Metal Surface Unveiled by Machine Learning of Atomic Environments

    Authors: Matteo Cioni, Daniela Polino, Daniele Rapetti, Luca Pesce, Massimo Delle Piane, Giovanni M. Pavan

    Abstract: Metals are traditionally considered hard matter. However, it is well known that their atomic lattices may become dynamic and undergo reconfigurations even well below the melting temperature. The innate atomic dynamics of metals is directly related to their bulk and surface properties. Understanding their complex structural dynamics is thus important for many applications but is not easy. Here we r…

    Submitted 21 February, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

  12. arXiv:2205.13527  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.PR math.ST

    Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

    Authors: Luca Pesce, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

    Abstract: A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model where the cluster means are sparse vectors. Here we provide an exact asymptotic characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity, i.e. when the fraction of non-zero components of the cluster means $\rho$, as well as the r…

    Submitted 1 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: NeurIPS camera-ready version

    Journal ref: Advances in Neural Information Processing Systems (2022), vol 35, pages 27087-27099