Showing 1–15 of 15 results for author: Zavatone-Veth, J A

  1. arXiv:2503.18754  [pdf, other]

    q-bio.NC cond-mat.dis-nn stat.ML

    Dynamically Learning to Integrate in Recurrent Neural Networks

    Authors: Blake Bordelon, Jordan Cotler, Cengiz Pehlevan, Jacob A. Zavatone-Veth

    Abstract: Learning to remember over long timescales is fundamentally challenging for recurrent neural networks (RNNs). While much prior work has explored why RNNs struggle to learn long timescales and how to mitigate this, we still lack a clear understanding of the dynamics involved when RNNs learn long timescales via gradient descent. Here we build a mathematical theory of the learning dynamics of linear R…

    Submitted 24 March, 2025; originally announced March 2025.

  2. arXiv:2502.05074  [pdf, ps, other]

    cond-mat.dis-nn cs.LG stat.ML

    Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models

    Authors: Alexander Atanasov, Blake Bordelon, Jacob A. Zavatone-Veth, Courtney Paquette, Cengiz Pehlevan

    Abstract: We derive a novel deterministic equivalence for the two-point function of a random matrix resolvent. Using this result, we give a unified derivation of the performance of a wide variety of high-dimensional linear models trained with stochastic gradient descent. This includes high-dimensional linear regression, kernel regression, and random feature models. Our results include previously known asymp…

    Submitted 29 April, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: Fixing typo in equation 2 (model definition)

  3. arXiv:2408.04607  [pdf, ps, other]

    stat.ML cond-mat.dis-nn cs.LG

    Risk and cross validation in ridge regression with correlated samples

    Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that…

    Submitted 31 May, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 44 pages, 19 figures. v4: ICML 2025 camera-ready

  4. arXiv:2408.03769  [pdf, other]

    cond-mat.dis-nn stat.ML

    Nadaraya-Watson kernel smoothing as a random energy model

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Precise asymptotics have revealed many surprises in high-dimensional regression. These advances, however, have not extended to perhaps the simplest estimator: direct Nadaraya-Watson (NW) kernel smoothing. Here, we describe how one can use ideas from the analysis of the random energy model (REM) in statistical physics to compute sharp asymptotics for the NW estimator when the sample size is exponen…

    Submitted 21 November, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

    Journal ref: Journal of Statistical Mechanics: Theory and Experiment (2025) 013404

  5. arXiv:2405.11751  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic theory of in-context learning by linear attention

    Authors: Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan

    Abstract: Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unr…

    Submitted 4 February, 2025; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: 17 pages (main doc), 6 figures, and supplementary information (23 pages)

  6. arXiv:2405.00592  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Scaling and renormalization in high-dimensional regression

    Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generaliza…

    Submitted 26 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 68 pages, 17 figures

  7. arXiv:2306.04532  [pdf, other]

    cs.NE cond-mat.dis-nn cs.LG q-bio.NC stat.ML

    Long Sequence Hopfield Memory

    Authors: Hamza Tahir Chaudhry, Jacob A. Zavatone-Veth, Dmitry Krotov, Cengiz Pehlevan

    Abstract: Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maxi…

    Submitted 2 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Camera-Ready, 41 pages

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  8. arXiv:2303.00564  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for deep structured Gaussian feature models

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In recent years, significant attention in deep learning theory has been devoted to analyzing when models that interpolate their training data can still generalize well to unseen examples. Many insights have been gained from studying models with multiple layers of Gaussian random features, for which one can compute precise generalization asymptotics. However, few works have considered the effect of…

    Submitted 23 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: 14+18 pages, 2+1 figures. NeurIPS 2023 Camera Ready

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  9. arXiv:2301.11375  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Neural networks learn to magnify areas near decision boundaries

    Authors: Jacob A. Zavatone-Veth, Sheng Yang, Julian A. Rubinfien, Cengiz Pehlevan

    Abstract: In machine learning, there is a long history of trying to build neural networks that can learn from fewer example data by baking in strong geometric priors. However, it is not always clear a priori what geometric constraints are appropriate for a given task. Here, we consider the possibility that one can uncover useful geometric inductive biases by studying how training molds the Riemannian geomet…

    Submitted 14 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 93 pages, 48 figures

  10. arXiv:2209.10499  [pdf, other]

    cond-mat.dis-nn math.PR

    Replica method for eigenvalues of real Wishart product matrices

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: We show how the replica method can be used to compute the asymptotic eigenvalue spectrum of a real Wishart product matrix. For unstructured factors, this provides a compact, elementary derivation of a polynomial condition on the Stieltjes transform first proved by Müller [IEEE Trans. Inf. Theory. 48, 2086-2091 (2002)]. We then show how this computation can be extended to ensembles where the factor…

    Submitted 20 January, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 50 pages, 5 figures

  11. arXiv:2203.00573  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Contrasting random and learned features in deep Bayesian linear regression

    Authors: Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan

    Abstract: Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are t…

    Submitted 16 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 35 pages, 7 figures. v2: minor typos corrected and references added; published in PRE

    Journal ref: Physical Review E 105, 064118 (2022)

  12. arXiv:2201.04669  [pdf, ps, other]

    cond-mat.dis-nn cs.LG

    On neural network kernels and the storage capacity problem

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In this short note, we reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly-growing body of literature on kernel limits of wide neural networks. Concretely, we observe that the "effective order parameter" studied in the statistical mechanics literature is exactly equivalent to the infinite-width Neural Network Gaussian Process…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 5 pages, no figures

    Journal ref: Neural Computation (2022) 34 (5): 1136-1142

  13. arXiv:2106.00651  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Asymptotics of representation learning in finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Abdulkadir Canatar, Benjamin S. Ruben, Cengiz Pehlevan

    Abstract: Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width c…

    Submitted 8 February, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement; v5: post-NeurIPS update, minor typos fixed

    Journal ref: Advances in Neural Information Processing Systems 34 (2021); JSTAT 114008 (2022)

  14. arXiv:2104.11734  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Exact marginal prior distributions of finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian function space priors have been characterized only through perturbative approaches. Here, we deriv…

    Submitted 18 October, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight

    Journal ref: Advances in Neural Information Processing Systems 34 (2021)

  15. arXiv:2007.11136  [pdf, other]

    cond-mat.dis-nn cs.LG stat.ML

    Activation function dependence of the storage capacity of treelike neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage ca…

    Submitted 4 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: 5+23 pages, 2+4 figures. v3: accepted for publication as a Letter in Physical Review E

    Journal ref: Phys. Rev. E 103, 020301 (2021)