Showing 1–24 of 24 results for author: Hodgkinson, L

  1. arXiv:2506.03470  [pdf, ps, other]

    stat.ML cs.LG

    Models of Heavy-Tailed Mechanistic Universality

    Authors: Liam Hodgkinson, Zhichao Wang, Michael W. Mahoney

    Abstract: Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians, Hessians, and weight matrices has led to the introduction of the concept of heavy-tai…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 40 pages, 4 figures

  2. arXiv:2503.04424  [pdf, other]

    stat.ML cs.LG math.NA

    Determinant Estimation under Memory Constraints and Neural Scaling Laws

    Authors: Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Fred Roosta, Michael W. Mahoney

    Abstract: Calculating or accurately estimating log-determinants of large positive semi-definite matrices is of fundamental importance in many machine learning tasks. While its cubic computational complexity can already be prohibitive, in modern applications, even storing the matrices themselves can pose a memory bottleneck. To address this, we derive a novel hierarchical algorithm based on block-wise comput…

    Submitted 6 March, 2025; originally announced March 2025.

  3. arXiv:2502.02870  [pdf, other]

    stat.ML cs.LG

    Uncertainty Quantification with the Empirical Neural Tangent Kernel

    Authors: Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta

    Abstract: While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems. Several Bayesian uncertainty quantification (UQ) methods exist that are either cheap or reliable, but not both. We propose a post-hoc, sampling-based UQ method fo…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 24 pages, 5 figures, 9 tables

  4. arXiv:2411.15239  [pdf, other]

    cs.CV

    Preserving Angles Improves Feature Distillation of Foundation Models

    Authors: Evelyn J. Mannix, Liam Hodgkinson, Howard Bondell

    Abstract: Knowledge distillation approaches compress models by training a student network using the classification outputs of a high quality teacher model, but can fail to effectively transfer the properties of computer vision foundation models from the teacher to the student. While it has been recently shown that feature distillation–where a teacher model's output features are replicated in…

    Submitted 7 March, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  5. arXiv:2411.00328  [pdf, other]

    stat.ML cs.LG

    How many classifiers do we need?

    Authors: Hyunsuk Kim, Liam Hodgkinson, Ryan Theisen, Michael W. Mahoney

    Abstract: As performance gains through scaling data and/or model size experience diminishing returns, it is becoming increasingly popular to turn to ensembling, where the predictions of multiple models are combined to improve accuracy. In this paper, we provide a detailed analysis of how the disagreement and the polarization (a notion we introduce and define in this paper) among classifiers relate to the pe…

    Submitted 31 October, 2024; originally announced November 2024.

  6. arXiv:2410.05757  [pdf, ps, other]

    stat.ML cs.LG stat.CO stat.ME

    Temperature Optimization for Bayesian Deep Learning

    Authors: Kenyon Ng, Chris van der Heide, Liam Hodgkinson, Susan Wei

    Abstract: The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where tempering the posterior to a cold temperature often improves the predictive performance of the posterior predictive distribution (PPD). Although the term 'CPE' suggests colder temperatures are inherently better, the BDL community increasingly recognizes that this is not always the case. Despite this, there remai…

    Submitted 11 June, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: 11 pages (+5 references, +17 appendix). Accepted at UAI 2025

  7. arXiv:2403.04125  [pdf, other]

    cs.CV

    ComFe: An Interpretable Head for Vision Transformers

    Authors: Evelyn J. Mannix, Liam Hodgkinson, Howard Bondell

    Abstract: Interpretable computer vision models explain their classifications through comparing the distances between the local embeddings of an image and a set of prototypes that represent the training data. However, these approaches introduce additional hyper-parameters that need to be tuned to apply to new datasets, scale poorly, and are more computationally intensive to train in comparison to black-box a…

    Submitted 7 March, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

  8. arXiv:2311.07013  [pdf, ps, other]

    stat.ML cs.LG

    A PAC-Bayesian Perspective on the Interpolating Information Criterion

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 9 pages

  9. arXiv:2307.07785  [pdf, other]

    stat.ML cs.LG

    The Interpolating Information Criterion for Overparameterized Models

    Authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

    Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized mod…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: 23 pages, 2 figures

  10. arXiv:2307.02501  [pdf, ps, other]

    stat.ML cs.LG

    Generalization Guarantees via Algorithm-dependent Rademacher Complexity

    Authors: Sarah Sachs, Tim van Erven, Liam Hodgkinson, Rajiv Khanna, Umut Simsekli

    Abstract: Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure…

    Submitted 4 July, 2023; originally announced July 2023.

  11. arXiv:2306.09262  [pdf, other]

    stat.ML cs.LG cs.PL

    A Heavy-Tailed Algebra for Probabilistic Programming

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approac…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: 21 pages, 6 figures

  12. arXiv:2305.12313  [pdf, other]

    stat.ML cs.LG

    When are ensembles really effective?

    Authors: Ryan Theisen, Hyunsuk Kim, Yaoqing Yang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Ensembling has a long history in statistical data analysis, with many impactful applications. However, in many modern machine learning settings, the benefits of ensembling are less ubiquitous and less obvious. We study, both theoretically and empirically, the fundamental question of when ensembling yields significant performance improvements in classification tasks. Theoretically, we prove new res…

    Submitted 20 May, 2023; originally announced May 2023.

  13. arXiv:2210.07612  [pdf, other]

    stat.ML cs.LG

    Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: Despite their importance for assessing reliability of predictions, uncertainty quantification (UQ) measures for machine learning models have only recently begun to be rigorously characterized. One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input d…

    Submitted 25 July, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: 33 pages, 21 figures

  14. arXiv:2205.07918  [pdf, other]

    stat.ML cs.LG

    Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows

    Authors: Feynman Liang, Liam Hodgkinson, Michael W. Mahoney

    Abstract: While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately. We first improve previous theory on tails of Lipschitz flows by quantifying how the tails affect the rate of tail decay and by expanding the theory to non-Lipschitz polynomial…

    Submitted 16 May, 2022; originally announced May 2022.

  15. arXiv:2202.02842  [pdf, other]

    cs.CL cs.LG

    Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

    Authors: Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

    Abstract: Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strong…

    Submitted 4 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2023)

  16. arXiv:2108.00781  [pdf, other]

    stat.ML cs.LG

    Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers

    Authors: Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney

    Abstract: Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, this work mainly relied on continuous-time ap…

    Submitted 11 July, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: 22 pages, 6 figures

  17. arXiv:2107.11228  [pdf, other]

    cs.LG

    Taxonomizing local versus global structure in neural network loss landscapes

    Authors: Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney

    Abstract: Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization performance).…

    Submitted 12 December, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Journal ref: Thirty-fifth Annual Conference on Neural Information Processing Systems, 2021

  18. arXiv:2106.10820  [pdf, other]

    cs.LG stat.ML

    Stateful ODE-Nets using Basis Function Expansions

    Authors: Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

    Abstract: The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions using linear combinations of basis functions which enables us to leverage parameter transformations such as function projections. In turn, this view…

    Submitted 6 November, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted at 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  19. arXiv:2102.04877  [pdf, other]

    stat.ML cs.LG math.DS math.PR

    Noisy Recurrent Neural Networks

    Authors: Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney

    Abstract: We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regulari…

    Submitted 1 December, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: 38 pages

    Journal ref: NeurIPS 2021 (https://proceedings.neurips.cc/paper/2021/hash/29301521774ff3cbd26652b2d5c95996-Abstract.html)

  20. arXiv:2006.12070  [pdf, other]

    cs.LG math.DS stat.ML

    Lipschitz Recurrent Neural Networks

    Authors: N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

    Abstract: Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this…

    Submitted 23 April, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2021

  21. arXiv:2006.06293  [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Multiplicative noise and heavy tails in stochastic optimization

    Authors: Liam Hodgkinson, Michael W. Mahoney

    Abstract: Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: 30 pages, 7 figures

  22. arXiv:2002.09547  [pdf, other]

    stat.ML cs.LG

    Stochastic Normalizing Flows

    Authors: Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney

    Abstract: We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equ…

    Submitted 25 February, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: 17 pages, 4 figures

  23. arXiv:1907.08410  [pdf, other]

    stat.ML cs.LG

    Geometric Rates of Convergence for Kernel-based Sampling Algorithms

    Authors: Rajiv Khanna, Liam Hodgkinson, Michael W. Mahoney

    Abstract: The rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to some target probability measure, is investigated. Under verifiable conditions on the chosen kernel and target measure, we establish a near-geometric rate of convergence for target measures that are nearly atomic. Furthermor…

    Submitted 31 October, 2021; v1 submitted 19 July, 2019; originally announced July 2019.

    Comments: Accepted to UAI 2021 (Oral)

  24. arXiv:1903.12322  [pdf, other]

    stat.ML cs.LG stat.CO

    Implicit Langevin Algorithms for Sampling From Log-concave Densities

    Authors: Liam Hodgkinson, Robert Salomone, Fred Roosta

    Abstract: For sampling from a log-concave density, we study implicit integrators resulting from $\theta$-method discretization of the overdamped Langevin diffusion stochastic differential equation. Theoretical and algorithmic properties of the resulting sampling methods for $\theta \in [0,1]$ and a range of step sizes are established. Our results generalize and extend prior works in several directions. In particular…

    Submitted 10 July, 2021; v1 submitted 28 March, 2019; originally announced March 2019.