Search | arXiv e-print repository

Estimating the Local Learning Coefficient at Scale

Abstract: The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.
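The rescaling invariance mentioned above rests on a symmetry of deep linear networks that is easy to check numerically. This is a minimal sketch, assuming the rescaling in question is the layer-wise one: multiplying one layer by c and dividing the next by c changes the parameter vector but not the network function f(x) = W_L ⋯ W_1 x. Any loss computed from the outputs is therefore unchanged as well, and the abstract states that the theoretical LLC is invariant under this kind of rescaling. The helper names, layer widths, and scale factor below are hypothetical.

```python
import numpy as np

def dln_forward(weights, x):
    """Deep linear network f(x) = W_L ... W_1 x; weights are listed from W_1 to W_L."""
    for W in weights:
        x = W @ x
    return x

def rescale_layers(weights, i, c):
    """Hypothetical helper: W_i -> c * W_i and W_{i+1} -> W_{i+1} / c (function-preserving)."""
    new = [W.copy() for W in weights]
    new[i] = c * new[i]
    new[i + 1] = new[i + 1] / c
    return new

rng = np.random.default_rng(0)
widths = [5, 8, 8, 3]  # hypothetical widths: input dim 5, two hidden layers of 8, output dim 3
weights = [rng.standard_normal((m, n)) for n, m in zip(widths[:-1], widths[1:])]
x = rng.standard_normal((5, 10))  # a batch of 10 inputs stored as columns

rescaled = rescale_layers(weights, i=0, c=10.0)
# The network function is identical at both parameter settings...
print(np.allclose(dln_forward(weights, x), dln_forward(rescaled, x)))  # True
# ...while a naive norm-based quantity sees two very different parameters.
print(sum(np.sum(W**2) for W in weights), sum(np.sum(W**2) for W in rescaled))
```

An LLC estimator that respects this symmetry should report the same value, up to sampling noise, at both parameter settings, whereas quantities like the squared parameter norm printed above change with c.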

Submitted 30 September, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: This paper has been expanded and merged with arXiv:2308.12108 to form a more comprehensive study. Please refer to the latest version of that preprint for the most up-to-date manuscript.

MSC Class: 68T07; 14B05; 62F15

arXiv:2308.12108

The Local Learning Coefficient: A Singularity-Aware Complexity Measure

Authors: Edmund Lau, Zach Furman, George Wang, Daniel Murfet, Susan Wei

Abstract: The Local Learning Coefficient (LLC) is introduced as a novel complexity measure for deep neural networks (DNNs). Recognizing the limitations of traditional complexity measures, the LLC leverages Singular Learning Theory (SLT), which has long recognized the significance of singularities in the loss landscape geometry. This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application. Moreover, we propose a new scalable estimator for the LLC, which is then effectively applied across diverse architectures including deep linear networks up to 100M parameters, ResNet image models, and transformer language models. Empirical evidence suggests that the LLC provides valuable insights into how training heuristics might influence the effective complexity of DNNs. Ultimately, the LLC emerges as a crucial tool for reconciling the apparent contradiction between deep learning's complexity and the principle of parsimony.
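To make the estimator idea concrete, here is a minimal sketch of a Langevin-based LLC estimate. It assumes the form λ̂(w*) = nβ(E_w[L_n(w)] − L_n(w*)) with β = 1/log n, where the expectation is over a tempered posterior localized at w* by a Gaussian term (γ/2)‖w − w*‖². This form, the hyperparameters (γ, step size, chain counts), and the name estimate_llc are assumptions for illustration, not the paper's reference implementation. For simplicity the sketch uses full-batch Langevin updates on Gaussian linear regression, a regular model whose theoretical learning coefficient is d/2; a scalable version would replace the full gradient with minibatch (SGLD) estimates.

```python
import numpy as np

def estimate_llc(X, y, w_star, *, gamma=1.0, eps=1e-4, num_chains=32,
                 num_steps=3000, burn_in=500, seed=0):
    """Sketch of an LLC estimate for Gaussian linear regression (noise variance 1).

    Assumed estimator: n * beta * (E[L_n(w)] - L_n(w_star)), beta = 1 / log(n),
    expectation over exp(-n*beta*L_n(w) - gamma/2 * ||w - w_star||^2),
    sampled with full-batch unadjusted Langevin dynamics (not minibatch SGLD).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = 1.0 / np.log(n)
    # Sufficient statistics make each loss/gradient evaluation O(d^2) per chain.
    XtX, Xty, yty = X.T @ X, X.T @ y, y @ y

    def loss(W):
        # Average negative log-likelihood (up to a constant) for each row of W.
        quad = np.einsum("cd,de,ce->c", W, XtX, W)
        return (yty - 2.0 * (W @ Xty) + quad) / (2.0 * n)

    W = np.tile(w_star, (num_chains, 1))  # every chain starts at w_star
    samples = []
    for step in range(num_steps):
        grad = (W @ XtX - Xty) / n  # gradient of L_n for each chain
        drift = -n * beta * grad - gamma * (W - w_star)  # minus grad of the potential
        W = W + 0.5 * eps * drift + np.sqrt(eps) * rng.standard_normal(W.shape)
        if step >= burn_in:
            samples.append(loss(W))
    return n * beta * (np.mean(samples) - loss(w_star[None, :])[0])

# Usage on synthetic data (hypothetical true weights).
rng = np.random.default_rng(1)
n, d = 5000, 4
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.standard_normal(n)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # the minimum of L_n
lam = estimate_llc(X, y, w_star)
print(lam)  # compare with the regular-model value d / 2 = 2
```

The localizing term γ keeps chains near w*, which matters for singular models whose posterior can wander along flat directions; here γ is small relative to nβ, so it barely biases the regular-model estimate. The step size trades discretization bias against mixing speed.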

Submitted 30 September, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: This version contains new empirical results and merged content from a related paper (arXiv:2402.03698) to provide a more comprehensive study.

MSC Class: 62F15; 68T07; 14B05

arXiv:2302.06035

Variational Bayesian Neural Networks via Resolution of Singularities

Authors: Susan Wei, Edmund Lau

Abstract: In this work, we advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs). To begin, using SLT, we lay to rest some of the confusion surrounding discrepancies between downstream predictive performance measured via, e.g., the test log predictive density, and the variational objective. Next, we use the SLT-corrected asymptotic form for singular posterior distributions to inform the design of the variational family itself. Specifically, we build upon the idealized variational family introduced in Bhattacharya et al. (2020), which is theoretically appealing but practically intractable. Our proposal takes shape as a normalizing flow where the base distribution is a carefully-initialized generalized gamma. We conduct experiments comparing this to the canonical Gaussian base distribution and show improvements in terms of variational free energy and variational generalization error.
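A normalizing flow's base distribution must support both sampling and log-density evaluation. As an illustration only, assume the generalized gamma on x > 0 with density proportional to x^(d−1) exp(−(x/a)^p) (Stacy's parameterization; the abstract gives neither the paper's parameterization nor its initialization scheme). Sampling follows from a change of variables: if Y ~ Gamma(d/p, 1), then X = a·Y^(1/p) has this density, and the normalizing constant is a^d Γ(d/p)/p. The function names below are hypothetical.

```python
import numpy as np
from math import exp, lgamma, log

def gengamma_sample(a, d, p, size, rng):
    """Draw X > 0 with density proportional to x**(d - 1) * exp(-(x / a)**p)."""
    y = rng.gamma(shape=d / p, scale=1.0, size=size)  # Y ~ Gamma(d/p, 1)
    return a * y ** (1.0 / p)                          # X = a * Y**(1/p)

def gengamma_log_density(x, a, d, p):
    """Normalized log density; the normalizer is a**d * Gamma(d/p) / p."""
    x = np.asarray(x, dtype=float)
    return (log(p) - d * log(a) - lgamma(d / p)
            + (d - 1) * np.log(x) - (x / a) ** p)

def gengamma_mean(a, d, p):
    """Closed-form mean: a * Gamma((d + 1) / p) / Gamma(d / p)."""
    return a * exp(lgamma((d + 1) / p) - lgamma(d / p))

rng = np.random.default_rng(0)
x = gengamma_sample(a=2.0, d=3.0, p=1.5, size=200_000, rng=rng)
print(x.mean(), gengamma_mean(2.0, 3.0, 1.5))  # Monte Carlo mean vs closed form
```

In a flow, this log density would enter the variational free energy through the change-of-variables formula. If a library version is preferred, scipy.stats.gengamma with shape parameters a = d/p, c = p and scale = a describes the same distribution.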

Submitted 12 February, 2023; originally announced February 2023.

Comments: 32 pages, 13 figures

MSC Class: 62F15 (Primary); 68T07 (Secondary); 68T05

Showing 1–3 of 3 results for author: Lau, E