Showing 1–17 of 17 results for author: Finzi, M

Searching in archive stat.
  1. arXiv:2410.02117  [pdf, other]

    cs.LG stat.ML

    Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

    Authors: Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson

    Abstract: Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are op…

    Submitted 4 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. Code available at https://github.com/AndPotap/einsum-search

  2. arXiv:2407.18158  [pdf, other]

    stat.ML cs.LG

    Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

    Authors: Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson

    Abstract: Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality…

    Submitted 25 July, 2024; originally announced July 2024.

  3. arXiv:2312.17173  [pdf, other]

    stat.ML cs.LG

    Non-Vacuous Generalization Bounds for Large Language Models

    Authors: Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we d…

    Submitted 17 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  4. arXiv:2309.03060  [pdf, other]

    cs.LG math.NA stat.ML

    CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

    Authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

    Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine le…

    Submitted 29 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/wilson-labs/cola. NeurIPS 2023

  5. arXiv:2304.14994  [pdf, other]

    cs.LG math.NA stat.ML

    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

    Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

    Abstract: Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic…

    Submitted 30 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code available at https://github.com/mfinzi/neural-ivp

  6. arXiv:2304.05366  [pdf, other]

    cs.LG stat.ML

    The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

    Authors: Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

    Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets h…

    Submitted 7 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Published at the International Conference on Machine Learning (ICML) 2024

  7. arXiv:2211.13609  [pdf, other]

    cs.LG stat.ML

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Authors: Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

    Abstract: While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tas…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-bayes

  8. arXiv:2210.02984  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Lie Derivative for Measuring Learned Equivariance

    Authors: Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, whic…

    Submitted 18 June, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICLR 2023. Code available at: https://github.com/ngruver/lie-deriv

  9. arXiv:2202.04836  [pdf, other]

    cs.LG math.DS physics.data-an stat.ML

    Deconstructing the Inductive Biases of Hamiltonian Neural Networks

    Authors: Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson

    Abstract: Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the in…

    Submitted 11 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICLR 2022. Code available at https://github.com/ngruver/decon-hnn

  10. arXiv:2112.01388  [pdf, other]

    cs.LG stat.ML

    Residual Pathway Priors for Soft Equivariance Constraints

    Authors: Marc Finzi, Gregory Benton, Andrew Gordon Wilson

    Abstract: There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of the reality, and having the right inductive biases for efficient learning. We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors, guiding models towards structured solutions, while retaining the ability to capture ad…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021. Code available at https://github.com/mfinzi/residual-pathway-priors

  11. arXiv:2106.06695  [pdf, other]

    cs.LG stat.ML

    SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

    Authors: Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice us…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  12. arXiv:2104.09459  [pdf, other]

    cs.LG math.DS stat.ML

    A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups

    Authors: Marc Finzi, Max Welling, Andrew Gordon Wilson

    Abstract: Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: Library: https://github.com/mfinzi/equivariant-MLP, Documentation: https://emlp.readthedocs.io/en/latest/, Examples: https://colab.research.google.com/github/mfinzi/equivariant-MLP/blob/master/docs/notebooks/colabs/all.ipynb

  13. arXiv:2010.13581  [pdf, other]

    cs.LG math.DS physics.comp-ph physics.data-an stat.ML

    Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

    Authors: Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show th…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/mfinzi/constrained-hamiltonian-neural-networks

  14. arXiv:2010.11882  [pdf, other]

    cs.LG stat.ML

    Learning Invariances in Neural Networks

    Authors: Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Invariances to translations have imbued convolutional neural networks with powerful generalization properties. However, we often do not know a priori what invariances are present in the data, or to what extent a model should be invariant to a given symmetry group. We show how to \emph{learn} invariances and equivariances by parameterizing a distribution over augmentations and optimizing the traini…

    Submitted 1 December, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/g-benton/learning-invariances

  15. arXiv:2002.12880  [pdf, other]

    stat.ML cs.LG

    Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data

    Authors: Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equi…

    Submitted 24 September, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: ICML 2020. Code available at https://github.com/mfinzi/LieConv

  16. arXiv:1912.13025  [pdf, other]

    cs.LG stat.ML

    Semi-Supervised Learning with Normalizing Flows

    Authors: Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson

    Abstract: Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood. We propose FlowGMM, an end-to-end approach to generative semi supervised learning with normalizing flows, using a latent Gaussian mixture model. FlowGMM is distinct in its simplicity, unified treatment of…

    Submitted 30 December, 2019; originally announced December 2019.

  17. arXiv:1806.05594  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

    Authors: Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency regularization, we conceptually explore how loss geometry interacts with training procedures. The consistency loss dramatically improves generalization performance over su…

    Submitted 21 February, 2019; v1 submitted 14 June, 2018; originally announced June 2018.

    Comments: Appears at ICLR 2019