Showing 1–50 of 154 results for author: Wilson, A

  1. arXiv:2506.08316  [pdf, ps, other]

    cs.LG stat.ML

    Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion

    Authors: Alan N. Amin, Nate Gruver, Andrew Gordon Wilson

    Abstract: Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes with many conceptual benefits; for example, inductive biases can be incorporated into the noising Markov process, and access to improved sampling algorithms. In practice, however, the consistently best…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Code available at: https://github.com/AlanNawzadAmin/SCUD

  2. arXiv:2504.18452  [pdf, other]

    stat.ME stat.CO

    Structured Bayesian Regression Tree Models for Estimating Distributed Lag Effects: The R Package dlmtree

    Authors: Seongwon Im, Ander Wilson, Daniel Mork

    Abstract: When examining the relationship between an exposure and an outcome, there is often a time lag between exposure and the observed effect on the outcome. A common statistical approach for estimating the relationship between the outcome and lagged measurements of exposure is a distributed lag model (DLM). Because repeated measurements are often autocorrelated, the lagged effects are typically constrai…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 22 pages, 11 figures, 2 tables

  3. arXiv:2504.09875  [pdf, other]

    stat.CO

    Particle Hamiltonian Monte Carlo

    Authors: Alaa Amri, Víctor Elvira, Amy L. Wilson

    Abstract: In Bayesian inference, Hamiltonian Monte Carlo (HMC) is a popular Markov Chain Monte Carlo (MCMC) algorithm known for its efficiency in sampling from complex probability distributions. However, its application to models with latent variables, such as state-space models, poses significant challenges. These challenges arise from the need to compute gradients of the log-posterior of the latent variab…

    Submitted 14 April, 2025; originally announced April 2025.

  4. arXiv:2504.06363  [pdf, other]

    stat.ME

    Distributed Lag Interaction Model with Index Modification

    Authors: Danielle Demateis, Sandra India Aldana, Robert O. Wright, Rosalind Wright, Andrea Baccarelli, Elena Colicino, Ander Wilson, Kayleigh P. Keller

    Abstract: Epidemiological evidence supports an association between exposure to air pollution during pregnancy and birth and child health outcomes. Typically, such associations are estimated by regressing an outcome on daily or weekly measures of exposure during pregnancy using a distributed lag model. However, these associations may be modified by multiple factors. We propose a distributed lag interaction m…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 32 pages, 3 figures, 2 tables

  5. arXiv:2504.04528  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    A Consequentialist Critique of Binary Classification Evaluation Practices

    Authors: Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A Fukuyama, Ashia C. Wilson

    Abstract: ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-R…

    Submitted 6 April, 2025; originally announced April 2025.

  6. arXiv:2503.02113  [pdf, other]

    cs.LG stat.ML

    Deep Learning is Not So Mysterious or Different

    Authors: Andrew Gordon Wilson

    Abstract: Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour c…

    Submitted 3 March, 2025; originally announced March 2025.

  7. arXiv:2502.17495  [pdf, other]

    cs.LG physics.ao-ph stat.AP stat.ML

    Spatiotemporal Forecasting in Climate Data Using EOFs and Machine Learning Models: A Case Study in Chile

    Authors: Mauricio Herrera, Francisca Kleisinger, Andrés Wilsón

    Abstract: Effective resource management and environmental planning in regions with high climatic variability, such as Chile, demand advanced predictive tools. This study addresses this challenge by employing an innovative and computationally efficient hybrid methodology that integrates machine learning (ML) methods for time series forecasting with established statistical techniques. The spatiotemporal data…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 25 pages, 6 figures

  8. arXiv:2412.18701  [pdf, other]

    stat.CO math.ST stat.ML

    High-accuracy sampling from constrained spaces with the Metropolis-adjusted Preconditioned Langevin Algorithm

    Authors: Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

    Abstract: In this work, we propose a first-order sampling method called the Metropolis-adjusted Preconditioned Langevin Algorithm for approximate sampling from a target distribution whose support is a proper convex subset of $\mathbb{R}^{d}$. Our proposed method is the result of applying a Metropolis-Hastings filter to the Markov chain formed by a single step of the preconditioned Langevin algorithm with a…

    Submitted 25 February, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: 55 pages, 5 figures, 2 tables. Shorter version without experiments accepted at ALT 2025. v3: fixes minor typographical errors

  9. arXiv:2412.07763  [pdf, other]

    stat.ML cs.LG q-bio.BM

    Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

    Authors: Alan Nawzad Amin, Nate Gruver, Yilun Kuang, Lily Li, Hunter Elliott, Calvin McCarter, Aniruddh Raghu, Peyton Greenside, Andrew Gordon Wilson

    Abstract: To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from large antibody databases to predict only typical antibodies. Unfortunately, the space of typical antibodies is enormous to search, and experiments often fail to find suitable antibodies on a budget. We in…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Code available at https://github.com/AlanNawzadAmin/CloneBO

  10. arXiv:2410.02117  [pdf, other]

    cs.LG stat.ML

    Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

    Authors: Andres Potapczynski, Shikai Qiu, Marc Finzi, Christopher Ferri, Zixi Chen, Micah Goldblum, Bayan Bruss, Christopher De Sa, Andrew Gordon Wilson

    Abstract: Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are op…

    Submitted 4 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. Code available at https://github.com/AndPotap/einsum-search

  11. arXiv:2409.18005  [pdf, other]

    stat.ME

    Collapsible Kernel Machine Regression for Exposomic Analyses

    Authors: Glen McGee, Brent A. Coull, Ander Wilson

    Abstract: An important goal of environmental epidemiology is to quantify the complex health risks posed by a wide array of environmental exposures. In analyses focusing on a smaller number of exposures within a mixture, flexible models like Bayesian kernel machine regression (BKMR) are appealing because they allow for non-linear and non-additive associations among mixture components. However, this flexibili…

    Submitted 26 September, 2024; originally announced September 2024.

  12. arXiv:2408.08450  [pdf, other]

    stat.AP stat.ME

    Smooth and shape-constrained quantile distributed lag models

    Authors: Yisen Jin, Aaron J. Molstad, Ander Wilson, Joseph Antonelli

    Abstract: Exposure to environmental pollutants during the gestational period can significantly impact infant health outcomes, such as birth weight and neurological development. Identifying critical windows of susceptibility, which are specific periods during pregnancy when exposure has the most profound effects, is essential for developing targeted interventions. Distributed lag models (DLMs) are widely use…

    Submitted 15 August, 2024; originally announced August 2024.

  13. arXiv:2407.18158  [pdf, other]

    stat.ML cs.LG

    Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

    Authors: Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson

    Abstract: Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality…

    Submitted 25 July, 2024; originally announced July 2024.

  14. arXiv:2406.11463  [pdf, other]

    cs.LG stat.ML

    Just How Flexible are Neural Networks in Practice?

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

    Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…

    Submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.09177  [pdf, other]

    stat.ML cs.LG

    Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

    Authors: Alan Nawzad Amin, Andrew Gordon Wilson

    Abstract: To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML 2024; Code at https://github.com/AlanNawzadAmin/DAT-graph

  16. arXiv:2406.08391  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 5 December, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 Camera Ready

  17. arXiv:2404.02778  [pdf, other]

    stat.AP

    Chain event graphs for assessing activity-level propositions in forensic science in relation to drug traces on banknotes

    Authors: Gail Robertson, Amy L Wilson, Jim Q Smith

    Abstract: Graphical models and likelihood ratios can be used by forensic scientists to compare support given by evidence to propositions put forward by competing parties during court proceedings. Such models can also be used to evaluate support for activity-level propositions, i.e. propositions that refer to the nature of activities associated with evidence and how this evidence came to be at a crime scene.…

    Submitted 3 April, 2024; originally announced April 2024.

  18. arXiv:2403.16628  [pdf, other]

    stat.AP

    A comparison of graphical methods in the case of the murder of Meredith Kercher

    Authors: A. Philip Dawid, Francesco Dotto, Maxine Graves, Jay B. Kadane, Julia Mortera, Gail Robertson, Jim Q. Smith, Amy L. Wilson

    Abstract: We compare three graphical methods for displaying evidence in a legal case: Wigmore Charts, Bayesian Networks and Chain Event Graphs. We find that these methods are aimed at three distinct audiences, respectively lawyers, forensic scientists and the police. The methods are illustrated using part of the evidence in the case of the murder of Meredith Kercher. More specifically, we focus on represent…

    Submitted 25 March, 2024; originally announced March 2024.

  19. arXiv:2403.09869  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

    Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

    Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  20. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 6 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  21. Penalized Distributed Lag Interaction Model: Air Pollution, Birth Weight and Neighborhood Vulnerability

    Authors: Danielle Demateis, Kayleigh P. Keller, David Rojas-Rueda, Marianthi-Anna Kioumourtzoglou, Ander Wilson

    Abstract: Maternal exposure to air pollution during pregnancy has a substantial public health impact. Epidemiological evidence supports an association between maternal exposure to air pollution and low birth weight. A popular method to estimate this association while identifying windows of susceptibility is a distributed lag model (DLM), which regresses an outcome onto exposure history observed at multiple…

    Submitted 21 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: 41 pages, 4 figures, 2 tables

  22. arXiv:2312.17173  [pdf, other]

    stat.ML cs.LG

    Non-Vacuous Generalization Bounds for Large Language Models

    Authors: Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we d…

    Submitted 17 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  23. arXiv:2312.17162  [pdf, other]

    stat.ML cs.AI cs.LG

    Function-Space Regularization in Neural Networks: A Probabilistic Perspective

    Authors: Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson

    Abstract: Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by vie…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

  24. arXiv:2312.16360  [pdf, other]

    stat.CO math.OC math.ST stat.ML

    Mean-field underdamped Langevin dynamics and its spacetime discretization

    Authors: Qiang Fu, Ashia Wilson

    Abstract: We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discr…

    Submitted 6 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: 40 pages, 5 figures, 2 tables

  25. arXiv:2312.08823  [pdf, other]

    stat.CO cs.DS cs.LG math.ST stat.ML

    Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm

    Authors: Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

    Abstract: We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of…

    Submitted 21 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 49 pages, 6 figures, 2 tables. Shorter version without experiments accepted to COLT 2024

  26. arXiv:2311.15990  [pdf, other]

    cs.LG stat.ML

    Should We Learn Most Likely Functions or Parameters?

    Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

    Abstract: Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generall…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. Code available at https://github.com/activatedgeek/function-space-map

  27. arXiv:2309.06119  [pdf, other]

    stat.AP

    Resource Adequacy and Capacity Procurement: Metrics and Decision Support Analysis

    Authors: Chris J. Dent, Nestor Sanchez, Aditi Shevni, Jim Q. Smith, Amy L. Wilson, Xuewen Yu

    Abstract: Resource adequacy studies typically use standard metrics such as Loss of Load Expectation and Expected Energy Unserved to quantify the risk of supply shortfalls. This paper critiques present approaches to adequacy assessment and capacity procurement in terms of their relevance to decision maker interests, before demonstrating alternatives including risk-averse metrics and visualisations of wider r…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 23 pages, 4 figures

  28. arXiv:2309.03060  [pdf, other]

    cs.LG math.NA stat.ML

    CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

    Authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

    Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine le…

    Submitted 29 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/wilson-labs/cola. NeurIPS 2023

  29. arXiv:2308.04012  [pdf, other]

    stat.AP

    A hierarchical Bayesian model for estimating age-specific COVID-19 infection fatality rates in developing countries

    Authors: Sierra Pugh, Andrew T. Levin, Gideon Meyerowitz-Katz, Satej Soman, Nana Owusu-Boaitey, Anthony B. Zwi, Anup Malani, Ander Wilson, Bailey K. Fosdick

    Abstract: The COVID-19 infection fatality rate (IFR) is the proportion of individuals infected with SARS-CoV-2 who subsequently die. As COVID-19 disproportionately affects older individuals, age-specific IFR estimates are imperative to facilitate comparisons of the impact of COVID-19 between locations and prioritize distribution of scarce resources. However, there lacks a coherent method to synthesize availa…

    Submitted 7 August, 2023; originally announced August 2023.

  30. arXiv:2306.11074  [pdf, other]

    cs.LG stat.ML

    Simple and Fast Group Robustness by Automatic Feature Reweighting

    Authors: Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: ICML 23. Code available at https://github.com/AndPotap/afr

    Journal ref: 40th International Conference on Machine Learning 2023

  31. arXiv:2305.20028  [pdf, other]

    cs.LG stat.ML

    A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

    Authors: Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson

    Abstract: Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practic…

    Submitted 8 May, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: ICLR 2024. Code available at https://github.com/yucenli/bnn-bo

  32. arXiv:2305.07564  [pdf]

    stat.ME

    An Application of the Causal Roadmap in Two Safety Monitoring Case Studies: Covariate-Adjustment and Outcome Prediction using Electronic Health Record Data

    Authors: Brian D Williamson, Richard Wyss, Elizabeth A Stuart, Lauren E Dang, Andrew N Mertens, Andrew Wilson, Susan Gruber

    Abstract: Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to ensure that analyses meet their intended goals. The Causal Roadmap is an established framework that can guide and document analytic d…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 26 pages, 4 figures

  33. arXiv:2304.14994  [pdf, other]

    cs.LG math.NA stat.ML

    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

    Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

    Abstract: Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic…

    Submitted 30 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code available at https://github.com/mfinzi/neural-ivp

  34. arXiv:2304.05366  [pdf, other]

    cs.LG stat.ML

    The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

    Authors: Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

    Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets h…

    Submitted 7 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Published at the International Conference on Machine Learning (ICML) 2024

  35. arXiv:2302.04019  [pdf, other]

    cs.LG stat.ML

    Fortuna: A Library for Uncertainty Quantification in Deep Learning

    Authors: Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau

    Abstract: We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved unce…

    Submitted 8 February, 2023; originally announced February 2023.

  36. arXiv:2301.12937  [pdf, other]

    stat.ME

    Incorporating prior information into distributed lag nonlinear models with zero-inflated monotone regression trees

    Authors: Daniel Mork, Ander Wilson

    Abstract: In environmental health research there is often interest in the effect of an exposure on a health outcome assessed on the same day and several subsequent days or lags. Distributed lag nonlinear models (DLNM) are a well-established statistical framework for estimating an exposure-lag-response function. We propose methods to allow for prior information to be incorporated into DLNMs. First, we impose…

    Submitted 4 October, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    Comments: 29 pages, 5 figures, 2 tables

  37. arXiv:2211.13609  [pdf, other]

    cs.LG stat.ML

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Authors: Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

    Abstract: While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tas…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-bayes

  38. arXiv:2210.12496  [pdf, other]

    cs.LG stat.ML

    Bayesian Optimization with Conformal Prediction Sets

    Authors: Samuel Stanton, Wesley Maddox, Andrew Gordon Wilson

    Abstract: Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncerta…

    Submitted 12 December, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: For code, see https://www.github.com/samuelstanton/conformal-bayesopt.git

    Journal ref: Proceedings of Machine Learning Research, Volume 206, 959-986, PMLR, 2023

  39. arXiv:2210.11369  [pdf, other]

    cs.LG cs.CV stat.ML

    On Feature Learning in the Presence of Spurious Correlations

    Authors: Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson

    Abstract: Deep classifiers are known to rely on spurious features -- patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds. In this paper we evaluate the amount of information about the core (non-spurious) features that can be decoded from the representations learne…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022. Code available at https://github.com/izmailovpavel/spurious_feature_learning

  40. arXiv:2210.02984  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Lie Derivative for Measuring Learned Equivariance

    Authors: Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, whic…

    Submitted 18 June, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICLR 2023. Code available at: https://github.com/ngruver/lie-deriv

  41. arXiv:2209.12269  [pdf, other]

    stat.ML cs.LG

    Algorithms that Approximate Data Removal: New Results and Limitations

    Authors: Vinith M. Suriyakumar, Ashia C. Wilson

    Abstract: We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms which return the empirical risk minimizer and approximate unlearning algorithms that comply with deletion requests that come in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computat…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022

  42. arXiv:2207.08200  [pdf, other]

    stat.ML cs.AI cs.LG

    Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

    Authors: Gianluca Detommaso, Alberto Gasparin, Andrew Wilson, Cedric Archambeau

    Abstract: As we move away from the data, the predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct overconfidence of Bayesian deep learning models outside of the training domain. We define DAPs as prior distributions over the model parameters that depend on t…

    Submitted 17 July, 2022; originally announced July 2022.

  43. arXiv:2207.06544  [pdf, other]

    cs.LG q-fin.ST stat.ML

    Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes

    Authors: Gregory Benton, Wesley J. Maddox, Andrew Gordon Wilson

    Abstract: A broad class of stochastic volatility models are defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack an ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to re-cast a class of stochastic…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: ICML 2022. Code available at https://github.com/g-benton/Volt

  44. arXiv:2206.15306  [pdf, other]

    cs.LG stat.ML

    Transfer Learning with Deep Tabular Models

    Authors: Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

    Abstract: Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applica…

    Submitted 7 August, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  45. arXiv:2206.09909  [pdf, other]

    cs.LG stat.ML

    Low-Precision Stochastic Gradient Langevin Dynamics

    Authors: Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

    Abstract: While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Lang…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Published at ICML 2022

  46. arXiv:2204.06610  [pdf, other]

    stat.ME stat.AP

    Infinite Hidden Markov Models for Multiple Multivariate Time Series with Missing Data

    Authors: Lauren Hoskovec, Matthew D. Koslovsky, Kirsten Koehler, Nicholas Good, Jennifer L. Peel, John Volckens, Ander Wilson

    Abstract: Exposure to air pollution is associated with increased morbidity and mortality. Recent technological advancements permit the collection of time-resolved personal exposure data. Such data are often incomplete with missing observations and exposures below the limit of detection, which limit their use in health effects studies. In this paper we develop an infinite hidden Markov model for multiple asy…

    Submitted 13 April, 2022; originally announced April 2022.

  47. arXiv:2204.02937  [pdf, other]

    cs.LG cs.CV stat.ML

    Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

    Authors: Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions. However, even in these cases, we show that they still often learn core features associated with the desired attributes of the data, contrary to recent findings. Inspired by this insight, we demonstrate that simple last layer retraining can match or outperform state-of-the-art approach…

    Submitted 30 June, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: ICLR 2023. Code is available at https://github.com/PolinaKirichenko/deep_feature_reweighting

  48. arXiv:2204.00040  [pdf, other]

    stat.ME stat.AP

    Integrating Biological Knowledge in Kernel-Based Analyses of Environmental Mixtures and Health

    Authors: Glen McGee, Ander Wilson, Brent A Coull, Thomas F Webster

    Abstract: A key goal of environmental health research is to assess the risk posed by mixtures of pollutants. As epidemiologic studies of mixtures can be expensive to conduct, it behooves researchers to incorporate prior knowledge about mixtures into their analyses. This work extends the Bayesian multiple index model (BMIM), which assumes the exposure-response function is a non-parametric function of a set o…

    Submitted 31 March, 2022; originally announced April 2022.

  49. arXiv:2203.16481  [pdf, other]

    cs.LG stat.ML

    On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

    Authors: Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Aleatoric uncertainty captures the inherent randomness of the data, such as measurement noise. In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter. By contrast, for Bayesian classification we use a categorical distribution with no mechanism to represent our beliefs about aleatoric uncertainty. Our wo…

    Submitted 30 March, 2022; originally announced March 2022.

  50. arXiv:2203.12742  [pdf, other]

    cs.LG cs.NE q-bio.QM stat.ML

    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

    Authors: Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

    Abstract: Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of mult…

    Submitted 12 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: ICML 2022. Code available at https://github.com/samuelstanton/lambo