-
AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent
Authors:
Nikola Surjanovic,
Alexandre Bouchard-Côté,
Trevor Campbell
Abstract:
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learn…
▽ More
The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration and then takes appropriate action. We introduce theory supporting the convergence of AutoSGD, along with its deterministic counterpart for standard gradient descent. Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Variational phylogenetic inference with products over bipartitions
Authors:
Evan Sidrow,
Alexandre Bouchard-Côté,
Lloyd T. Elliott
Abstract:
Bayesian phylogenetics requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form density of the resulting distribution over trees. Unlike existing methods for u…
▽ More
Bayesian phylogenetics requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form density of the resulting distribution over trees. Unlike existing methods for ultrametric trees, our method performs inference over all of tree space, it does not require any Markov chain Monte Carlo subroutines, and our variational family is differentiable. Through experiments on benchmark genomic datasets and an application to SARS-CoV-2, we demonstrate that our method achieves competitive accuracy while requiring significantly fewer gradient evaluations than existing state-of-the-art techniques.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
AutoStep: Locally adaptive involutive MCMC
Authors:
Tiange Liu,
Nikola Surjanovic,
Miguel Biron-Lattes,
Alexandre Bouchard-Côté,
Trevor Campbell
Abstract:
Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often a challenging task in practice; and for complex multiscale targets, there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC metho…
▽ More
Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often a challenging task in practice; and for complex multiscale targets, there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC methods -- AutoStep MCMC -- that selects an appropriate step size at each iteration adapted to the local geometry of the target distribution. We prove that under mild conditions AutoStep MCMC is $π$-invariant, irreducible, and aperiodic, and obtain bounds on expected energy jump distance and cost per iteration. Empirical results examine the robustness and efficacy of our proposed step size selection procedure, and show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost on a range of challenging target distributions.
△ Less
Submitted 20 May, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
MCMC-driven learning
Authors:
Alexandre Bouchard-Côté,
Trevor Campbell,
Geoff Pleiss,
Nikola Surjanovic
Abstract:
This paper is intended to appear as a chapter for the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning$\unicode{x2014}$which includes black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset const…
▽ More
This paper is intended to appear as a chapter for the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning$\unicode{x2014}$which includes black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset construction for MCMC with big data, Markov chain gradient descent, Markovian score climbing, and more$\unicode{x2014}$within one common framework. By doing so, the theory and methods developed for each may be translated and generalized.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
Pigeons.jl: Distributed Sampling From Intractable Distributions
Authors:
Nikola Surjanovic,
Miguel Biron-Lattes,
Paul Tiede,
Saifuddin Syed,
Trevor Campbell,
Alexandre Bouchard-Côté
Abstract:
We introduce a software package, Pigeons.jl, that provides a way to leverage distributed computation to obtain samples from complicated probability distributions, such as multimodal posteriors arising in Bayesian inference and high-dimensional distributions in statistical mechanics. Pigeons.jl provides simple APIs to perform such computations single-threaded, multi-threaded, and/or distributed ove…
▽ More
We introduce a software package, Pigeons.jl, that provides a way to leverage distributed computation to obtain samples from complicated probability distributions, such as multimodal posteriors arising in Bayesian inference and high-dimensional distributions in statistical mechanics. Pigeons.jl provides simple APIs to perform such computations single-threaded, multi-threaded, and/or distributed over thousands of MPI-communicating machines. In addition, Pigeons.jl guarantees a property that we call strong parallelism invariance: the output for a given seed is identical irrespective of the number of threads and processes, which is crucial for scientific reproducibility and software validation. We describe the key features of Pigeons.jl and the approach taken to implement a distributed and randomized algorithm that satisfies strong parallelism invariance.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Slice Sampling for General Completely Random Measures
Authors:
Peiyuan Zhu,
Alexandre Bouchard-Côté,
Trevor Campbell
Abstract:
Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Due to the infinity the latent features, posterior inference requires either marginalization---resulting in dependence structures that prevent efficient computation…
▽ More
Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Due to the infinity the latent features, posterior inference requires either marginalization---resulting in dependence structures that prevent efficient computation via parallelization and conjugacy---or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. In contrast to past work that achieved this on a model-by-model basis, we provide a general recipe that is applicable to the broad class of completely random measure-based priors. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models, demonstrating a higher effective sample size per second compared to algorithms using marginalization as well as a higher predictive performance compared to models employing fixed truncations.
△ Less
Submitted 25 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Particle-Gibbs Sampling For Bayesian Feature Allocation Models
Authors:
Alexandre Bouchard-Côté,
Andrew Roth
Abstract:
Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element wise Gibbs update of the feature allocation matrix. These element wise up…
▽ More
Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element wise Gibbs update of the feature allocation matrix. These element wise updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. However, this sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs sampler that targets the same distribution as the row wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic data from a range of feature allocation models. Our results suggest that row wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models.
△ Less
Submitted 25 January, 2020;
originally announced January 2020.
-
Analysis of high-dimensional Continuous Time Markov Chains using the Local Bouncy Particle Sampler
Authors:
Tingting Zhao,
Alexandre Bouchard-Côté
Abstract:
Sampling the parameters of high-dimensional Continuous Time Markov Chains (CTMC) is a challenging problem with important applications in many fields of applied statistics. In this work a recently proposed type of non-reversible rejection-free Markov Chain Monte Carlo (MCMC) sampler, the Bouncy Particle Sampler (BPS), is brought to bear to this problem. BPS has demonstrated its favorable computatio…
▽ More
Sampling the parameters of high-dimensional Continuous Time Markov Chains (CTMC) is a challenging problem with important applications in many fields of applied statistics. In this work a recently proposed type of non-reversible rejection-free Markov Chain Monte Carlo (MCMC) sampler, the Bouncy Particle Sampler (BPS), is brought to bear to this problem. BPS has demonstrated its favorable computational efficiency compared with state-of-the-art MCMC algorithms, however to date applications to real-data scenario were scarce. An important aspect of the practical implementation of BPS is the simulation of event times. Default implementations use conservative thinning bounds. Such bounds can slow down the algorithm and limit the computational performance. Our paper develops an algorithm with an exact analytical solution to the random event times in the context of CTMCs. Our local version of BPS algorithm takes advantage of the sparse structure in the target factor graph and we also provide a framework for assessing the computational complexity of local BPS algorithms.
△ Less
Submitted 29 May, 2021; v1 submitted 30 May, 2019;
originally announced May 2019.
-
Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets
Authors:
Robert Cornish,
Paul Vanetti,
Alexandre Bouchard-Côté,
George Deligiannidis,
Arnaud Doucet
Abstract:
Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods is too computationally intensive to handle large datasets, since the cost per step usually scales like $Θ(n)$ in the number of data points $n$. We propose the Scalable Metropolis-Hastings (SMH) kernel that exploits Gaussian concentration of the posterior to require processing on average only $O(1)$ or even $O(1/\sqrt{n})$ data…
▽ More
Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods is too computationally intensive to handle large datasets, since the cost per step usually scales like $Θ(n)$ in the number of data points $n$. We propose the Scalable Metropolis-Hastings (SMH) kernel that exploits Gaussian concentration of the posterior to require processing on average only $O(1)$ or even $O(1/\sqrt{n})$ data points per step. This scheme is based on a combination of factorized acceptance probabilities, procedures for fast simulation of Bernoulli processes, and control variate ideas. Contrary to many MCMC subsampling schemes such as fixed step-size Stochastic Gradient Langevin Dynamics, our approach is exact insofar as the invariant distribution is the true posterior and not an approximation to it. We characterise the performance of our algorithm theoretically, and give realistic and verifiable conditions under which it is geometrically ergodic. This theory is borne out by empirical results that demonstrate overall performance benefits over standard Metropolis-Hastings and various subsampling algorithms.
△ Less
Submitted 10 June, 2019; v1 submitted 28 January, 2019;
originally announced January 2019.
-
An Entropy Search Portfolio for Bayesian Optimization
Authors:
Bobak Shahriari,
Ziyu Wang,
Matthew W. Hoffman,
Alexandre Bouchard-Côté,
Nando de Freitas
Abstract:
Bayesian optimization is a sample-efficient method for black-box global optimization. How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection…
▽ More
Bayesian optimization is a sample-efficient method for black-box global optimization. How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance. While portfolio methods provide an effective, principled way of combining a collection of acquisition functions, they are often based on measures of past performance which can be misleading. To address this issue, we introduce the Entropy Search Portfolio (ESP): a novel approach to portfolio construction which is motivated by information theoretic considerations. We show that ESP outperforms existing portfolio methods on several real and synthetic problems, including geostatistical datasets and simulated control tasks. We not only show that ESP is able to offer performance as good as the best, but unknown, acquisition function, but surprisingly it often gives better performance. Finally, over a wide range of conditions we find that ESP is robust to the inclusion of poor acquisition functions.
△ Less
Submitted 4 March, 2015; v1 submitted 18 June, 2014;
originally announced June 2014.
-
A Note on Probabilistic Models over Strings: the Linear Algebra Approach
Authors:
Alexandre Bouchard-Côté
Abstract:
Probabilistic models over strings have played a key role in developing methods allowing indels to be treated as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question in the field is the complexity of computing the normalization of a class of str…
▽ More
Probabilistic models over strings have played a key role in developing methods allowing indels to be treated as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question in the field is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a new proof of a known result on the complexity of inference on TKF91, a well-known probabilistic model over strings. Our proof uses a different approach based on classical linear algebra results, and is in some cases easier to extend to other models. The proving method also has consequences on the implementation and complexity of inference algorithms.
△ Less
Submitted 11 July, 2013; v1 submitted 21 January, 2013;
originally announced January 2013.
-
Optimization of Structured Mean Field Objectives
Authors:
Alexandre Bouchard-Cote,
Michael I. Jordan
Abstract:
In intractable, undirected graphical models, an intuitive way of creating structured mean field approximations is to select an acyclic tractable subgraph. We show that the hardness of computing the objective function and gradient of the mean field objective qualitatively depends on a simple graph property. If the tractable subgraph has this property- we call such subgraphs v-acyclic-a very fast bl…
▽ More
In intractable, undirected graphical models, an intuitive way of creating structured mean field approximations is to select an acyclic tractable subgraph. We show that the hardness of computing the objective function and gradient of the mean field objective qualitatively depends on a simple graph property. If the tractable subgraph has this property- we call such subgraphs v-acyclic-a very fast block coordinate ascent algorithm is possible. If not, optimization is harder, but we show a new algorithm based on the construction of an auxiliary exponential family that can be used to make inference possible in this case as well. We discuss the advantages and disadvantages of each regime and compare the algorithms empirically.
△ Less
Submitted 9 May, 2012;
originally announced May 2012.