-
Harnessing the Power of Reinforcement Learning for Adaptive MCMC
Authors:
Congye Wang,
Matthew A. Fisher,
Heishiro Kanagawa,
Wilson Chen,
Chris. J. Oates
Abstract:
Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al. (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate, or the expected squared jump distance, provide insufficient signal for training RLMH. Instead, we propose a novel reward based on the contrastive divergence, whose superior performance in the context of RLMH is demonstrated. Second, we explore the potential of RLMH and present adaptive gradient-based samplers that balance flexibility of the Markov transition kernel with learnability of the associated RL task. A comprehensive simulation study using the posteriordb benchmark supports the practical effectiveness of RLMH.
Submitted 1 July, 2025;
originally announced July 2025.
-
Stationary MMD Points for Cubature
Authors:
Zonghao Chen,
Toni Karvonen,
Heishiro Kanagawa,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Approximation of a target probability distribution using a finite set of points is a problem of fundamental importance, arising in cubature, data compression, and optimisation. Several authors have proposed to select points by minimising a maximum mean discrepancy (MMD), but the non-convexity of this objective precludes global minimisation in general. Instead, we consider \emph{stationary} points of the MMD which, in contrast to points globally minimising the MMD, can be accurately computed. Our main theoretical contribution is the (perhaps surprising) result that, for integrands in the associated reproducing kernel Hilbert space, the cubature error of stationary MMD points vanishes \emph{faster} than the MMD. Motivated by this \emph{super-convergence} property, we consider discretised gradient flows as a practical strategy for computing stationary points of the MMD, presenting a refined convergence analysis that establishes a novel non-asymptotic finite-particle error bound, which may be of independent interest.
Submitted 27 May, 2025;
originally announced May 2025.
-
Fast Approximate Solution of Stein Equations for Post-Processing of MCMC
Authors:
Qingyang Liu,
Heishiro Kanagawa,
Matthew A. Fisher,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Bayesian inference is conceptually elegant, but calculating posterior expectations can entail a heavy computational cost. Monte Carlo methods are reliable and supported by strong asymptotic guarantees, but do not leverage smoothness of the integrand. Solving Stein equations has emerged as a possible alternative, providing a framework for numerical approximation of posterior expectations in which smoothness can be exploited. However, existing numerical methods for Stein equations are associated with high computational cost due to the need to solve large linear systems. This paper considers the combination of iterative linear solvers and preconditioning strategies to obtain fast approximate solutions of Stein equations.
Submitted 13 June, 2025; v1 submitted 11 January, 2025;
originally announced January 2025.
-
Prediction-Centric Uncertainty Quantification via MMD
Authors:
Zheyang Shen,
Jeremias Knoblauch,
Sam Power,
Chris. J. Oates
Abstract:
Deterministic mathematical models, such as those specified via differential equations, are a powerful tool to communicate scientific insight. However, such models are necessarily simplified descriptions of the real world. Generalised Bayesian methodologies have been proposed for inference with misspecified models, but these are typically associated with vanishing parameter uncertainty as more data are observed. In the context of a misspecified deterministic mathematical model, this has the undesirable consequence that posterior predictions become deterministic and certain, while being incorrect. Taking this observation as a starting point, we propose Prediction-Centric Uncertainty Quantification, where a mixture distribution based on the deterministic model confers improved uncertainty quantification in the predictive context. Computation of the mixing distribution is cast as a (regularised) gradient flow of the maximum mean discrepancy (MMD), enabling consistent numerical approximations to be obtained. Results are reported on both a toy model from population ecology and a real model of protein signalling in cell biology.
Submitted 21 March, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
Grand Challenges in Bayesian Computation
Authors:
Anirban Bhattacharya,
Antonio Linero,
Chris. J. Oates
Abstract:
This article appeared in the September 2024 issue (Vol. 31, No. 3) of the Bulletin of the International Society for Bayesian Analysis (ISBA).
Submitted 1 October, 2024;
originally announced October 2024.
-
Scalable Monte Carlo for Bayesian Learning
Authors:
Paul Fearnhead,
Christopher Nemeth,
Chris J. Oates,
Chris Sherlock
Abstract:
This book aims to provide a graduate-level introduction to advanced topics in Markov chain Monte Carlo (MCMC) algorithms, as applied broadly in the Bayesian computational context. Most, if not all of these topics (stochastic gradient MCMC, non-reversible MCMC, continuous time MCMC, and new techniques for convergence assessment) have emerged as recently as the last decade, and have driven substantial recent practical and theoretical advances in the field. A particular focus is on methods that are scalable with respect to either the amount of data, or the data dimension, motivated by the emerging high-priority application areas in machine learning and AI.
Submitted 17 July, 2024;
originally announced July 2024.
-
Operator-Informed Score Matching for Markov Diffusion Models
Authors:
Zheyang Shen,
Huihui Wang,
Marina Riabiz,
Chris J. Oates
Abstract:
Diffusion models are typically trained using score matching, a learning objective agnostic to the underlying noising process that guides the model. This paper argues that Markov noising processes enjoy an advantage over alternatives, as the Markov operators that govern the noising process are well-understood. Specifically, by leveraging the spectral decomposition of the infinitesimal generator of the Markov noising process, we obtain parametric estimates of the score functions simultaneously for all marginal distributions, using only sample averages with respect to the data distribution. The resulting operator-informed score matching provides both a standalone approach to sample generation for low-dimensional distributions, as well as a recipe for better informed neural score estimators in high-dimensional settings.
Submitted 24 May, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Reinforcement Learning for Adaptive MCMC
Authors:
Congye Wang,
Wilson Chen,
Heishiro Kanagawa,
Chris. J. Oates
Abstract:
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90 \%$ of tasks in the PosteriorDB benchmark.
Submitted 22 May, 2024;
originally announced May 2024.
-
Probabilistic Richardson Extrapolation
Authors:
Chris. J. Oates,
Toni Karvonen,
Aretha L. Teckentrup,
Marina Strocchi,
Steven A. Niederer
Abstract:
For over a century, extrapolation methods have provided a powerful tool to improve the convergence order of a numerical method. However, these tools are not well-suited to modern computer codes, where multiple continua are discretised and convergence orders are not easily analysed. To address this challenge we present a probabilistic perspective on Richardson extrapolation, a point of view that unifies classical extrapolation methods with modern multi-fidelity modelling, and handles uncertain convergence orders by allowing these to be statistically estimated. The approach is developed using Gaussian processes, leading to Gauss-Richardson Extrapolation (GRE). Conditions are established under which extrapolation using the conditional mean achieves a polynomial (or even an exponential) speed-up compared to the original numerical method. Further, the probabilistic formulation unlocks the possibility of experimental design, casting the selection of fidelities as a continuous optimisation problem which can then be (approximately) solved. A case-study involving a computational cardiac model demonstrates that practical gains in accuracy can be achieved using the GRE method.
Submitted 15 January, 2024;
originally announced January 2024.
-
Online Semiparametric Regression via Sequential Monte Carlo
Authors:
Marianne Menictas,
Chris J. Oates,
Matt P. Wand
Abstract:
We develop and describe online algorithms for performing online semiparametric regression analyses. Earlier work on this topic is in Luts, Broderick & Wand (J. Comput. Graph. Statist., 2014) where online mean field variational Bayes was employed. In this article we instead develop sequential Monte Carlo approaches to circumvent well-known inaccuracies inherent in variational approaches. Even though sequential Monte Carlo is not as fast as online mean field variational Bayes, it can be a viable alternative for applications where the data rate is not overly high. For Gaussian response semiparametric regression models our new algorithms share the online mean field variational Bayes property of only requiring updating and storage of sufficient statistics quantities of streaming data. In the non-Gaussian case accurate real-time semiparametric regression requires the full data to be kept in storage. The new algorithms allow for new options concerning accuracy/speed trade-offs for online semiparametric regression.
Submitted 27 August, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Stein $Π$-Importance Sampling
Authors:
Congye Wang,
Wilson Chen,
Heishiro Kanagawa,
Chris. J. Oates
Abstract:
Stein discrepancies have emerged as a powerful tool for retrospective improvement of Markov chain Monte Carlo output. However, the question of how to design Markov chains that are well-suited to such post-processing has yet to be addressed. This paper studies Stein importance sampling, in which weights are assigned to the states visited by a $Π$-invariant Markov chain to obtain a consistent approximation of $P$, the intended target. Surprisingly, the optimal choice of $Π$ is not identical to the target $P$; we therefore propose an explicit construction for $Π$ based on a novel variational argument. Explicit conditions for convergence of Stein $Π$-Importance Sampling are established. For $\approx 70\%$ of tasks in the PosteriorDB benchmark, a significant improvement over the analogous post-processing of $P$-invariant Markov chains is reported.
Submitted 17 May, 2023;
originally announced May 2023.
-
Meta-learning Control Variates: Variance Reduction with Limited Data
Authors:
Zhuo Sun,
Chris J. Oates,
François-Xavier Briol
Abstract:
Control variates can be a powerful tool to reduce the variance of Monte Carlo estimators, but constructing effective control variates can be challenging when the number of samples is small. In this paper, we show that when a large number of related integrals need to be computed, it is possible to leverage the similarity between these integration tasks to improve performance even when the number of samples per task is very small. Our approach, called meta learning CVs (Meta-CVs), can be used for up to hundreds or thousands of tasks. Our empirical assessment indicates that Meta-CVs can lead to significant variance reduction in such settings, and our theoretical analysis establishes general conditions under which Meta-CVs can be successfully trained.
Submitted 7 June, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Sobolev Spaces, Kernels and Discrepancies over Hyperspheres
Authors:
Simon Hubbert,
Emilio Porcu,
Chris. J. Oates,
Mark Girolami
Abstract:
This work provides theoretical foundations for kernel methods in the hyperspherical context. Specifically, we characterise the native spaces (reproducing kernel Hilbert spaces) and the Sobolev spaces associated with kernels defined over hyperspheres. Our results have direct consequences for kernel cubature, determining the rate of convergence of the worst case error, and expanding the applicability of cubature algorithms based on Stein's method. We first introduce a suitable characterisation of Sobolev spaces on the $d$-dimensional hypersphere embedded in $(d+1)$-dimensional Euclidean space. Our characterisation is based on the Fourier--Schoenberg sequences associated with a given kernel. Such sequences are hard (if not impossible) to compute analytically on $d$-dimensional spheres, but often feasible over Hilbert spheres. We circumvent this problem by finding a projection operator that allows Fourier mapping from Hilbert into finite-dimensional hyperspheres. We illustrate our findings through some parametric families of kernels.
Submitted 16 November, 2022;
originally announced November 2022.
-
Minimum Kernel Discrepancy Estimators
Authors:
Chris. J. Oates
Abstract:
For two decades, reproducing kernels and their associated discrepancies have facilitated elegant theoretical analyses in the setting of quasi Monte Carlo. These same tools are now receiving interest in statistics and related fields, as criteria that can be used to select an appropriate statistical model for a given dataset. The focus of this article is on minimum kernel discrepancy estimators, whose use in statistical applications is reviewed, and a general theoretical framework for establishing their asymptotic properties is presented.
Submitted 23 August, 2023; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Gradient-Free Kernel Stein Discrepancy
Authors:
Matthew A Fisher,
Chris. J Oates
Abstract:
Stein discrepancies have emerged as a powerful statistical tool, being applied to fundamental statistical problems including parameter inference, goodness-of-fit testing, and sampling. The canonical Stein discrepancies require the derivatives of a statistical model to be computed, and in return provide theoretical guarantees of convergence detection and control. However, for complex statistical models, the stable numerical computation of derivatives can require bespoke algorithmic development and render Stein discrepancies impractical. This paper focuses on posterior approximation using Stein discrepancies, and introduces a collection of non-canonical Stein discrepancies that are gradient free, meaning that derivatives of the statistical model are not required. Sufficient conditions for convergence detection and control are established, and applications to sampling and variational inference are presented.
Submitted 18 July, 2022; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Generalised Bayesian Inference for Discrete Intractable Likelihood
Authors:
Takuo Matsubara,
Jeremias Knoblauch,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, with sufficient conditions for posterior consistency and asymptotic normality established. In addition, a novel and general approach to calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data, where in each case the methodology facilitates generalised Bayesian inference at low computational cost.
Submitted 1 September, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed
Authors:
Toni Karvonen,
Chris J. Oates
Abstract:
Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
Submitted 25 April, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
A Statistical Approach to Surface Metrology for 3D-Printed Stainless Steel
Authors:
Chris. J. Oates,
Wilfrid S. Kendall,
Liam Fleming
Abstract:
Surface metrology is the area of engineering concerned with the study of geometric variation in surfaces. This paper explores the potential for modern techniques from spatial statistics to act as generative models for geometric variation in 3D-printed stainless steel. The complex macro-scale geometries of 3D-printed components pose a challenge that is not present in traditional surface metrology, as the training data and test data need not be defined on the same manifold. Strikingly, a covariance function defined in terms of geodesic distance on one manifold can fail to satisfy positive-definiteness and thus fail to be a valid covariance function in the context of a different manifold; this hinders the use of standard techniques that aim to learn a covariance function from a training dataset. On the other hand, the associated covariance differential operators are locally defined. This paper proposes to perform inference for such differential operators, facilitating generalisation from the manifold of a training dataset to the manifold of a test dataset. The approach is assessed in the context of model selection and explored in detail in the context of a finite element model for 3D-printed stainless steel.
Submitted 3 October, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
GaussED: A Probabilistic Programming Language for Sequential Experimental Design
Authors:
Matthew A. Fisher,
Onur Teymur,
Chris. J. Oates
Abstract:
Sequential algorithms are popular for experimental design, enabling emulation, optimisation and inference to be efficiently performed. For most of these applications bespoke software has been developed, but the approach is general and many of the actual computations performed in such software are identical. Motivated by the diverse problems that can in principle be solved with common code, this paper presents GaussED, a simple probabilistic programming language coupled to a powerful experimental design engine, which together automate sequential experimental design for approximating a (possibly nonlinear) quantity of interest in Gaussian process models. Using a handful of commands, GaussED can be used to: solve linear partial differential equations, perform tomographic reconstruction from integral data and implement Bayesian optimisation with gradient data.
Submitted 15 October, 2021;
originally announced October 2021.
-
Minimum Discrepancy Methods in Uncertainty Quantification
Authors:
Chris J. Oates
Abstract:
The lectures were prepared for the École Thématique sur les Incertitudes en Calcul Scientifique (ETICS) in September 2021.
Submitted 13 September, 2021;
originally announced September 2021.
-
Black Box Probabilistic Numerics
Authors:
Onur Teymur,
Christopher N. Foley,
Philip G. Breen,
Toni Karvonen,
Chris. J. Oates
Abstract:
Probabilistic numerics casts numerical tasks, such as the numerical solution of differential equations, as inference problems to be solved. One approach is to model the unknown quantity of interest as a random variable, and to constrain this variable using data generated during the course of a traditional numerical method. However, data may be nonlinearly related to the quantity of interest, rendering the proper conditioning of random variables difficult and limiting the range of numerical tasks that can be addressed. Instead, this paper proposes to construct probabilistic numerical methods based only on the final output from a traditional method. A convergent sequence of approximations to the quantity of interest constitutes a dataset, from which the limiting quantity of interest can be extrapolated, in a probabilistic analogue of Richardson's deferred approach to the limit. This black box approach (1) massively expands the range of tasks to which probabilistic numerics can be applied, (2) inherits the features and performance of state-of-the-art numerical methods, and (3) enables provably higher orders of convergence to be achieved. Applications are presented for nonlinear ordinary and partial differential equations, as well as for eigenvalue problems, a setting for which no probabilistic numerical methods have yet been developed.
Submitted 28 October, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Stein's Method Meets Computational Statistics: A Review of Some Recent Developments
Authors:
Andreas Anastasiou,
Alessandro Barp,
François-Xavier Briol,
Bruno Ebner,
Robert E. Gaunt,
Fatemeh Ghaderinezhad,
Jackson Gorham,
Arthur Gretton,
Christophe Ley,
Qiang Liu,
Lester Mackey,
Chris. J. Oates,
Gesine Reinert,
Yvik Swan
Abstract:
Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein's method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
Submitted 22 June, 2022; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Bayesian Numerical Methods for Nonlinear Partial Differential Equations
Authors:
Junyang Wang,
Jon Cockayne,
Oksana Chkrebtii,
T. J. Sullivan,
Chris. J. Oates
Abstract:
The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied. However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of explicit conditioning formulae. This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost. The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator. Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated. A suitable prior model for the solution of the PDE is identified using novel theoretical analysis of the sample path properties of Matérn processes, which may be of independent interest.
Submitted 3 May, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Robust Generalised Bayesian Inference for Intractable Likelihoods
Authors:
Takuo Matsubara,
Jeremias Knoblauch,
François-Xavier Briol,
Chris. J. Oates
Abstract:
Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are impacted by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.
Submitted 11 January, 2022; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Post-Processing of MCMC
Authors:
Leah F. South,
Marina Riabiz,
Onur Teymur,
Chris. J. Oates
Abstract:
Markov chain Monte Carlo (MCMC) is the engine of modern Bayesian statistics, being used to approximate the posterior and derived quantities of interest. Despite this, the issue of how the output from a Markov chain is post-processed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this article is to review state-of-the-art techniques for post-processing Markov chain output. Our review covers methods based on discrepancy minimisation, which directly address the bias-variance trade-off, as well as general-purpose control variate methods for approximating expected quantities of interest.
Submitted 6 September, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
Probabilistic Iterative Methods for Linear Systems
Authors:
Jon Cockayne,
Ilse C. F. Ipsen,
Chris J. Oates,
Tim W. Reid
Abstract:
This paper presents a probabilistic perspective on iterative methods for approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In the approach a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an iterative method produces a sequence $\mathbf{x}_m$ of approximations that converge to $\mathbf{x}_*$. The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $\mu_m \in \mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best guess" for $\mathbf{x}_*$, for example as the mean of $\mu_m$, and also probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when it has not been exactly determined. Theoretical analysis is provided in the prototypical case of a stationary linear iterative method. In this setting we characterise both the rate of contraction of $\mu_m$ to an atomic measure on $\mathbf{x}_*$ and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the insight into solution uncertainty that can be provided by probabilistic iterative methods.
Submitted 11 January, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Measure Transport with Kernel Stein Discrepancy
Authors:
Matthew A. Fisher,
Tui Nolan,
Matthew M. Graham,
Dennis Prangle,
Chris J. Oates
Abstract:
Measure transport underpins several recent algorithms for posterior approximation in the Bayesian context, wherein a transport map is sought to minimise the Kullback-Leibler divergence (KLD) from the posterior to the approximation. The KLD is a strong mode of convergence, requiring absolute continuity of measures and placing restrictions on which transport maps can be permitted. Here we propose to minimise a kernel Stein discrepancy (KSD) instead, requiring only that the set of transport maps is dense in an $L^2$ sense and demonstrating how this condition can be validated. The consistency of the associated posterior approximation is established and empirical results suggest that KSD is a competitive and more flexible alternative to KLD for measure transport.
Submitted 26 October, 2020; v1 submitted 22 October, 2020;
originally announced October 2020.
-
The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks
Authors:
Takuo Matsubara,
Chris J. Oates,
François-Xavier Briol
Abstract:
Bayesian neural networks attempt to combine the strong predictive performance of neural networks with formal quantification of uncertainty associated with the predictive output in the Bayesian framework. However, it remains unclear how to endow the parameters of the network with a prior distribution that is meaningful when lifted into the output space of the network. A possible solution is proposed that enables the user to posit an appropriate Gaussian process covariance function for the task at hand. Our approach constructs a prior distribution for the parameters of the network, called a ridgelet prior, that approximates the posited Gaussian process in the output space of the network. In contrast to existing work on the connection between neural networks and Gaussian processes, our analysis is non-asymptotic, with finite sample-size error bounds provided. This establishes the universality property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular. Our experimental assessment is limited to a proof-of-concept, where we demonstrate that the ridgelet prior can out-perform an unstructured prior on regression problems for which a suitable Gaussian process prior can be provided.
Submitted 11 January, 2022; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Optimal quantisation of probability measures using maximum mean discrepancy
Authors:
Onur Teymur,
Jackson Gorham,
Marina Riabiz,
Chris. J. Oates
Abstract:
Several researchers have proposed minimisation of maximum mean discrepancy (MMD) as a method to quantise probability measures, i.e., to approximate a target distribution by a representative point set. We consider sequential algorithms that greedily minimise MMD over a discrete candidate set. We propose a novel non-myopic algorithm and, in order to both improve statistical efficiency and reduce computational cost, we investigate a variant that applies this technique to a mini-batch of the candidate set at each iteration. When the candidate points are sampled from the target, the consistency of these new algorithms, and of their mini-batch variants, is established. We demonstrate the algorithms on a range of important computational problems, including optimisation of nodes in Bayesian cubature and the thinning of Markov chain output.
Submitted 12 February, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization
Authors:
Shijing Si,
Chris. J. Oates,
Andrew B. Duncan,
Lawrence Carin,
François-Xavier Briol
Abstract:
Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Novel theoretical results are presented to provide insight into the variance reduction that can be achieved, and an empirical assessment, including applications to Bayesian inference, is provided in support.
Submitted 21 July, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Optimal Thinning of MCMC Output
Authors:
Marina Riabiz,
Wilson Chen,
Jon Cockayne,
Pawel Swietach,
Steven A. Niederer,
Lester Mackey,
Chris. J. Oates
Abstract:
The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.
Submitted 11 January, 2022; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Semi-Exact Control Functionals From Sard's Method
Authors:
Leah F. South,
Toni Karvonen,
Chris Nemeth,
Mark Girolami,
Chris. J. Oates
Abstract:
The numerical approximation of posterior expected quantities of interest is considered. A novel control variate technique is proposed for post-processing of Markov chain Monte Carlo output, based both on Stein's method and an approach to numerical integration due to Sard. The resulting estimators are proven to be polynomially exact in the Gaussian context, while empirical results suggest the estimators approximate a Gaussian cubature method near the Bernstein-von-Mises limit. The main theoretical result establishes a bias-correction property in settings where the Markov chain does not leave the posterior invariant. Empirical results are presented across a selection of Bayesian inference tasks. All methods used in this paper are available in the R package ZVCV.
Submitted 6 May, 2021; v1 submitted 31 January, 2020;
originally announced February 2020.
-
Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions
Authors:
Toni Karvonen,
George Wynne,
Filip Tronarp,
Chris J. Oates,
Simo Särkkä
Abstract:
Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become "slowly" overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.
Submitted 11 May, 2020; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Discussion of "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé
Authors:
Leah F. South,
Chris Nemeth,
Chris J. Oates
Abstract:
This is a contribution for the discussion on "Unbiased Markov chain Monte Carlo with couplings" by Pierre E. Jacob, John O'Leary and Yves F. Atchadé to appear in the Journal of the Royal Statistical Society Series B.
Submitted 20 January, 2020; v1 submitted 22 December, 2019;
originally announced December 2019.
-
A Locally Adaptive Bayesian Cubature Method
Authors:
Matthew A Fisher,
Chris J Oates,
Catherine Powell,
Aretha Teckentrup
Abstract:
Bayesian cubature (BC) is a popular inferential perspective on the cubature of expensive integrands, wherein the integrand is emulated using a stochastic process model. Several approaches have been put forward to encode sequential adaptation (i.e. dependence on previous integrand evaluations) into this framework. However, these proposals have been limited to either estimating the parameters of a stationary covariance model or focusing computational resources on regions where large values are taken by the integrand. In contrast, many classical adaptive cubature methods focus computational resources on spatial regions in which local error estimates are largest. The contributions of this work are three-fold: First, we present a theoretical result that suggests there does not exist a direct Bayesian analogue of the classical adaptive trapezoidal method. Then we put forward a novel BC method that has empirically similar behaviour to the adaptive trapezoidal method. Finally we present evidence that the novel method provides improved cubature performance, relative to standard BC, in a detailed empirical assessment.
Submitted 7 October, 2019;
originally announced October 2019.
-
A Role for Symmetry in the Bayesian Solution of Differential Equations
Authors:
Junyang Wang,
Jon Cockayne,
Chris J. Oates
Abstract:
The interpretation of numerical methods, such as finite difference methods for differential equations, as point estimators suggests that formal uncertainty quantification can also be performed in this context. Competing statistical paradigms can be considered and Bayesian probabilistic numerical methods (PNMs) are obtained when Bayesian statistical principles are deployed. Bayesian PNMs have the appealing property of being closed under composition, such that uncertainty due to different sources of discretisation in a numerical method can be jointly modelled and rigorously propagated. Despite recent attention, no exact Bayesian PNM for the numerical solution of ordinary differential equations (ODEs) has been proposed. This raises the fundamental question of whether exact Bayesian methods for (in general nonlinear) ODEs even exist. The purpose of this paper is to provide a positive answer for a limited class of ODEs. To this end, we work at a foundational level, where a novel Bayesian PNM is proposed as a proof-of-concept. Our proposal is a synthesis of classical Lie group methods, to exploit underlying symmetries in the gradient field, and non-parametric regression in a transformed solution space for the ODE. The procedure is presented in detail for first and second order ODEs and relies on a certain strong technical condition, namely the existence of a solvable Lie algebra, being satisfied. Numerical illustrations are provided.
Submitted 23 September, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Stein Point Markov Chain Monte Carlo
Authors:
Wilson Ye Chen,
Alessandro Barp,
François-Xavier Briol,
Jackson Gorham,
Mark Girolami,
Lester Mackey,
Chris. J. Oates
Abstract:
An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.
Submitted 14 September, 2020; v1 submitted 9 May, 2019;
originally announced May 2019.
-
A Modern Retrospective on Probabilistic Numerics
Authors:
C. J. Oates,
T. J. Sullivan
Abstract:
This article attempts to place the emergence of probabilistic numerics as a mathematical-statistical research field within its historical context and to explore how its gradual development can be related both to applications and to a modern formal treatment. We highlight in particular the parallel contributions of Sul'din and Larkin in the 1960s and how their pioneering early ideas have reached a degree of maturity in the intervening period, mediated by paradigms such as average-case analysis and information-based complexity. We provide a subjective assessment of the state of research in probabilistic numerics and highlight some difficulties to be addressed by future works.
Submitted 5 May, 2019; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Optimality Criteria for Probabilistic Numerical Methods
Authors:
Chris. J. Oates,
Jon Cockayne,
Dennis Prangle,
T. J. Sullivan,
Mark Girolami
Abstract:
It is well understood that Bayesian decision theory and average case analysis are essentially identical. However, if one is interested in performing uncertainty quantification for a numerical task, it can be argued that standard approaches from the decision-theoretic framework are neither appropriate nor sufficient. Instead, we consider a particular optimality criterion from Bayesian experimental design and study its implied optimal information in the numerical context. This information is demonstrated to differ, in general, from the information that would be used in an average-case-optimal numerical method. The explicit connection to Bayesian experimental design suggests several distinct regimes in which optimal probabilistic numerical methods can be developed.
Submitted 10 May, 2019; v1 submitted 14 January, 2019;
originally announced January 2019.
-
Improved Calibration of Numerical Integration Error in Sigma-Point Filters
Authors:
Jakub Prüher,
Toni Karvonen,
Chris J. Oates,
Ondřej Straka,
Simo Särkkä
Abstract:
The sigma-point filters, such as the UKF, which exploit numerical quadrature to obtain an additional order of accuracy in the moment transformation step, are popular alternatives to the ubiquitous EKF. The classical quadrature rules used in the sigma-point filters are motivated via polynomial approximation of the integrand; however, in the applied context these assumptions cannot always be justified. As a result, quadrature error can introduce bias into estimated moments, for which there is no compensatory mechanism in the classical sigma-point filters. This can lead in turn to estimates and predictions that are poorly calibrated. In this article, we investigate the Bayes-Sard quadrature method in the context of sigma-point filters, which enables uncertainty due to quadrature error to be formalised within a probabilistic model. Our first contribution is to derive the well-known classical quadratures as special cases of the Bayes-Sard quadrature method. Then a general-purpose moment transform is developed and utilised in the design of novel sigma-point filters, so that uncertainty due to quadrature error is explicitly quantified. Numerical experiments on a challenging tracking example with misspecified initial conditions show that the additional uncertainty quantification built into our method leads to better-calibrated state estimates with improved RMSE.
Submitted 22 February, 2020; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"
Authors:
Francois-Xavier Briol,
Chris J. Oates,
Mark Girolami,
Michael A. Osborne,
Dino Sejdinovic
Abstract:
This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?
Submitted 26 November, 2018;
originally announced November 2018.
-
Regularized Zero-Variance Control Variates
Authors:
Leah F. South,
Chris J. Oates,
Antonietta Mira,
Christopher Drovandi
Abstract:
Zero-variance control variates (ZV-CV) are a post-processing method to reduce the variance of Monte Carlo estimators of expectations using the derivatives of the log target. Once the derivatives are available, the only additional computational effort lies in solving a linear regression problem. Significant variance reductions have been achieved with this method in low dimensional examples, but the number of covariates in the regression rapidly increases with the dimension of the target. In this paper, we present compelling empirical evidence that the use of penalized regression techniques in the selection of high-dimensional control variates provides performance gains over the classical least squares method. Another type of regularization based on using subsets of derivatives, or a priori regularization as we refer to it in this paper, is also proposed to reduce computational and storage requirements. Several examples showing the utility and limitations of regularized ZV-CV for Bayesian inference are given. The methods proposed in this paper are accessible through the R package ZVCV.
Submitted 15 August, 2022; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Symmetry Exploits for Bayesian Cubature Methods
Authors:
Toni Karvonen,
Simo Särkkä,
Chris. J. Oates
Abstract:
Bayesian cubature provides a flexible framework for numerical integration, in which a priori knowledge on the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has been recently observed that fully symmetric point sets can be exploited in order to reduce - in some cases substantially - the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes-Sard cubature method and the multi-output Bayesian cubature method.
Submitted 26 January, 2019; v1 submitted 26 September, 2018;
originally announced September 2018.
-
On the Bayesian Solution of Differential Equations
Authors:
Junyang Wang,
Jon Cockayne,
Chris Oates
Abstract:
The interpretation of numerical methods, such as finite difference methods for differential equations, as point estimators allows for formal statistical quantification of the error due to discretisation in the numerical context. Competing statistical paradigms can be considered and Bayesian probabilistic numerical methods (PNMs) are obtained when Bayesian statistical principles are deployed. Bayesian PNM are closed under composition, such that uncertainty due to different sources of discretisation can be jointly modelled and rigorously propagated. However, we argue that no strictly Bayesian PNM for the numerical solution of ordinary differential equations (ODEs) have yet been developed. To address this gap, we work at a foundational level, where a novel Bayesian PNM is proposed as a proof-of-concept. Our proposal is a synthesis of classical Lie group methods, to exploit the underlying structure of the gradient field, and non-parametric regression in a transformed solution space for the ODE. The procedure is presented in detail for first order ODEs and relies on a certain technical condition -- existence of a solvable Lie algebra -- being satisfied. Numerical illustrations are provided.
Submitted 22 May, 2018; v1 submitted 18 May, 2018;
originally announced May 2018.
-
A Bayes-Sard Cubature Method
Authors:
Toni Karvonen,
Chris J. Oates,
Simo Särkkä
Abstract:
This paper focusses on the formulation of numerical integration as an inferential task. To date, research effort has largely focussed on the development of Bayesian cubature, whose distributional output provides uncertainty quantification for the integral. However, the point estimators associated to Bayesian cubature can be inaccurate and acutely sensitive to the prior when the domain is high-dimensional. To address these drawbacks we introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. This is achieved by considering a Gaussian process model for the integrand whose mean is a parametric regression model, with an improper flat prior on each regression coefficient. The features in the regression model consist of test functions which are guaranteed to be exactly integrated, with remaining degrees of freedom afforded to the non-parametric part. The asymptotic convergence of the Bayes-Sard cubature method is established and the theoretical results are numerically verified. In particular, we report two orders of magnitude reduction in error compared to Bayesian cubature in the context of a high-dimensional financial integral.
Submitted 18 May, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
Stein Points
Authors:
Wilson Ye Chen,
Lester Mackey,
Jackson Gorham,
François-Xavier Briol,
Chris J. Oates
Abstract:
An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present `Stein Points'. The idea is to exploit either a greedy or a conditional gradient method to iteratively minimise a kernel Stein discrepancy between the empirical measure and $p(x)$. Our empirical results demonstrate that Stein Points enable accurate approximation of the posterior at modest computational cost. In addition, theoretical results are provided to establish convergence of the method.
Submitted 19 June, 2018; v1 submitted 27 March, 2018;
originally announced March 2018.
-
A Bayesian Conjugate Gradient Method
Authors:
Jon Cockayne,
Chris Oates,
Ilse Ipsen,
Mark Girolami
Abstract:
A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about the numerical error. In this paper we propose a novel statistical model for this numerical error set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.
Submitted 17 December, 2018; v1 submitted 16 January, 2018;
originally announced January 2018.
-
Posterior Integration on a Riemannian Manifold
Authors:
Chris. J. Oates,
Alessandro Barp,
Mark Girolami
Abstract:
The geodesic Markov chain Monte Carlo method and its variants enable computation of integrals with respect to a posterior supported on a manifold. However, for regular integrals, the convergence rate of the ergodic average will be sub-optimal. To fill this gap, this paper extends the efficient posterior integration method of Oates et al. (2017) to the case of a Riemannian manifold. In contrast to the original Euclidean case, no non-trivial boundary conditions are needed for a closed manifold. The method is assessed through simulation and deployed to compute posterior integrals for an Australian Mesozoic paleomagnetic pole model, whose parameters are constrained to lie on the manifold $M = \mathbb{S}^2 \times \mathbb{R}_+$.
Submitted 14 October, 2018; v1 submitted 5 December, 2017;
originally announced December 2017.
-
Bayesian Probabilistic Numerical Methods in Time-Dependent State Estimation for Industrial Hydrocyclone Equipment
Authors:
Chris. J. Oates,
Jon Cockayne,
Robert G. Aykroyd,
Mark Girolami
Abstract:
The use of high-power industrial equipment, such as large-scale mixing equipment or a hydrocyclone for separation of particles in liquid suspension, demands careful monitoring to ensure correct operation. The fundamental task of state-estimation for the liquid suspension can be posed as a time-evolving inverse problem and solved with Bayesian statistical methods. In this paper, we extend Bayesian methods to incorporate statistical models for the error that is incurred in the numerical solution of the physical governing equations. This enables full uncertainty quantification within a principled computation-precision trade-off, in contrast to the over-confident inferences that are obtained when all sources of numerical error are ignored. The method is cast within a sequential Monte Carlo framework and an optimised implementation is provided in Python.
Submitted 19 December, 2018; v1 submitted 19 July, 2017;
originally announced July 2017.
-
Optimal Monte Carlo integration on closed manifolds
Authors:
Martin Ehler,
Manuel Graef,
Chris. J. Oates
Abstract:
The worst case integration error in reproducing kernel Hilbert spaces of standard Monte Carlo methods with $n$ random points decays as $n^{-1/2}$. However, re-weighting of random points can sometimes be used to improve the convergence order. This paper contributes general theoretical results for Sobolev spaces on closed Riemannian manifolds, where we verify that such re-weighting yields optimal approximation rates up to a logarithmic factor. We also provide numerical experiments matching the theoretical results for some Sobolev spaces on the unit sphere and on the Grassmannian manifold. Our theoretical findings also cover function spaces on more general sets such as the unit ball, the cube, and the simplex.
Submitted 24 January, 2018; v1 submitted 15 July, 2017;
originally announced July 2017.