Probabilistic Richardson Extrapolation

Authors: Chris. J. Oates, Toni Karvonen, Aretha L. Teckentrup, Marina Strocchi, Steven A. Niederer

Abstract: For over a century, extrapolation methods have provided a powerful tool to improve the convergence order of a numerical method. However, these tools are not well-suited to modern computer codes, where multiple continua are discretised and convergence orders are not easily analysed. To address this challenge we present a probabilistic perspective on Richardson extrapolation, a point of view that unifies classical extrapolation methods with modern multi-fidelity modelling, and handles uncertain convergence orders by allowing these to be statistically estimated. The approach is developed using Gaussian processes, leading to Gauss-Richardson Extrapolation (GRE). Conditions are established under which extrapolation using the conditional mean achieves a polynomial (or even an exponential) speed-up compared to the original numerical method. Further, the probabilistic formulation unlocks the possibility of experimental design, casting the selection of fidelities as a continuous optimisation problem which can then be (approximately) solved. A case-study involving a computational cardiac model demonstrates that practical gains in accuracy can be achieved using the GRE method.

Submitted 15 January, 2024; originally announced January 2024.
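
For orientation, the classical scheme that the abstract above generalises can be sketched in a few lines. This is textbook Richardson extrapolation with a known convergence order, not the GRE method of the paper. The function names `central_difference` and `richardson` are hypothetical, and the order p = 2 is assumed known, which is the assumption the paper relaxes.

```python
import math

def central_difference(f, x, h):
    """Second-order approximation to f'(x); the error behaves like C * h**2."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def richardson(approx, h, p):
    """Combine approximations at step sizes h and h/2 to cancel the leading error term.

    Assumes approx(h) = limit + C * h**p + higher-order terms, with p known.
    """
    coarse = approx(h)
    fine = approx(h / 2.0)
    return (2.0 ** p * fine - coarse) / (2.0 ** p - 1.0)

if __name__ == "__main__":
    # Illustrative problem (an assumption): differentiate sin at x = 1.
    def approx(h):
        return central_difference(math.sin, 1.0, h)

    exact = math.cos(1.0)
    print("plain error:       ", abs(approx(0.1) - exact))
    print("extrapolated error:", abs(richardson(approx, 0.1, p=2) - exact))
```

With p = 2 the combination reduces to (4 * fine - coarse) / 3, which removes the h**2 term and leaves an error of order h**4 for this smooth example.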

arXiv:2303.02759 [pdf, ps, other]

The Matérn Model: A Journey through Statistics, Numerical Analysis and Machine Learning

Authors: Emilio Porcu, Moreno Bevilacqua, Robert Schaback, Chris J. Oates

Abstract: The Matérn model has been a cornerstone of spatial statistics for more than half a century. More recently, the Matérn model has been central to disciplines as diverse as numerical analysis, approximation theory, computational statistics, machine learning, and probability theory. In this article we take a Matérn-based journey across these disciplines. First, we reflect on the importance of the Matérn model for estimation and prediction in spatial statistics, establishing also connections to other disciplines in which the Matérn model has been influential. Then, we position the Matérn model within the literature on big data and scalable computation: the SPDE approach, the Vecchia likelihood approximation, and recent applications in Bayesian computation are all discussed. Finally, we review recent developments, including flexible alternatives to the Matérn model, whose performance we compare in terms of estimation, prediction, screening effect, computation, and Sobolev regularity properties.

Submitted 5 March, 2023; originally announced March 2023.

arXiv:2210.16357 [pdf, other]

Minimum Kernel Discrepancy Estimators

Authors: Chris. J. Oates

Abstract: For two decades, reproducing kernels and their associated discrepancies have facilitated elegant theoretical analyses in the setting of quasi Monte Carlo. These same tools are now receiving interest in statistics and related fields, as criteria that can be used to select an appropriate statistical model for a given dataset. The focus of this article is on minimum kernel discrepancy estimators, whose use in statistical applications is reviewed, and a general theoretical framework for establishing their asymptotic properties is presented.

Submitted 23 August, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: To appear in: A. Hinrichs, P. Kritzer, F. Pillichshammer (eds.). Monte Carlo and Quasi-Monte Carlo Methods 2022. Springer Verlag

arXiv:2208.03885 [pdf, other]

Statistical Properties of the Probabilistic Numeric Linear Solver BayesCG

Authors: Tim W. Reid, Ilse C. F. Ipsen, Jon Cockayne, Chris J. Oates

Abstract: We analyse the calibration of BayesCG under the Krylov prior, a probabilistic numeric extension of the Conjugate Gradient (CG) method for solving systems of linear equations with symmetric positive definite coefficient matrix. Calibration refers to the statistical quality of the posterior covariances produced by a solver. Since BayesCG is not calibrated in the strict existing notion, we propose instead two test statistics that are necessary but not sufficient for calibration: the Z-statistic and the new S-statistic. We show analytically and experimentally that under low-rank approximate Krylov posteriors, BayesCG exhibits desirable properties of a calibrated solver, is only slightly optimistic, and is computationally competitive with CG.

Submitted 7 August, 2022; originally announced August 2022.

Comments: 40 Pages

MSC Class: 65F10; 62F15; 15A06

arXiv:2207.02636 [pdf, other]

Gradient-Free Kernel Stein Discrepancy

Authors: Matthew A Fisher, Chris. J Oates

Abstract: Stein discrepancies have emerged as a powerful statistical tool, being applied to fundamental statistical problems including parameter inference, goodness-of-fit testing, and sampling. The canonical Stein discrepancies require the derivatives of a statistical model to be computed, and in return provide theoretical guarantees of convergence detection and control. However, for complex statistical models, the stable numerical computation of derivatives can require bespoke algorithmic development and render Stein discrepancies impractical. This paper focuses on posterior approximation using Stein discrepancies, and introduces a collection of non-canonical Stein discrepancies that are gradient free, meaning that derivatives of the statistical model are not required. Sufficient conditions for convergence detection and control are established, and applications to sampling and variational inference are presented.

Submitted 18 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:2206.08420 [pdf, ps, other]

Generalised Bayesian Inference for Discrete Intractable Likelihood

Authors: Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris. J. Oates

Abstract: Discrete state spaces represent a major computational challenge to statistical inference, since the computation of normalisation constants requires summation over large or possibly infinite sets, which can be impractical. This paper addresses this computational challenge through the development of a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood. Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence, in lieu of the problematic intractable likelihood. The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov chain Monte Carlo, circumventing the intractable normalising constant. The statistical properties of the generalised posterior are analysed, with sufficient conditions for posterior consistency and asymptotic normality established. In addition, a novel and general approach to calibration of generalised posteriors is proposed. Applications are presented on lattice models for discrete spatial data and on multivariate models for count data, where in each case the methodology facilitates generalised Bayesian inference at low computational cost.

Submitted 1 September, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2203.09179 [pdf, ps, other]

Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed

Authors: Toni Karvonen, Chris J. Oates

Abstract: Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.

Submitted 25 April, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: An important work is missing from our literature review. Ben Salem, Bachoc, Roustant, Gamboa and Tomaso [Gaussian process-based dimension reduction for goal-oriented sequential design. SIAM/ASA Journal on Uncertainty Quantification, 7(4):1369-1397, 2019. See Proposition 4.3.] have proved parts of Theorems 2.3 and 5.3 using a technique that is more or less identical to the proof in Section 7.4

Journal ref: Journal of Machine Learning Research, 24(120):1-47, 2023

arXiv:2106.13718 [pdf, other]

Black Box Probabilistic Numerics

Authors: Onur Teymur, Christopher N. Foley, Philip G. Breen, Toni Karvonen, Chris. J. Oates

Abstract: Probabilistic numerics casts numerical tasks, such as the numerical solution of differential equations, as inference problems to be solved. One approach is to model the unknown quantity of interest as a random variable, and to constrain this variable using data generated during the course of a traditional numerical method. However, data may be nonlinearly related to the quantity of interest, rendering the proper conditioning of random variables difficult and limiting the range of numerical tasks that can be addressed. Instead, this paper proposes to construct probabilistic numerical methods based only on the final output from a traditional method. A convergent sequence of approximations to the quantity of interest constitutes a dataset, from which the limiting quantity of interest can be extrapolated, in a probabilistic analogue of Richardson's deferred approach to the limit. This black box approach (1) massively expands the range of tasks to which probabilistic numerics can be applied, (2) inherits the features and performance of state-of-the-art numerical methods, and (3) enables provably higher orders of convergence to be achieved. Applications are presented for nonlinear ordinary and partial differential equations, as well as for eigenvalue problems, a setting for which no probabilistic numerical methods have yet been developed.

Submitted 28 October, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

Journal ref: Advances in Neural Information Processing Systems 34 (2021)

arXiv:2105.03481 [pdf, other]

Stein's Method Meets Computational Statistics: A Review of Some Recent Developments

Authors: Andreas Anastasiou, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, Arthur Gretton, Christophe Ley, Qiang Liu, Lester Mackey, Chris. J. Oates, Gesine Reinert, Yvik Swan

Abstract: Stein's method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein's method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein's method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.

Submitted 22 June, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

Comments: Accepted for publication by "Statistical Science"

arXiv:2104.12587 [pdf, other]

doi 10.1007/s11222-021-10030-w

Bayesian Numerical Methods for Nonlinear Partial Differential Equations

Authors: Junyang Wang, Jon Cockayne, Oksana Chkrebtii, T. J. Sullivan, Chris. J. Oates

Abstract: The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied. However, nonlinear partial differential equations (PDEs) pose substantial challenges from an inferential perspective, most notably the absence of explicit conditioning formulae. This paper extends earlier work on linear PDEs to a general class of initial value problems specified by nonlinear PDEs, motivated by problems for which evaluations of the right-hand-side, initial conditions, or boundary conditions of the PDE have a high computational cost. The proposed method can be viewed as exact Bayesian inference under an approximate likelihood, which is based on discretisation of the nonlinear differential operator. Proof-of-concept experimental results demonstrate that meaningful probabilistic uncertainty quantification for the unknown solution of the PDE can be performed, while controlling the number of times the right-hand-side, initial and boundary conditions are evaluated. A suitable prior model for the solution of the PDE is identified using novel theoretical analysis of the sample path properties of Matérn processes, which may be of independent interest.

Submitted 3 May, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Journal ref: Stat. Comput. 31(5):no. 55, 20pp., 2021

arXiv:2104.07359 [pdf, other]

Robust Generalised Bayesian Inference for Intractable Likelihoods

Authors: Takuo Matsubara, Jeremias Knoblauch, François-Xavier Briol, Chris. J. Oates

Abstract: Generalised Bayesian inference updates prior beliefs using a loss function, rather than a likelihood, and can therefore be used to confer robustness against possible mis-specification of the likelihood. Here we consider generalised Bayesian inference with a Stein discrepancy as a loss function, motivated by applications in which the likelihood contains an intractable normalisation constant. In this context, the Stein discrepancy circumvents evaluation of the normalisation constant and produces generalised posteriors that are either closed form or accessible using standard Markov chain Monte Carlo. On a theoretical level, we show consistency, asymptotic normality, and bias-robustness of the generalised posterior, highlighting how these properties are impacted by the choice of Stein discrepancy. Then, we provide numerical experiments on a range of intractable distributions, including applications to kernel-based exponential family models and non-Gaussian graphical models.

Submitted 11 January, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

arXiv:2012.12670 [pdf, other]

Testing whether a Learning Procedure is Calibrated

Authors: Jon Cockayne, Matthew M. Graham, Chris J. Oates, T. J. Sullivan, Onur Teymur

Abstract: A learning procedure takes as input a dataset and performs inference for the parameters $θ$ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about $θ$ after seeing the dataset. Bayesian inference is a prime example of such a procedure, but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure whose inferences and predictions are systematically over- or under-confident will fail to be calibrated. On the other hand, a learning procedure that is calibrated need not be statistically efficient. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Several vignettes are presented to illustrate different aspects of the framework.

Submitted 16 June, 2022; v1 submitted 23 December, 2020; originally announced December 2020.

arXiv:2012.12615 [pdf, other]

Probabilistic Iterative Methods for Linear Systems

Authors: Jon Cockayne, Ilse C. F. Ipsen, Chris J. Oates, Tim W. Reid

Abstract: This paper presents a probabilistic perspective on iterative methods for approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In the approach a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an iterative method produces a sequence $\mathbf{x}_m$ of approximations that converge to $\mathbf{x}_*$. The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $μ_m \in \mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best guess" for $\mathbf{x}_*$, for example as the mean of $μ_m$, and also probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when it has not been exactly determined. Theoretical analysis is provided in the prototypical case of a stationary linear iterative method. In this setting we characterise both the rate of contraction of $μ_m$ to an atomic measure on $\mathbf{x}_*$ and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the insight into solution uncertainty that can be provided by probabilistic iterative methods.

Submitted 11 January, 2021; v1 submitted 23 December, 2020; originally announced December 2020.
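
One way to picture the lifting described in the abstract above is to push a Gaussian initial distribution through a stationary linear iteration x -> G x + f. An affine map sends N(mean, cov) to N(G mean + f, G cov G^T), so the mean follows the classical iterates while the covariance contracts. The sketch below is an illustration under these assumptions, not the paper's implementation; the Jacobi splitting, the helper names and the 2x2 test system are all hypothetical choices.

```python
def matvec(M, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """Matrix-matrix product for list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def jacobi_map(A, b):
    """Return (G, f) such that one Jacobi sweep is x -> G x + f."""
    d = len(b)
    G = [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(d)] for i in range(d)]
    f = [b[i] / A[i][i] for i in range(d)]
    return G, f

def push_forward(mean, cov, G, f, steps):
    """Push the Gaussian N(mean, cov) through x -> G x + f, repeated `steps` times."""
    for _ in range(steps):
        mean = [gm + fi for gm, fi in zip(matvec(G, mean), f)]
        cov = matmul(matmul(G, cov), transpose(G))
    return mean, cov

if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]  # diagonally dominant, so Jacobi converges
    b = [1.0, 2.0]
    G, f = jacobi_map(A, b)
    for m in (0, 1, 5, 20):
        mean, cov = push_forward([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], G, f, m)
        print(m, mean, cov[0][0] + cov[1][1])
```

The printed trace of the covariance shrinks towards zero as the number of sweeps grows, which is the contraction to a point mass on the solution that the abstract refers to.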

arXiv:2008.03225 [pdf, other]

BayesCG As An Uncertainty Aware Version of CG

Authors: Tim W. Reid, Ilse C. F. Ipsen, Jon Cockayne, Chris J. Oates

Abstract: The Bayesian Conjugate Gradient method (BayesCG) is a probabilistic generalization of the Conjugate Gradient method (CG) for solving linear systems with real symmetric positive definite coefficient matrices. Our CG-based implementation of BayesCG under a structure-exploiting prior distribution represents an 'uncertainty-aware' version of CG. Its output consists of CG iterates and posterior covariances that can be propagated to subsequent computations. The covariances have low-rank and are maintained in factored form. This allows easy generation of accurate samples to probe uncertainty in downstream computations. Numerical experiments confirm the effectiveness of the low-rank posterior covariances.

Submitted 3 October, 2022; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: 34 Pages including supplementary material (main paper is 23 pages, supplement is 11 pages). Computer codes are available at https://github.com/treid5/ProbNumCG_Supp

MSC Class: 65F10; 62F15; 65F50; 15A06; 15A10

arXiv:2005.03952 [pdf, other]

Optimal Thinning of MCMC Output

Authors: Marina Riabiz, Wilson Chen, Jon Cockayne, Pawel Swietach, Steven A. Niederer, Lester Mackey, Chris. J. Oates

Abstract: The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo can be sub-optimal in terms of the empirical approximations that are produced. Typically a number of the initial states are attributed to "burn in" and removed, whilst the remainder of the chain is "thinned" if compression is also required. In this paper we consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel method is proposed, based on greedy minimisation of a kernel Stein discrepancy, that is suitable for problems where heavy compression is required. Theoretical results guarantee consistency of the method and its effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations. Software is available in the Stein Thinning package in Python, R and MATLAB.

Submitted 11 January, 2022; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: To appear in the Journal of the Royal Statistical Society, Series B, 2022+
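
The greedy minimisation of a kernel Stein discrepancy mentioned in the abstract above can be sketched on a toy problem. This is a minimal one-dimensional illustration, not the Stein Thinning package: the target is assumed to be N(0, 1) with score s(x) = -x, the base kernel is assumed to be an inverse multiquadric with unit length-scale, and the names `stein_kernel`, `stein_thinning` and `ksd` are hypothetical.

```python
import math

def stein_kernel(x, y):
    """Langevin Stein kernel for the target N(0, 1) with base kernel (1 + (x - y)**2) ** -0.5."""
    sx, sy = -x, -y              # score of N(0, 1) evaluated at x and y
    r = x - y
    u = 1.0 + r * r
    k = u ** -0.5                # base kernel value
    dk_dx = -r * u ** -1.5       # derivative of k in its first argument
    dk_dy = r * u ** -1.5        # derivative of k in its second argument
    d2k = u ** -1.5 - 3.0 * r * r * u ** -2.5   # mixed second derivative
    return d2k + sx * dk_dy + sy * dk_dx + sx * sy * k

def stein_thinning(samples, m):
    """Greedily select m indices (repeats allowed) to reduce the kernel Stein discrepancy."""
    n = len(samples)
    running = [0.0] * n          # running[i] = sum over selected j of k_p(x_j, x_i)
    chosen = []
    for _ in range(m):
        scores = [0.5 * stein_kernel(samples[i], samples[i]) + running[i] for i in range(n)]
        best = min(range(n), key=scores.__getitem__)
        chosen.append(best)
        for i in range(n):
            running[i] += stein_kernel(samples[best], samples[i])
    return chosen

def ksd(points):
    """Kernel Stein discrepancy of the empirical distribution of `points`."""
    total = sum(stein_kernel(a, b) for a in points for b in points)
    return math.sqrt(max(total, 0.0)) / len(points)

if __name__ == "__main__":
    path = [3.0, 2.4, 1.9, 1.1, 0.4, -0.3, 0.05, -1.2, 0.8, -0.6]  # made-up sample path
    idx = stein_thinning(path, 4)
    print(idx, ksd([path[i] for i in idx]), ksd(path[:4]))
```

In this toy setting the first selected state is always the one closest to the mode, since the first greedy step minimises k_p(x, x) / 2 = (1 + x**2) / 2. Later selections trade off that term against the interaction with states already chosen.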

arXiv:2004.12654 [pdf, other]

Integration in reproducing kernel Hilbert spaces of Gaussian kernels

Authors: Toni Karvonen, Chris J. Oates, Mark Girolami

Abstract: The Gaussian kernel plays a central role in machine learning, uncertainty quantification and scattered data approximation, but has received relatively little attention from a numerical analysis standpoint. The basic problem of finding an algorithm for efficient numerical integration of functions reproduced by Gaussian kernels has not been fully solved. In this article we construct two classes of algorithms that use $N$ evaluations to integrate $d$-variate functions reproduced by Gaussian kernels and prove the exponential or super-algebraic decay of their worst-case errors. In contrast to earlier work, no constraints are placed on the length-scale parameter of the Gaussian kernel. The first class of algorithms is obtained via an appropriate scaling of the classical Gauss-Hermite rules. For these algorithms we derive lower and upper bounds on the worst-case error of the forms $\exp(-c_1 N^{1/d}) N^{1/(4d)}$ and $\exp(-c_2 N^{1/d}) N^{-1/(4d)}$, respectively, for positive constants $c_1 > c_2$. The second class of algorithms we construct is more flexible and uses worst-case optimal weights for points that may be taken as a nested sequence. For these algorithms we derive upper bounds of the form $\exp(-c_3 N^{1/(2d)})$ for a positive constant $c_3$.

Submitted 31 March, 2021; v1 submitted 27 April, 2020; originally announced April 2020.

Comments: Accepted for publication in Mathematics of Computation

arXiv:2001.10965 [pdf, other]

Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions

Authors: Toni Karvonen, George Wynne, Filip Tronarp, Chris J. Oates, Simo Särkkä

Abstract: Despite the ubiquity of the Gaussian process regression model, few theoretical results are available that account for the fact that parameters of the covariance kernel typically need to be estimated from the dataset. This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset. Specifically, we consider the scenario where the scale parameter of a Sobolev kernel (such as a Matérn kernel) is estimated by maximum likelihood. We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model in the sense that the model can become "slowly" overconfident at worst, regardless of the difference between the smoothness of the data-generating function and that expected by the model. The analysis is based on a combination of techniques from nonparametric regression and scattered data interpolation. Empirical results are provided in support of the theoretical findings.

Submitted 11 May, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

arXiv:1905.03673 [pdf, other]

Stein Point Markov Chain Monte Carlo

Authors: Wilson Ye Chen, Alessandro Barp, François-Xavier Briol, Jackson Gorham, Mark Girolami, Lester Mackey, Chris. J. Oates

Abstract: An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.

Submitted 14 September, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: Minor bug fixed in Theorem 4 (result unchanged)

Journal ref: ICML 2019

arXiv:1901.04457 [pdf, other]

doi 10.1007/s11222-019-09902-z

A Modern Retrospective on Probabilistic Numerics

Authors: C. J. Oates, T. J. Sullivan

Abstract: This article attempts to place the emergence of probabilistic numerics as a mathematical-statistical research field within its historical context and to explore how its gradual development can be related both to applications and to a modern formal treatment. We highlight in particular the parallel contributions of Sul'din and Larkin in the 1960s and how their pioneering early ideas have reached a degree of maturity in the intervening period, mediated by paradigms such as average-case analysis and information-based complexity. We provide a subjective assessment of the state of research in probabilistic numerics and highlight some difficulties to be addressed by future works.

Submitted 5 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: 23 pages, 2 figures

MSC Class: 62-03; 65-03; 01A60; 01A65; 01A67

Journal ref: Statistics and Computing 29(6):1335--1351, 2019

arXiv:1811.10275 [pdf, ps, other]

Rejoinder for "Probabilistic Integration: A Role in Statistical Computation?"

Authors: Francois-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: This article is the rejoinder for the paper "Probabilistic Integration: A Role in Statistical Computation?" to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this paper, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?

Submitted 26 November, 2018; originally announced November 2018.

Comments: Accepted to Statistical Science

arXiv:1810.04946 [pdf, other]

A Riemann-Stein Kernel Method

Authors: Alessandro Barp, Chris. J. Oates, Emilio Porcu, Mark Girolami

Abstract: This paper proposes and studies a numerical method for approximation of posterior expectations based on interpolation with a Stein reproducing kernel. Finite-sample-size bounds on the approximation error are established for posterior distributions supported on a compact Riemannian manifold, and we relate these to a kernel Stein discrepancy (KSD). Moreover, we prove in our setting that the KSD is equivalent to Sobolev discrepancy and, in doing so, we completely characterise the convergence-determining properties of KSD. Our contribution is rooted in a novel combination of Stein's method, the theory of reproducing kernels, and existence and regularity results for partial differential equations on a Riemannian manifold.

Submitted 11 January, 2022; v1 submitted 11 October, 2018; originally announced October 2018.

arXiv:1809.10227 [pdf, other]

Symmetry Exploits for Bayesian Cubature Methods

Authors: Toni Karvonen, Simo Särkkä, Chris. J. Oates

Abstract: Bayesian cubature provides a flexible framework for numerical integration, in which a priori knowledge on the integrand can be encoded and exploited. This additional flexibility, compared to many classical cubature methods, comes at a computational cost which is cubic in the number of evaluations of the integrand. It has been recently observed that fully symmetric point sets can be exploited in order to reduce - in some cases substantially - the computational cost of the standard Bayesian cubature method. This work identifies several additional symmetry exploits within the Bayesian cubature framework. In particular, we go beyond earlier work in considering non-symmetric measures and, in addition to the standard Bayesian cubature method, present exploits for the Bayes-Sard cubature method and the multi-output Bayesian cubature method.

Submitted 26 January, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Accepted for publication in Statistics and Computing

arXiv:1808.09362 [pdf, other]

doi 10.1103/PhysRevE.99.042417

Embryonic lateral inhibition as optical modes: an analytical framework for mesoscopic pattern formation

Authors: Jose Negrete Jr, Andrew C. Oates

Abstract: Cellular checkerboard patterns are observed at many developmental stages of embryos. We study an analytically tractable model for lateral inhibition and show that a coupling coefficient with a negative value is sufficient to obtain noisy or periodic checkerboard patterns. We solve the case of a linear chain of cells explicitly and show that noisy anti-correlated patterns are available in a post-critical regime $(ε_c < ε < 0)$. In the sub-critical regime $(-\infty < ε \leq ε_c)$ a periodic and alternating steady state is available, where pattern selection is determined by making an analogy with the optical modes of phonons. For cells arranged in a hexagonal lattice, the sub-critical pattern can be driven into three different states: two of those states are periodic checkerboards and a third in which both periodic states coexist.

Submitted 28 August, 2018; originally announced August 2018.

Journal ref: Phys. Rev. E 99, 042417 (2019)

arXiv:1804.03016 [pdf, other]

A Bayes-Sard Cubature Method

Authors: Toni Karvonen, Chris J. Oates, Simo Särkkä

Abstract: This paper focusses on the formulation of numerical integration as an inferential task. To date, research effort has largely focussed on the development of Bayesian cubature, whose distributional output provides uncertainty quantification for the integral. However, the point estimators associated to Bayesian cubature can be inaccurate and acutely sensitive to the prior when the domain is high-dimensional. To address these drawbacks we introduce Bayes-Sard cubature, a probabilistic framework that combines the flexibility of Bayesian cubature with the robustness of classical cubatures which are well-established. This is achieved by considering a Gaussian process model for the integrand whose mean is a parametric regression model, with an improper flat prior on each regression coefficient. The features in the regression model consist of test functions which are guaranteed to be exactly integrated, with remaining degrees of freedom afforded to the non-parametric part. The asymptotic convergence of the Bayes-Sard cubature method is established and the theoretical results are numerically verified. In particular, we report two orders of magnitude reduction in error compared to Bayesian cubature in the context of a high-dimensional financial integral.

Submitted 18 May, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

arXiv:1801.05242 [pdf, other]

A Bayesian Conjugate Gradient Method

Authors: Jon Cockayne, Chris Oates, Ilse Ipsen, Mark Girolami

Abstract: A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about the numerical error. In this paper we propose a novel statistical model for this numerical error set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.

Submitted 17 December, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

arXiv:1707.04723 [pdf, other]

Optimal Monte Carlo integration on closed manifolds

Authors: Martin Ehler, Manuel Graef, Chris. J. Oates

Abstract: The worst case integration error in reproducing kernel Hilbert spaces of standard Monte Carlo methods with $n$ random points decays as $n^{-1/2}$. However, re-weighting of random points can sometimes be used to improve the convergence order. This paper contributes general theoretical results for Sobolev spaces on closed Riemannian manifolds, where we verify that such re-weighting yields optimal approximation rates up to a logarithmic factor. We also provide numerical experiments matching the theoretical results for some Sobolev spaces on the unit sphere and on the Grassmannian manifold. Our theoretical findings also cover function spaces on more general sets such as the unit ball, the cube, and the simplex.

Submitted 24 January, 2018; v1 submitted 15 July, 2017; originally announced July 2017.

MSC Class: 65C05; 65C60

arXiv:1706.03369 [pdf, other]

On the Sampling Problem for Kernel Quadrature

Authors: Francois-Xavier Briol, Chris J. Oates, Jon Cockayne, Wilson Ye Chen, Mark Girolami

Abstract: The standard Kernel Quadrature method for numerical integration with random point sets (also called Bayesian Monte Carlo) is known to converge in root mean square error at a rate determined by the ratio $s/d$, where $s$ and $d$ encode the smoothness and dimension of the integrand. However, an empirical investigation reveals that the rate constant $C$ is highly sensitive to the distribution of the random points. In contrast to standard Monte Carlo integration, for which optimal importance sampling is well-understood, the sampling distribution that minimises $C$ for Kernel Quadrature does not admit a closed form. This paper argues that the practical choice of sampling distribution is an important open problem. One solution is considered: a novel automatic approach based on adaptive tempering and sequential Monte Carlo. Empirical results demonstrate that a dramatic reduction in integration error of up to 4 orders of magnitude can be achieved with the proposed method.

Submitted 11 June, 2017; originally announced June 2017.

Comments: To appear at Thirty-fourth International Conference on Machine Learning (ICML 2017)

Journal ref: Proceedings of the 34th International Conference on Machine Learning, PMLR 70:586-595, 2017

arXiv:1702.03673 [pdf, other]

doi 10.1137/17M1139357

Bayesian Probabilistic Numerical Methods

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: The emergent field of probabilistic numerics has thus far lacked clear statistical principles. This paper establishes Bayesian probabilistic numerical methods as those which can be cast as solutions to certain inverse problems within the Bayesian framework. This allows us to establish general conditions under which Bayesian probabilistic numerical methods are well-defined, encompassing both non-linear and non-Gaussian models. For general computation, a numerical approximation scheme is proposed and its asymptotic convergence established. The theoretical development is then extended to pipelines of computation, wherein probabilistic numerical methods are composed to solve more challenging numerical tasks. The contribution highlights an important research frontier at the interface of numerical analysis and uncertainty quantification, with a challenging industrial application presented.

Submitted 7 July, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

Journal ref: SIAM Review 61(4):756--789, 2019

arXiv:1701.04006 [pdf, other]

doi 10.1063/1.4985359

Probabilistic Numerical Methods for PDE-constrained Bayesian Inverse Problems

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: This paper develops meshless methods for probabilistically describing discretisation error in the numerical solution of partial differential equations. This construction enables the solution of Bayesian inverse problems while accounting for the impact of the discretisation of the forward problem. In particular, this drives statistical inferences to be more conservative in the presence of significant solver error. Theoretical results are presented describing rates of convergence for the posteriors in both the forward and inverse problems. This method is tested on a challenging inverse problem with a nonlinear forward model.

Submitted 15 January, 2017; originally announced January 2017.

arXiv:1605.07811 [pdf, other]

Probabilistic Numerical Methods for Partial Differential Equations and Bayesian Inverse Problems

Authors: Jon Cockayne, Chris Oates, Tim Sullivan, Mark Girolami

Abstract: This paper develops a probabilistic numerical method for solution of partial differential equations (PDEs) and studies application of that method to PDE-constrained inverse problems. This approach enables the solution of challenging inverse problems whilst accounting, in a statistically principled way, for the impact of discretisation error due to numerical solution of the PDE. In particular, the approach confers robustness to failure of the numerical PDE solver, with statistical inferences driven to be more conservative in the presence of substantial discretisation error. Going further, the problem of choosing a PDE solver is cast as a problem in the Bayesian design of experiments, where the aim is to minimise the impact of solver error on statistical inferences; here the challenge of non-linear PDEs is also considered. The method is applied to parameter inference problems in which discretisation error is non-negligible and must be accounted for in order to reach conclusions that are statistically valid.

Submitted 11 July, 2017; v1 submitted 25 May, 2016; originally announced May 2016.

arXiv:1603.03220 [pdf, other]

Convergence Rates for a Class of Estimators Based on Stein's Method

Authors: Chris J. Oates, Jon Cockayne, François-Xavier Briol, Mark Girolami

Abstract: Gradient information on the sampling distribution can be used to reduce the variance of Monte Carlo estimators via Stein's method. An important application is that of estimating an expectation of a test function along the sample path of a Markov chain, where gradient information enables convergence rate improvement at the cost of a linear system which must be solved. The contribution of this paper is to establish theoretical bounds on convergence rates for a class of estimators based on Stein's method. Our analysis accounts for (i) the degree of smoothness of the sampling distribution and test function, (ii) the dimension of the state space, and (iii) the case of non-independent samples arising from a Markov chain. These results provide insight into the rapid convergence of gradient-based estimators observed for low-dimensional problems, as well as clarifying a curse-of-dimension that appears inherent to such methods.

Submitted 27 December, 2017; v1 submitted 10 March, 2016; originally announced March 2016.

Comments: To appear in Bernoulli, 2018

arXiv:1512.00933 [pdf, other]

Probabilistic Integration: A Role in Statistical Computation?

Authors: François-Xavier Briol, Chris. J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic

Abstract: A research frontier has emerged in scientific computation, wherein numerical error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the presence of an unknown numerical error. Our main technical contribution is to establish, for the first time, rates of posterior contraction for these methods. These show that probabilistic integrators can in principle enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assess the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.

Submitted 18 October, 2017; v1 submitted 2 December, 2015; originally announced December 2015.

Comments: Several improvements suggested by reviewers, including additional experiments on uncertainty quantification properties. Change of title: previously "Probabilistic Integration: A Role for Statisticians in Numerical Analysis?"

Showing 1–32 of 32 results for author: Oates, C