-
Schrödinger Bridge Samplers
Authors:
Espen Bernton,
Jeremy Heng,
Arnaud Doucet,
Pierre E. Jacob
Abstract:
Consider a reference Markov process with initial distribution $π_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $π_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $π_{0}$ a…
▽ More
Consider a reference Markov process with initial distribution $π_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $π_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $π_{0}$ and terminal distribution equal to $π_{T}$ which is the closest to the reference process in terms of Kullback--Leibler divergence. This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $π$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $π_T = π$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
△ Less
Submitted 30 December, 2019;
originally announced December 2019.
-
Approximate Bayesian computation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and…
▽ More
A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
△ Less
Submitted 9 May, 2019;
originally announced May 2019.
-
Langevin Monte Carlo and JKO splitting
Authors:
Espen Bernton
Abstract:
Algorithms based on discretizing Langevin diffusion are popular tools for sampling from high-dimensional distributions. We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. In particular, we show that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alterna…
▽ More
Algorithms based on discretizing Langevin diffusion are popular tools for sampling from high-dimensional distributions. We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. In particular, we show that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alternates between solving the gradient flows of two specific functionals on the space of probability measures. Using this perspective, we derive some new non-asymptotic results on the convergence properties of this algorithm.
△ Less
Submitted 9 May, 2019; v1 submitted 23 February, 2018;
originally announced February 2018.
-
On parameter estimation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which th…
▽ More
Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the approximation of these estimators and illustrate their behavior in several numerical experiments. Two of our examples are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.
△ Less
Submitted 9 May, 2019; v1 submitted 18 January, 2017;
originally announced January 2017.
-
Locally weighted Markov chain Monte Carlo
Authors:
Espen Bernton,
Shihao Yang,
Yang Chen,
Neil Shephard,
Jun S. Liu
Abstract:
We propose a weighting scheme for the proposals within Markov chain Monte Carlo algorithms and show how this can improve statistical efficiency at no extra computational cost. These methods are most powerful when combined with multi-proposal MCMC algorithms such as multiple-try Metropolis, which can efficiently exploit modern computer architectures with large numbers of cores. The locally weighted…
▽ More
We propose a weighting scheme for the proposals within Markov chain Monte Carlo algorithms and show how this can improve statistical efficiency at no extra computational cost. These methods are most powerful when combined with multi-proposal MCMC algorithms such as multiple-try Metropolis, which can efficiently exploit modern computer architectures with large numbers of cores. The locally weighted Markov chain Monte Carlo method also improves upon a partial parallelization of the Metropolis-Hastings algorithm via Rao-Blackwellization. We derive the effective sample size of the output of our algorithm and show how to estimate this in practice. Illustrations and examples of the method are given and the algorithm is compared in theory and applications with existing methods.
△ Less
Submitted 29 June, 2015;
originally announced June 2015.