Search | arXiv e-print repository

arXiv:1912.13170 [pdf, other]

Schrödinger Bridge Samplers

Authors: Espen Bernton, Jeremy Heng, Arnaud Doucet, Pierre E. Jacob

Abstract: Consider a reference Markov process with initial distribution $π_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given distribution $π_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $π_{0}$ and terminal distribution equal to $π_{T}$ which is the closest to the reference process in terms of Kullback-Leibler divergence. This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $π$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $π_T = π$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 53 pages and 9 figures
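
The iterative proportional fitting (Sinkhorn) recursion named in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, assuming a two-state space and a single time step, so the reference process is just a positive joint matrix; the function name `ipf` and the numbers are hypothetical and are not taken from the paper, whose samplers work on $\mathbb{R}^{d}$ with regression-based approximations rather than exact matrix rescaling.

```python
def ipf(kernel, row_target, col_target, iterations=200):
    """Alternately rescale rows and columns of a positive matrix
    so that its marginals approach the two target distributions."""
    plan = [row[:] for row in kernel]  # copy, leave the input untouched
    n_rows, n_cols = len(plan), len(plan[0])
    for _ in range(iterations):
        # Row step: force the initial marginal to equal row_target.
        for i in range(n_rows):
            total = sum(plan[i])
            plan[i] = [v * row_target[i] / total for v in plan[i]]
        # Column step: force the terminal marginal to equal col_target.
        for j in range(n_cols):
            total = sum(plan[i][j] for i in range(n_rows))
            for i in range(n_rows):
                plan[i][j] *= col_target[j] / total
    return plan


# Hypothetical toy inputs: joint law of (X_0, X_T) under the reference process.
reference = [[0.4, 0.1], [0.1, 0.4]]
pi_0 = [0.5, 0.5]  # prescribed initial distribution
pi_T = [0.7, 0.3]  # prescribed terminal distribution

plan = ipf(reference, pi_0, pi_T)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(2)]
```

After the loop, `row_sums` and `col_sums` agree with `pi_0` and `pi_T` up to numerical tolerance, and `plan` is the rescaled version of `reference` that the recursion converges to in this finite setting.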

arXiv:1905.03747 [pdf, other]

doi 10.1111/rssb.12312

Approximate Bayesian computation with the Wasserstein distance

Authors: Espen Bernton, Pierre E. Jacob, Mathieu Gerber, Christian P. Robert

Abstract: A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.

Submitted 9 May, 2019; originally announced May 2019.

Comments: 42 pages, 10 figures. Supplementary materials can be found on the first author's webpage. Portions of this work previously appeared as arXiv:1701.05146

Journal ref: Journal of the Royal Statistical Society: Series B, Volume 81, Issue 2, pages 235-269 (April 2019)
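
The link between order statistics and the Wasserstein distance mentioned in the abstract above can be made concrete in one dimension. This is a minimal sketch, assuming two univariate samples of equal size, in which case the distance between their empirical distributions reduces to comparing sorted values; the names `wasserstein_1d` and `abc_accept` are hypothetical, and the multivariate approximations and the Hilbert-curve distance discussed in the paper are not covered here.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between the empirical distributions of two
    equal-size univariate samples, computed by matching order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return mean_cost ** (1.0 / p)


def abc_accept(observed, synthetic, threshold, p=1):
    """ABC rejection rule: keep the parameter that generated `synthetic`
    when the synthetic data are within `threshold` of the observed data."""
    return wasserstein_1d(observed, synthetic, p) <= threshold


# Shifting every observation by 1 gives a distance of exactly 1.
shifted = wasserstein_1d([0.0, 1.0], [2.0, 1.0])
# The distance ignores the ordering of the samples.
permuted = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

In a full ABC run, `abc_accept` would be called once per simulated data set, with the threshold playing the role of the tolerance that the abstract sends to zero.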

arXiv:1802.08671 [pdf, ps, other]

Langevin Monte Carlo and JKO splitting

Authors: Espen Bernton

Abstract: Algorithms based on discretizing Langevin diffusion are popular tools for sampling from high-dimensional distributions. We develop novel connections between such Monte Carlo algorithms, the theory of Wasserstein gradient flow, and the operator splitting approach to solving PDEs. In particular, we show that a proximal version of the Unadjusted Langevin Algorithm corresponds to a scheme that alternates between solving the gradient flows of two specific functionals on the space of probability measures. Using this perspective, we derive some new non-asymptotic results on the convergence properties of this algorithm.

Submitted 9 May, 2019; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: 24 pages. Similar to arxiv:1802.08089

Journal ref: Proceedings of the 31st Conference On Learning Theory, PMLR 75:1777-1798, 2018
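
For readers unfamiliar with the Unadjusted Langevin Algorithm referred to in the abstract above, here is a minimal sketch of the standard, non-proximal update in one dimension, $x_{k+1} = x_k - h\,\nabla U(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim N(0, 1)$. It is not the proximal variant analysed in the paper; the function name `ula`, the standard Gaussian target with $U(x) = x^2/2$, the step size and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ula(grad_potential, x0, step, n_steps, rng):
    """Run the (non-proximal) Unadjusted Langevin Algorithm in one dimension
    and return the list of visited states."""
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on the potential plus scaled Gaussian noise;
        # no Metropolis correction is applied, hence "unadjusted".
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


# Target: standard Gaussian, potential U(x) = x^2 / 2, so grad U(x) = x.
rng = random.Random(0)
samples = ula(lambda x: x, x0=0.0, step=0.1, n_steps=20000, rng=rng)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because no accept/reject step is used, the chain targets a slightly biased distribution: for this Gaussian example the stationary variance is $1/(1 - h/2)$ rather than 1, which is the kind of discretization error that non-asymptotic analyses quantify.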

arXiv:1701.05146 [pdf, other]

On parameter estimation with the Wasserstein distance

Authors: Espen Bernton, Pierre E. Jacob, Mathieu Gerber, Christian P. Robert

Abstract: Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the approximation of these estimators and illustrate their behavior in several numerical experiments. Two of our examples are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.

Submitted 9 May, 2019; v1 submitted 18 January, 2017; originally announced January 2017.

Comments: 29 pages (+18 pages of appendices), 6 figures. To appear in Information and Inference: A Journal of the IMA. A previous version of this paper contained work on approximate Bayesian computation with the Wasserstein distance, which can now be found at arxiv:1905.03747

arXiv:1506.08852 [pdf, other]

Locally weighted Markov chain Monte Carlo

Authors: Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, Jun S. Liu

Abstract: We propose a weighting scheme for the proposals within Markov chain Monte Carlo algorithms and show how this can improve statistical efficiency at no extra computational cost. These methods are most powerful when combined with multi-proposal MCMC algorithms such as multiple-try Metropolis, which can efficiently exploit modern computer architectures with large numbers of cores. The locally weighted Markov chain Monte Carlo method also improves upon a partial parallelization of the Metropolis-Hastings algorithm via Rao-Blackwellization. We derive the effective sample size of the output of our algorithm and show how to estimate this in practice. Illustrations and examples of the method are given and the algorithm is compared in theory and applications with existing methods.

Submitted 29 June, 2015; originally announced June 2015.

Showing 1–5 of 5 results for author: Bernton, E