-
On importance sampling and independent Metropolis-Hastings with an unbounded weight function
Authors:
George Deligiannidis,
Pierre E. Jacob,
El Mahdi Khribch,
Guanyang Wang
Abstract:
Importance sampling and independent Metropolis-Hastings (IMH) are among the fundamental building blocks of Monte Carlo methods. Both require a proposal distribution that globally approximates the target distribution. The Radon-Nikodym derivative of the target distribution relative to the proposal is called the weight function. Under the assumption that the weight is unbounded but has finite moments under the proposal distribution, we study the approximation error of importance sampling and of the particle independent Metropolis-Hastings algorithm (PIMH), which includes IMH as a special case. For the chains generated by such algorithms, we show that the common random numbers coupling is maximal. Using that coupling we derive bounds on the total variation distance of a PIMH chain to its target distribution. Our results allow a formal comparison of the finite-time biases of importance sampling and IMH, and we find the latter to have a smaller bias. We further consider bias removal techniques using couplings, and provide conditions under which the resulting unbiased estimators have finite moments. These unbiased estimators provide an alternative to self-normalized importance sampling, implementable in the same settings. We compare their asymptotic efficiency as the number of particles goes to infinity, and consider their use in robust mean estimation techniques.
Submitted 14 June, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
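A minimal sketch contrasting self-normalized importance sampling with IMH in the setting described above, assuming a hypothetical target N(0.5, 1) and proposal N(0, 1). The weight w(x) = exp(0.5x - 0.125) is unbounded in x, yet w(X) is log-normal under the proposal, so all its moments are finite. The names `snis`, `imh` and `log_weight` are illustrative, not taken from the paper.

```python
import math
import random

SHIFT = 0.5  # hypothetical example: target N(SHIFT, 1), proposal N(0, 1)

def log_weight(x):
    # log w(x) = log pi(x) - log q(x): unbounded as x grows, but w(X) is
    # log-normal when X ~ q, so every moment of the weight is finite
    return SHIFT * x - 0.5 * SHIFT ** 2

def snis(h, n, rng):
    """Self-normalized importance sampling estimate of E_pi[h(X)]."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lw = [log_weight(x) for x in xs]
    top = max(lw)
    w = [math.exp(l - top) for l in lw]  # rescale for numerical stability
    return sum(wi * h(x) for wi, x in zip(w, xs)) / sum(w)

def imh(h, n, rng, x0=0.0):
    """Ergodic average of h along an independent Metropolis-Hastings chain."""
    x, total = x0, 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # proposal ignores the current state
        u = 1.0 - rng.random()         # uniform on (0, 1], safe for log
        if math.log(u) < log_weight(z) - log_weight(x):
            x = z                      # accept with probability min(1, w(z) / w(x))
        total += h(x)
    return total / n

rng = random.Random(0)
print(snis(lambda v: v, 20000, rng), imh(lambda v: v, 20000, rng))  # both near SHIFT
```

Both estimators use the same weight function; for a fixed budget, the abstract's bias comparison concerns exactly these two kinds of finite-sample averages.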
-
Unbiased Markov Chain Monte Carlo: what, why, and how
Authors:
Yves F. Atchadé,
Pierre E. Jacob
Abstract:
This document presents methods to remove the initialization or burn-in bias from Markov chain Monte Carlo (MCMC) estimates, with consequences on parallel computing, convergence diagnostics and performance assessment. The document is written as an introduction to these methods for MCMC users. Some theoretical results are mentioned, but the focus is on the methodology.
Submitted 10 June, 2024;
originally announced June 2024.
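A minimal sketch of the lag-one unbiased estimator H_k = h(X_k) + sum over t from k+1 to tau-1 of [h(X_t) - h(Y_{t-1})], where tau is the first time X_t = Y_{t-1}. Every specific choice here is a hypothetical assumption: the kernel is IMH targeting N(0, 1) with proposal N(0, 1.5^2), both chains start from an off-target N(3, 1), and they are coupled with common random numbers, so they meet as soon as both accept the same proposal.

```python
import math
import random

PROP_SD = 1.5  # hypothetical proposal q = N(0, 1.5^2) for the target N(0, 1)

def log_weight(x):
    # log pi(x) - log q(x) up to a constant: -x^2/2 + x^2 / (2 * PROP_SD^2)
    return -0.5 * x * x + 0.5 * (x / PROP_SD) ** 2

def imh_step(x, z, u):
    # independent MH move from x, given a proposal z ~ q and a uniform u in (0, 1]
    return z if math.log(u) < log_weight(z) - log_weight(x) else x

def unbiased_estimate(h, k, rng):
    """H_k = h(X_k) + sum_{t=k+1}^{tau-1} [h(X_t) - h(Y_{t-1})], for k >= 1."""
    x = rng.gauss(3.0, 1.0)  # X_0 ~ pi_0 = N(3, 1), deliberately off target
    y = rng.gauss(3.0, 1.0)  # Y_0 ~ pi_0
    x = imh_step(x, rng.gauss(0.0, PROP_SD), 1.0 - rng.random())  # X_1
    t, estimate = 1, 0.0
    while True:
        # invariant: x holds X_t and y holds Y_{t-1}
        if t == k:
            estimate += h(x)
        elif t > k:
            estimate += h(x) - h(y)  # bias correction; zero once the chains have met
        if t >= k and x == y:
            return estimate
        # common random numbers: both chains use the same proposal and uniform
        z, u = rng.gauss(0.0, PROP_SD), 1.0 - rng.random()
        x, y = imh_step(x, z, u), imh_step(y, z, u)
        t += 1

rng = random.Random(1)
reps = [unbiased_estimate(lambda v: v * v, 5, rng) for _ in range(4000)]
print(sum(reps) / len(reps))  # should be near E_pi[X^2] = 1
```

Each replicate is independent, so they can be computed in parallel and averaged; the average has no burn-in bias, only Monte Carlo error.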
-
Solving the Poisson equation using coupled Markov chains
Authors:
Randal Douc,
Pierre E. Jacob,
Anthony Lee,
Dootika Vats
Abstract:
This article shows how coupled Markov chains that meet exactly after a random number of iterations can be used to generate unbiased estimators of the solutions of the Poisson equation. Through this connection, we re-derive known unbiased estimators of expectations with respect to the stationary distribution of a Markov chain and provide conditions for the finiteness of their moments. We further construct unbiased estimators of the asymptotic variance of Markov chain ergodic averages, and provide conditions for the finiteness of the estimators' moments of any order. If their second moment is finite, the average of independent copies of such estimators converges to the asymptotic variance at the Monte Carlo rate, comparing favorably to known rates for batch means and spectral variance estimators. The results are illustrated with numerical experiments.
Submitted 2 July, 2025; v1 submitted 12 June, 2022;
originally announced June 2022.
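On a finite state space the Poisson equation g - Pg = h - pi(h) can be solved directly, which gives a reference value for the quantities the paper estimates with coupled chains. A minimal sketch, assuming a small hypothetical transition matrix P: it solves (I - P + 1 pi^T) g = hbar with hbar = h - pi(h), whose solution satisfies the Poisson equation and pi(g) = 0, and returns the asymptotic variance sigma^2 = 2 pi(hbar g) - pi(hbar^2). This is exact linear algebra, not the coupling-based unbiased estimator of the paper.

```python
def delta(i, j):
    return 1.0 if i == j else 0.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def poisson_solution(P, h):
    """Return (pi, g, hbar) with g - P g = hbar = h - pi(h) and pi(g) = 0."""
    n = len(P)
    # stationary law: pi (I - P + E) = 1^T, with E the all-ones matrix
    A_t = [[delta(j, i) - P[j][i] + 1.0 for j in range(n)] for i in range(n)]
    pi = solve(A_t, [1.0] * n)
    mean = sum(p * hi for p, hi in zip(pi, h))
    hbar = [hi - mean for hi in h]
    # (I - P + 1 pi^T) g = hbar implies pi(g) = 0, hence g - P g = hbar
    B = [[delta(i, j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    return pi, solve(B, hbar), hbar

def asymptotic_variance(P, h):
    pi, g, hbar = poisson_solution(P, h)
    # sigma^2 = 2 pi(hbar g) - pi(hbar^2)
    return sum(p * (2.0 * hb * gi - hb * hb) for p, hb, gi in zip(pi, hbar, g))

P = [[0.7, 0.3], [0.1, 0.9]]  # hypothetical two-state chain
print(asymptotic_variance(P, [0.0, 1.0]))
```

For a two-state chain with switching probabilities a and b, the result should match the closed form pi_0 pi_1 (2 - a - b) / (a + b), which equals 0.75 here.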
-
Asymptotics of cut distributions and robust modular inference using Posterior Bootstrap
Authors:
Emilia Pompe,
Pierre E. Jacob
Abstract:
Bayesian inference provides a framework to combine an arbitrary number of model components with shared parameters, allowing joint uncertainty estimation and the use of all available data sources. However, misspecification of any part of the model might propagate to all other parts and lead to unsatisfactory results. Cut distributions have been proposed as a remedy, where the information is prevented from flowing along certain directions. We consider cut distributions from an asymptotic perspective, find the equivalent of the Laplace approximation, and notice a lack of frequentist coverage for the associated credible regions. We propose algorithms based on the Posterior Bootstrap that deliver credible regions with the nominal frequentist asymptotic coverage. The algorithms involve numerical optimization programs that can be performed fully in parallel. The results and methods are illustrated in various settings, such as causal inference with propensity scores and epidemiological studies.
Submitted 28 October, 2021; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
Authors:
Kimia Nadjahi,
Alain Durmus,
Pierre E. Jacob,
Roland Badeau,
Umut Şimşekli
Abstract:
The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.
Submitted 4 January, 2022; v1 submitted 29 June, 2021;
originally announced June 2021.
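For reference, the usual Monte Carlo approximation that the abstract contrasts with: draw random directions uniformly on the sphere (normalized Gaussian vectors), project both samples, and average the one-dimensional Wasserstein distances, which are available in closed form by sorting. A minimal sketch that assumes, for simplicity, two empirical measures with the same number of points; the paper's deterministic Gaussian approximation is not reproduced here.

```python
import math
import random

def sliced_wasserstein2(xs, ys, n_proj, rng):
    """Monte Carlo estimate of SW_2^2 between two equal-size empirical measures."""
    d = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]  # uniformly distributed unit direction
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        # in 1D, optimal transport between equal-size samples pairs order statistics
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_proj

rng = random.Random(0)
xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]  # hypothetical cloud in R^3
ys = [[c + 1.0 for c in x] for x in xs]  # the same cloud translated by v = (1, 1, 1)
print(sliced_wasserstein2(xs, ys, 10000, rng))
```

For two clouds that differ by a translation v, each projection contributes exactly (theta . v)^2, so the estimate should approach ||v||^2 / d = 1 as the number of projections grows.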
-
Sequential Monte Carlo algorithms for agent-based models of disease transmission
Authors:
Nianqiao Ju,
Jeremy Heng,
Pierre E. Jacob
Abstract:
Agent-based models of disease transmission involve stochastic rules that specify how a number of individuals would infect one another, recover or be removed from the population. Common yet stringent assumptions stipulate interchangeability of agents and that all pairwise contacts are equally likely. Under these assumptions, the population can be summarized by counting the number of susceptible and infected individuals, which greatly facilitates statistical inference. We consider the task of inference without such simplifying assumptions, in which case the population cannot be summarized by low-dimensional counts. We design improved particle filters, where each particle corresponds to a specific configuration of the population of agents, that take either the next or all future observations into account when proposing population configurations. Using simulated data sets, we illustrate that orders of magnitude improvements are possible over bootstrap particle filters. We also provide theoretical support for the approximations employed to make the algorithms practical.
Submitted 28 January, 2021;
originally announced January 2021.
-
Coupling-based convergence assessment of some Gibbs samplers for high-dimensional Bayesian regression with shrinkage priors
Authors:
Niloy Biswas,
Anirban Bhattacharya,
Pierre E. Jacob,
James E. Johndrow
Abstract:
We consider Markov chain Monte Carlo (MCMC) algorithms for Bayesian high-dimensional regression with continuous shrinkage priors. A common challenge with these algorithms is the choice of the number of iterations to perform. This is critical when each iteration is expensive, as is the case when dealing with modern data sets, such as genome-wide association studies with thousands of rows and up to hundreds of thousands of columns. We develop coupling techniques tailored to the setting of high-dimensional regression with shrinkage priors, which enable practical, non-asymptotic diagnostics of convergence without relying on traceplots or long-run asymptotics. By establishing geometric drift and minorization conditions for the algorithm under consideration, we prove that the proposed couplings have finite expected meeting time. Focusing on a class of shrinkage priors which includes the 'Horseshoe', we empirically demonstrate the scalability of the proposed couplings. A highlight of our findings is that fewer than 1000 iterations can be enough for a Gibbs sampler to reach stationarity in a regression on 100,000 covariates. The numerical results also illustrate the impact of the prior on the computational efficiency of the coupling, and suggest the use of priors where the local precisions are Half-t distributed with degrees of freedom larger than one.
Submitted 9 July, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
A simple Markov chain for independent Bernoulli variables conditioned on their sum
Authors:
Jeremy Heng,
Pierre E. Jacob,
Nianqiao Ju
Abstract:
We consider a vector of $N$ independent binary variables, each with a different probability of success. The distribution of the vector conditional on its sum is known as the conditional Bernoulli distribution. Assuming that $N$ goes to infinity and that the sum is proportional to $N$, exact sampling costs order $N^2$, while a simple Markov chain Monte Carlo algorithm using 'swaps' has constant cost per iteration. We provide conditions under which this Markov chain converges in order $N \log N$ iterations. Our proof relies on couplings and an auxiliary Markov chain defined on a partition of the space into favorable and unfavorable pairs.
Submitted 5 December, 2020;
originally announced December 2020.
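A minimal sketch of a swap chain of this kind, assuming the standard Metropolis form: pick one index currently equal to 1 and one equal to 0 uniformly at random, and accept the swap with probability min(1, odds_j / odds_i), where odds_i = p_i / (1 - p_i). Keeping explicit lists of ones and zeros makes each iteration constant cost, as the abstract states. `run_swap_chain` and the brute-force check `exact_inclusion` (tiny N only) are hypothetical helpers, not code from the paper.

```python
import itertools
import random

def exact_inclusion(p, total, k):
    """P(x_k = 1 | sum(x) = total) by brute-force enumeration (small N only)."""
    num = den = 0.0
    for x in itertools.product([0, 1], repeat=len(p)):
        if sum(x) != total:
            continue
        w = 1.0
        for xi, q in zip(x, p):
            w *= q if xi else 1.0 - q
        den += w
        num += w * x[k]
    return num / den

def run_swap_chain(p, total, n_iter, rng, k=0):
    """Swap MCMC for the conditional Bernoulli law; assumes 0 < total < len(p)."""
    odds = [q / (1.0 - q) for q in p]
    x = [1] * total + [0] * (len(p) - total)  # any state with the right sum
    ones, zeros = list(range(total)), list(range(total, len(p)))
    hits = 0
    for _ in range(n_iter):
        a, b = rng.randrange(len(ones)), rng.randrange(len(zeros))
        i, j = ones[a], zeros[b]  # propose x_i: 1 -> 0 and x_j: 0 -> 1
        # symmetric proposal; target ratio pi(x') / pi(x) = odds[j] / odds[i]
        if rng.random() < odds[j] / odds[i]:
            ones[a], zeros[b] = j, i  # constant-cost bookkeeping
            x[i], x[j] = 0, 1
        hits += x[k]
    return hits / n_iter  # ergodic estimate of P(x_k = 1 | sum = total)

p = [0.2, 0.5, 0.7, 0.9]  # hypothetical success probabilities
print(run_swap_chain(p, 2, 200000, random.Random(0)), exact_inclusion(p, 2, 0))
```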
-
Maximal couplings of the Metropolis-Hastings algorithm
Authors:
John O'Leary,
Guanyang Wang,
Pierre E. Jacob
Abstract:
Couplings play a central role in the analysis of Markov chain Monte Carlo algorithms and appear increasingly often in the algorithms themselves, e.g. in convergence diagnostics, parallelization, and variance reduction techniques. Existing couplings of the Metropolis-Hastings algorithm handle the proposal and acceptance steps separately and fall short of the upper bound on one-step meeting probabilities given by the coupling inequality. This paper introduces maximal couplings which achieve this bound while retaining the practical advantages of current methods. We consider the properties of these couplings and examine their behavior on a selection of numerical examples.
Submitted 16 October, 2020;
originally announced October 2020.
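The couplings in the paper act on the whole Metropolis-Hastings kernel. The sketch below shows only the classical rejection-based maximal coupling of two distributions, the building block that attains the coupling-inequality bound P(X = Y) = 1 - TV(p, q). The two Gaussians with a shared scale are a hypothetical choice; the function names are illustrative.

```python
import math
import random

def normal_logpdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def maximal_coupling(mu1, mu2, sd, rng):
    """Draw (X, Y) with X ~ N(mu1, sd^2), Y ~ N(mu2, sd^2) and maximal P(X = Y)."""
    x = rng.gauss(mu1, sd)
    # accept X as a common value with probability min(1, q(X) / p(X))
    if math.log(1.0 - rng.random()) + normal_logpdf(x, mu1, sd) <= normal_logpdf(x, mu2, sd):
        return x, x
    while True:
        # otherwise sample Y from the part of q not covered by min(p, q)
        y = rng.gauss(mu2, sd)
        if math.log(1.0 - rng.random()) + normal_logpdf(y, mu2, sd) > normal_logpdf(y, mu1, sd):
            return x, y

rng = random.Random(0)
pairs = [maximal_coupling(0.0, 1.0, 1.0, rng) for _ in range(20000)]
print(sum(x == y for x, y in pairs) / len(pairs))  # about 1 - TV, roughly 0.617
```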
-
An invitation to sequential Monte Carlo samplers
Authors:
Chenguang Dai,
Jeremy Heng,
Pierre E. Jacob,
Nick Whiteley
Abstract:
Statisticians often use Monte Carlo methods to approximate probability distributions, primarily with Markov chain Monte Carlo and importance sampling. Sequential Monte Carlo samplers are a class of algorithms that combine both techniques to approximate distributions of interest and their normalizing constants. These samplers originate from particle filtering for state space models and have become general and scalable sampling techniques. This article describes sequential Monte Carlo samplers and their possible implementations, arguing that they remain under-used in statistics, despite their ability to perform sequential inference and to leverage parallel processing resources among other potential benefits.
Submitted 17 June, 2022; v1 submitted 23 July, 2020;
originally announced July 2020.
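A compact sketch of an SMC sampler combining importance sampling and MCMC moves, under hypothetical choices: a tempered path from the prior N(0, 1) to the posterior given one observation Y = 1 with likelihood N(Y; theta, 1), a fixed linear temperature schedule, multinomial resampling at every step, and random-walk MH moves. The product of the average incremental weights estimates the normalizing constant. Adaptive schedules and resampling triggers common in practice are omitted.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)
Y = 1.0  # hypothetical single observation

def log_prior(t):
    return -0.5 * t * t - 0.5 * LOG_2PI        # N(0, 1)

def log_lik(t):
    return -0.5 * (Y - t) ** 2 - 0.5 * LOG_2PI  # N(Y; t, 1)

def smc_sampler(n, temps, rng, step=0.5, n_moves=5):
    """Tempered SMC from the prior (temps[0] = 0) to the posterior (temps[-1] = 1)."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]  # exact draws from the prior
    log_z = 0.0
    for prev, cur in zip(temps[:-1], temps[1:]):
        # 1. importance weights: incremental likelihood power lik^(cur - prev)
        lw = [(cur - prev) * log_lik(t) for t in particles]
        top = max(lw)
        w = [math.exp(l - top) for l in lw]
        log_z += top + math.log(sum(w) / n)  # log of the mean incremental weight
        # 2. multinomial resampling
        particles = rng.choices(particles, weights=w, k=n)
        # 3. random-walk MH moves leaving prior * lik^cur invariant
        for _ in range(n_moves):
            for k, t in enumerate(particles):
                prop = t + step * rng.gauss(0.0, 1.0)
                log_ratio = (log_prior(prop) + cur * log_lik(prop)
                             - log_prior(t) - cur * log_lik(t))
                if math.log(1.0 - rng.random()) < log_ratio:
                    particles[k] = prop
    return particles, log_z

rng = random.Random(0)
particles, log_z = smc_sampler(1000, [k / 20 for k in range(21)], rng)
exact_log_z = -0.25 * Y ** 2 - 0.5 * math.log(4.0 * math.pi)  # log N(Y; 0, 2)
print(log_z, exact_log_z, sum(particles) / len(particles))    # posterior mean is Y / 2
```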
-
Schrödinger Bridge Samplers
Authors:
Espen Bernton,
Jeremy Heng,
Arnaud Doucet,
Pierre E. Jacob
Abstract:
Consider a reference Markov process with initial distribution $π_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given a distribution $π_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $π_{0}$ and terminal distribution equal to $π_{T}$ which is the closest to the reference process in terms of Kullback--Leibler divergence. This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as the Sinkhorn algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $π$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $π_T = π$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We report preliminary experiments and make connections with other problems arising in the optimal transport, optimal control and physics literatures.
Submitted 30 December, 2019;
originally announced December 2019.
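In a discrete, one-step version of the problem, the joint law closest in Kullback-Leibler divergence to a reference joint law K of (X_0, X_1), subject to prescribed marginals pi_0 and pi_T, can be computed by iterative proportional fitting: alternately rescale rows and columns of K. The three-state inputs below are hypothetical, and the sketch illustrates only the IPF recursion, not the regression-based samplers of the paper.

```python
def sinkhorn(K, mu, nu, n_iter=500):
    """IPF: return P = diag(u) K diag(v) with row sums mu and column sums nu."""
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]  # fit rows
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]  # fit columns
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

pi0 = [0.5, 0.3, 0.2]                                   # hypothetical initial law
M = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]  # hypothetical reference kernel
piT = [0.2, 0.3, 0.5]                                   # prescribed terminal law
K = [[pi0[i] * M[i][j] for j in range(3)] for i in range(3)]  # reference joint of (X_0, X_1)
bridge = sinkhorn(K, pi0, piT)
print([sum(row) for row in bridge], [sum(col) for col in zip(*bridge)])
```

Dividing each row of `bridge` by the corresponding entry of `pi0` gives the modified transition kernel that carries pi_0 to pi_T.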
-
A Gibbs sampler for a class of random convex polytopes
Authors:
Pierre E. Jacob,
Ruobin Gong,
Paul T. Edlefsen,
Arthur P. Dempster
Abstract:
We present a Gibbs sampler for the Dempster-Shafer (DS) approach to statistical inference for Categorical distributions. The DS framework extends the Bayesian approach, allows in particular the use of partial prior information, and yields three-valued uncertainty assessments representing probabilities "for", "against", and "don't know" about formal assertions of interest. The proposed algorithm targets the distribution of a class of random convex polytopes which encapsulate the DS inference. The sampler relies on an equivalence between the iterative constraints of the vertex configuration and the non-negativity of cycles in a fully connected directed graph. Illustrations include the testing of independence in 2x2 contingency tables and parameter estimation of the linkage model.
Submitted 21 January, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Estimating Convergence of Markov chains with L-Lag Couplings
Authors:
Niloy Biswas,
Pierre E. Jacob,
Paul Vanetti
Abstract:
Markov chain Monte Carlo (MCMC) methods generate samples that are asymptotically distributed from a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and the marginal distribution after a fixed number of iterations. These upper bounds are derived on a case-by-case basis and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers.
Submitted 28 October, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
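A minimal sketch of the L-lag idea: run X ahead of Y by L steps, couple them until X_t = Y_{t-L}, and average max(0, ceil((tau - L - t) / L)) over independent meeting times tau, which estimates an upper bound on the total variation distance between the chain at time t and its target. The toy chain (IMH for N(0, 1) with proposal N(0, 1.5^2), started from N(3, 1), coupled with common random numbers) and all function names are hypothetical.

```python
import math
import random

PROP_SD = 1.5  # hypothetical toy setup: IMH for N(0, 1) with proposal N(0, 1.5^2)

def log_weight(x):
    return -0.5 * x * x + 0.5 * (x / PROP_SD) ** 2

def imh_step(x, z, u):
    return z if math.log(u) < log_weight(z) - log_weight(x) else x

def lagged_meeting_time(lag, rng):
    """tau = inf{t > lag : X_t = Y_{t - lag}} under a common-random-numbers coupling."""
    x = rng.gauss(3.0, 1.0)  # X_0 ~ pi_0 = N(3, 1)
    y = rng.gauss(3.0, 1.0)  # Y_0 ~ pi_0
    for _ in range(lag):     # X runs `lag` steps ahead on its own
        x = imh_step(x, rng.gauss(0.0, PROP_SD), 1.0 - rng.random())
    t = lag
    while True:              # invariant after each update: x = X_t, y = Y_{t - lag}
        z, u = rng.gauss(0.0, PROP_SD), 1.0 - rng.random()
        x, y = imh_step(x, z, u), imh_step(y, z, u)
        t += 1
        if x == y:
            return t

def tv_upper_bounds(lag, times, n_rep, rng):
    """Empirical average of max(0, ceil((tau - lag - t) / lag)) for each t in times."""
    taus = [lagged_meeting_time(lag, rng) for _ in range(n_rep)]
    return [sum(max(0, math.ceil((tau - lag - t) / lag)) for tau in taus) / n_rep
            for t in times]

print(tv_upper_bounds(2, [0, 1, 2, 5, 10], 2000, random.Random(0)))
```

Larger lags typically give tighter estimated bounds at the price of longer runs.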
-
Approximate Bayesian computation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
Submitted 9 May, 2019;
originally announced May 2019.
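The link with order statistics can be made concrete in one dimension: for two samples of equal size, the 1-Wasserstein distance is the mean absolute difference between sorted values. A minimal rejection-ABC sketch, assuming a hypothetical N(theta, 1) model, a Uniform(-5, 5) prior, and a threshold set implicitly by keeping the draws with the smallest distances; the scalable approximations and the Hilbert-curve distance proposed in the paper are not shown.

```python
import random

def wasserstein1_1d(xs, ys):
    # equal-size 1D samples: optimal transport pairs the order statistics
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def rejection_abc(obs, n_prior, keep, rng):
    """Keep the `keep` prior draws whose synthetic data are closest to obs in W_1."""
    scored = []
    for _ in range(n_prior):
        theta = rng.uniform(-5.0, 5.0)                # prior draw
        synth = [rng.gauss(theta, 1.0) for _ in obs]  # synthetic data set
        scored.append((wasserstein1_1d(obs, synth), theta))
    scored.sort()
    return [theta for _, theta in scored[:keep]]

rng = random.Random(3)
obs = [rng.gauss(2.0, 1.0) for _ in range(50)]  # hypothetical observed data
post = rejection_abc(obs, 5000, 100, rng)
print(sum(post) / len(post), sum(obs) / len(obs))
```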
-
Unbiased Smoothing using Particle Independent Metropolis-Hastings
Authors:
Lawrence Middleton,
George Deligiannidis,
Arnaud Doucet,
Pierre E. Jacob
Abstract:
We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. Unbiased estimators are appealing in the context of parallel computing, and facilitate the construction of confidence intervals. The proposed scheme only requires access to off-the-shelf Particle Filters (PF) and is thus easier to implement than recently proposed unbiased smoothers. The approach is demonstrated on a Lévy-driven stochastic volatility model and a stochastic kinetic model.
Submitted 5 February, 2019;
originally announced February 2019.
-
Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach
Authors:
Alexander Lin,
Yingzhuo Zhang,
Jeremy Heng,
Stephen A. Allsop,
Kay M. Tye,
Pierre E. Jacob,
Demba Ba
Abstract:
We propose a general statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups. Our motivation comes from neuroscience, where an important problem is to identify, within a large assembly of neurons, subsets that respond similarly to a stimulus or contingency. Upon modeling the multiple time series as the output of a Dirichlet process mixture of nonlinear state-space models, we derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling cluster assignments and sampling parameter values that form the basis of the clustering. The Metropolis step employs recent innovations in particle-based methods. We apply the framework to clustering time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.
Submitted 4 March, 2019; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Unbiased estimation of log normalizing constants with applications to Bayesian cross-validation
Authors:
Maxime Rischard,
Pierre E. Jacob,
Natesh Pillai
Abstract:
Posterior distributions often feature intractable normalizing constants, called marginal likelihoods or evidence, that are useful for model comparison via Bayes factors. This has motivated a number of methods for estimating ratios of normalizing constants in statistics. In computational physics, the logarithms of these ratios correspond to free energy differences. Combining unbiased Markov chain Monte Carlo estimators with path sampling, also called thermodynamic integration, we propose new unbiased estimators of the logarithm of ratios of normalizing constants. As a by-product, we propose unbiased estimators of the Bayesian cross-validation criterion. The proposed estimators are consistent, asymptotically Normal and can easily benefit from parallel processing devices. Various examples are considered for illustration.
Submitted 2 October, 2018;
originally announced October 2018.
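Path sampling writes log Z_1 - log Z_0 as the integral over lambda in [0, 1] of E_{pi_lambda}[log L(theta)], for the tempered family pi_lambda proportional to prior times L^lambda. Drawing lambda ~ Uniform(0, 1) and then theta ~ pi_lambda gives an unbiased estimator of that integral. In the sketch below, exact Gaussian draws stand in for the unbiased MCMC estimators used in the paper, and the conjugate model (prior N(0, 1), one observation Y = 1 with likelihood N(Y; theta, 1)) is a hypothetical choice that makes log Z available in closed form.

```python
import math
import random

Y = 1.0  # hypothetical single observation: prior N(0, 1), likelihood N(Y; theta, 1)

def log_lik(theta):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (Y - theta) ** 2

def tempered_moments(lam):
    # pi_lam, proportional to N(theta; 0, 1) * N(Y; theta, 1)^lam, is Gaussian with precision 1 + lam
    return lam * Y / (1.0 + lam), 1.0 / (1.0 + lam)  # mean, variance

def path_sampling_estimate(rng):
    """Unbiased for log Z = int_0^1 E_lam[log L] d lam, since log Z_0 = 0 here."""
    lam = rng.random()                       # lam ~ Uniform(0, 1)
    mean, var = tempered_moments(lam)
    theta = rng.gauss(mean, math.sqrt(var))  # exact draw stands in for unbiased MCMC
    return log_lik(theta)

def expected_log_lik(lam):
    mean, var = tempered_moments(lam)
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((Y - mean) ** 2 + var)

def simpson(f, n=100):
    # composite Simpson rule on [0, 1]; n must be even
    h = 1.0 / n
    s = f(0.0) + f(1.0) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, n))
    return s * h / 3.0

exact = -0.25 * Y ** 2 - 0.5 * math.log(4.0 * math.pi)  # log N(Y; 0, 2)
rng = random.Random(0)
mc = sum(path_sampling_estimate(rng) for _ in range(20000)) / 20000
print(exact, simpson(expected_log_lik), mc)
```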
-
Adaptive Tuning Of Hamiltonian Monte Carlo Within Sequential Monte Carlo
Authors:
Alexander Buchholz,
Nicolas Chopin,
Pierre E. Jacob
Abstract:
Sequential Monte Carlo (SMC) samplers form an attractive alternative to MCMC for Bayesian computation. However, their performance depends strongly on the Markov kernels used to rejuvenate particles. We discuss how to calibrate automatically (using the current particles) Hamiltonian Monte Carlo kernels within SMC. To do so, we build upon the adaptive SMC approach of Fearnhead and Taylor (2013), and we also suggest alternative methods. We illustrate the advantages of using HMC kernels within an SMC sampler via an extensive numerical study.
Submitted 12 February, 2020; v1 submitted 23 August, 2018;
originally announced August 2018.
-
Unbiased Markov chain Monte Carlo for intractable target distributions
Authors:
Lawrence Middleton,
George Deligiannidis,
Arnaud Doucet,
Pierre E. Jacob
Abstract:
Performing numerical integration when the integrand itself cannot be evaluated point-wise is a challenging task that arises in statistical analysis, notably in Bayesian inference for models with intractable likelihood functions. Markov chain Monte Carlo (MCMC) algorithms have been proposed for this setting, such as the pseudo-marginal method for latent variable models and the exchange algorithm for a class of undirected graphical models. As with any MCMC algorithm, the resulting estimators are justified asymptotically in the limit of the number of iterations, but exhibit a bias for any fixed number of iterations due to the Markov chains starting outside of stationarity. This "burn-in" bias is known to complicate the use of parallel processors for MCMC computations. We show how to use coupling techniques to generate unbiased estimators in finite time, building on recent advances for generic MCMC algorithms. We establish the theoretical validity of some of these procedures by extending existing results to cover the case of polynomially ergodic Markov chains. The efficiency of the proposed estimators is compared with that of standard MCMC estimators, with theoretical arguments and numerical experiments including state space models and Ising models.
Submitted 15 June, 2020; v1 submitted 23 July, 2018;
originally announced July 2018.
-
Bayesian model comparison with the Hyvärinen score: computation and consistency
Authors:
Stephane Shao,
Pierre E. Jacob,
Jie Ding,
Vahid Tarokh
Abstract:
The Bayes factor is a widely used criterion in model comparison and its logarithm is a difference of out-of-sample predictive scores under the logarithmic scoring rule. However, when some of the candidate models involve vague priors on their parameters, the log-Bayes factor features an arbitrary additive constant that hinders its interpretation. As an alternative, we consider model comparison using the Hyvärinen score. We propose a method to consistently estimate this score for parametric models, using sequential Monte Carlo methods. We show that this score can be estimated for models with tractable likelihoods as well as nonlinear non-Gaussian state-space models with intractable likelihoods. We prove the asymptotic consistency of this new model selection criterion under strong regularity assumptions in the case of non-nested models, and we provide qualitative insights for the nested case. We also use existing characterizations of proper scoring rules on discrete spaces to extend the Hyvärinen score to discrete observations. Our numerical illustrations include Lévy-driven stochastic volatility models and diffusion models for population dynamics.
Submitted 5 September, 2018; v1 submitted 31 October, 2017;
originally announced November 2017.
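To make the score itself concrete, here is a minimal sketch. It is not the paper's SMC estimation procedure. It assumes the definition H(y, p) = 2 Δ_y log p(y) + ||∇_y log p(y)||² for a scalar observation y. Only derivatives of log p enter, so an arbitrary additive constant in log p, like the one vague priors introduce in the log-evidence, leaves the score unchanged. The function names are hypothetical. The Gaussian version uses closed-form derivatives, and the generic version uses finite differences.

```python
def hyvarinen_score_gaussian(y: float, mu: float, sigma: float) -> float:
    """H(y, p) = 2 * (log p)''(y) + ((log p)'(y))**2 for p = N(mu, sigma^2).

    Negatively oriented: smaller values indicate better predictions."""
    grad = -(y - mu) / sigma**2   # first derivative of log p at y
    lap = -1.0 / sigma**2         # second derivative of log p (constant for a Gaussian)
    return 2.0 * lap + grad**2


def hyvarinen_score_numeric(log_p, y: float, eps: float = 1e-4) -> float:
    """Same score from central finite differences of a possibly unnormalised log-density."""
    grad = (log_p(y + eps) - log_p(y - eps)) / (2.0 * eps)
    lap = (log_p(y + eps) - 2.0 * log_p(y) + log_p(y - eps)) / eps**2
    return 2.0 * lap + grad**2


# log N(y; 1, 2^2) up to an arbitrary additive constant (here 123.0)
log_p_shifted = lambda y: -0.5 * (y - 1.0) ** 2 / 4.0 + 123.0
print(hyvarinen_score_gaussian(2.0, 1.0, 2.0))        # -0.4375
print(hyvarinen_score_numeric(log_p_shifted, 2.0))    # approximately -0.4375
```

The two values agree even though log_p_shifted carries an arbitrary constant. This is the invariance that makes the score attractive under vague priors.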
-
Unbiased Hamiltonian Monte Carlo with couplings
Authors:
Jeremy Heng,
Pierre E. Jacob
Abstract:
We propose a methodology to parallelize Hamiltonian Monte Carlo estimators. Our approach constructs a pair of Hamiltonian Monte Carlo chains that are coupled in such a way that they meet exactly after some random number of iterations. These chains can then be combined so that the resulting estimators are unbiased. This allows us to produce independent replicates in parallel and average them to obtain estimators that are consistent in the limit of the number of replicates, instead of the usual limit of the number of Markov chain iterations. We investigate the scalability of our coupling in high dimensions on a toy example. The choice of algorithmic parameters and the efficiency of our proposed methodology are then illustrated on a logistic regression with 300 covariates, and on a log-Gaussian Cox point process model with low to fine grained discretizations.
Submitted 27 August, 2018; v1 submitted 1 September, 2017;
originally announced September 2017.
-
Better together? Statistical learning in models made of modules
Authors:
Pierre E. Jacob,
Lawrence M. Murray,
Chris C. Holmes,
Christian P. Robert
Abstract:
In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference, prediction, or decision problem. In such circumstances, it is convenient to use a graphical model to represent the statistical dependencies, via a set of connected "modules", each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the estimate and update of others, often in unpredictable ways. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of approaches that restrict the information propagation between modules, for example by restricting propagation to only particular directions along the edges of the graph. In this article, we investigate why these modular approaches might be preferable to the full model in misspecified settings. We propose principled criteria to choose between modular and full-model approaches. The question arises in many applied settings, including large stochastic dynamical systems, meta-analysis, epidemiological models, air pollution models, pharmacokinetics-pharmacodynamics, and causal inference with propensity scores.
Submitted 29 August, 2017;
originally announced August 2017.
-
Unbiased Markov chain Monte Carlo with couplings
Authors:
Pierre E. Jacob,
John O'Leary,
Yves F. Atchadé
Abstract:
Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. MCMC estimators are generally biased after any fixed number of iterations. We propose to remove this bias by using couplings of Markov chains together with a telescopic sum argument of Glynn and Rhee (2014). The resulting unbiased estimators can be computed independently in parallel. We discuss practical couplings for popular MCMC algorithms. We establish the theoretical validity of the proposed estimators and study their efficiency relative to the underlying MCMC algorithms. Finally, we illustrate the performance and limitations of the method on toy examples, on an Ising model around its critical temperature, on a high-dimensional variable selection problem, and on an approximation of the cut distribution arising in Bayesian inference for models made of multiple modules.
Submitted 17 July, 2019; v1 submitted 11 August, 2017;
originally announced August 2017.
-
Boundary-Seeking Generative Adversarial Networks
Authors:
R Devon Hjelm,
Athul Paul Jacob,
Tong Che,
Adam Trischler,
Kyunghyun Cho,
Yoshua Bengio
Abstract:
Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between the target and generated distributions. GANs, as normally formulated, rely on the generated samples being completely differentiable w.r.t. the generative parameters, and thus do not work for discrete data. We introduce a method for training GANs with discrete data that uses the estimated difference measure from the discriminator to compute importance weights for generated samples, thus providing a policy gradient for training the generator. The importance weights have a strong connection to the decision boundary of the discriminator, and we call our method boundary-seeking GANs (BGANs). We demonstrate the effectiveness of the proposed algorithm with discrete image and character-based natural language generation. In addition, the boundary-seeking objective extends to continuous data, where it can be used to improve the stability of training, and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.
Submitted 21 February, 2018; v1 submitted 27 February, 2017;
originally announced February 2017.
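The link between importance weights and the decision boundary can be made concrete. For the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)), the ratio D*(x) / (1 - D*(x)) equals p_data(x) / p_g(x). This sketch assumes the discriminator outputs logits, so that D / (1 - D) = exp(logit). It then self-normalizes the resulting weights over a batch of generated samples. The function name is hypothetical, and the generator update itself is not shown.

```python
import numpy as np


def boundary_seeking_weights(logits):
    """Self-normalised importance weights for a batch of generated samples.

    With D(x) = sigmoid(logit(x)), D / (1 - D) = exp(logit(x)); for the optimal
    discriminator this ratio equals p_data(x) / p_g(x)."""
    logits = np.asarray(logits, dtype=float)
    log_w = logits - logits.max()   # shift by the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()


print(boundary_seeking_weights([0.0, np.log(3.0)]))   # approximately [0.25 0.75]
```

A sample lying exactly on the decision boundary (logit 0, D = 1/2) gets an unnormalized weight of 1.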
-
On parameter estimation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the approximation of these estimators and illustrate their behavior in several numerical experiments. Two of our examples are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.
Submitted 9 May, 2019; v1 submitted 18 January, 2017;
originally announced January 2017.
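In one dimension, the Wasserstein-1 distance between two empirical distributions with the same number of atoms comes from matching order statistics. This makes a minimum Wasserstein estimator easy to sketch. The sketch assumes a location model N(θ, 1) and approximates the model distribution with simulated draws. The same standard normal noise is reused for every θ (common random numbers), and the minimization runs over a grid. These choices and the function names are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np


def wasserstein1_1d(a, b):
    """W1 distance between two empirical distributions on the real line with equal
    numbers of atoms: the optimal coupling pairs the sorted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def minimum_wasserstein_location(data, grid, seed=0):
    """Grid-search estimate of theta in the model N(theta, 1), with the model
    distribution approximated by theta + z for a fixed standard normal sample z."""
    data = np.asarray(data, dtype=float)
    z = np.random.default_rng(seed).standard_normal(data.size)   # common random numbers
    distances = [wasserstein1_1d(data, theta + z) for theta in grid]
    return float(grid[int(np.argmin(distances))])


rng = np.random.default_rng(42)
data = 1.5 + rng.standard_normal(1000)
print(minimum_wasserstein_location(data, np.linspace(-3.0, 3.0, 601)))   # near 1.5
```

The same code runs unchanged when the data do not come from any N(θ, 1), which is the misspecified setting the abstract refers to.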
-
Smoothing with Couplings of Conditional Particle Filters
Authors:
Pierre E. Jacob,
Fredrik Lindsten,
Thomas B. Schön
Abstract:
In state space models, smoothing refers to the task of estimating a latent stochastic process given noisy measurements related to the process. We propose an unbiased estimator of smoothing expectations. The lack-of-bias property has methodological benefits: independent estimators can be generated in parallel, and confidence intervals can be constructed from the central limit theorem to quantify the approximation error. To design unbiased estimators, we combine a generic debiasing technique for Markov chains with a Markov chain Monte Carlo algorithm for smoothing. The resulting procedure is widely applicable and we show in numerical experiments that the removal of the bias comes at a manageable increase in variance. We establish the validity of the proposed estimators under mild assumptions. Numerical experiments are provided on toy models, including a setting of highly-informative observations, and a realistic Lotka-Volterra model with an intractable transition density.
Submitted 5 September, 2018; v1 submitted 8 January, 2017;
originally announced January 2017.
-
Coupling of Particle Filters
Authors:
Pierre E. Jacob,
Fredrik Lindsten,
Thomas B. Schön
Abstract:
Particle filters provide Monte Carlo approximations of intractable quantities such as point-wise evaluations of the likelihood in state space models. In many scenarios, the interest lies in the comparison of these quantities as some parameter or input varies. To facilitate such comparisons, we introduce and study methods to couple two particle filters in such a way that the correlation between the two underlying particle systems is increased. The motivation stems from the classic variance reduction technique of positively correlating two estimators. The key challenge in constructing such a coupling stems from the discontinuity of the resampling step of the particle filter. As our first contribution, we consider coupled resampling algorithms. Within bootstrap particle filters, they improve the precision of finite-difference estimators of the score vector and boost the performance of particle marginal Metropolis--Hastings algorithms for parameter inference. The second contribution arises from the use of these coupled resampling schemes within conditional particle filters, allowing for unbiased estimators of smoothing functionals. The result is a new smoothing strategy that operates by averaging a number of independent and unbiased estimators, which allows for 1) straightforward parallelization and 2) the construction of accurate error estimates. Neither of the above is possible with existing particle smoothers.
Submitted 16 July, 2016; v1 submitted 3 June, 2016;
originally announced June 2016.
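One standard way to correlate the resampling steps of two particle filters is a maximal coupling of the two categorical distributions defined by the weights. Each pair of ancestor indices is drawn from that coupling, so the indices agree with the largest possible probability. The sketch below illustrates this under that assumption, with hypothetical function names. It is not necessarily one of the coupled resampling schemes studied in the paper.

```python
import numpy as np


def maximal_coupling_categorical(rng, w1, w2):
    """Draw (i, j) with i ~ Categorical(w1) and j ~ Categorical(w2), maximising P(i == j)."""
    w1 = np.asarray(w1, dtype=float) / np.sum(w1)
    w2 = np.asarray(w2, dtype=float) / np.sum(w2)
    overlap = np.minimum(w1, w2)
    alpha = overlap.sum()                      # equals 1 - TV(w1, w2)
    if rng.uniform() < alpha:
        i = rng.choice(w1.size, p=overlap / alpha)
        return i, i
    r1 = w1 - overlap                          # residual parts, drawn independently
    r2 = w2 - overlap
    return rng.choice(w1.size, p=r1 / r1.sum()), rng.choice(w2.size, p=r2 / r2.sum())


def coupled_resample(rng, w1, w2, n):
    """Ancestor vectors for two particle systems, coupled index by index."""
    pairs = [maximal_coupling_categorical(rng, w1, w2) for _ in range(n)]
    a1, a2 = (np.array(v) for v in zip(*pairs))
    return a1, a2


rng = np.random.default_rng(0)
a1, a2 = coupled_resample(rng, [0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.3, 0.3], 1000)
print(np.mean(a1 == a2))   # roughly 1 - TV = 0.9
```

With identical weight vectors the two ancestor vectors coincide. With disjoint supports they never do.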
-
Bayesian inference in non-Markovian state-space models with applications to fractional order systems
Authors:
Pierre E. Jacob,
S. M. Mahdi Alavi,
Adam Mahdi,
Stephen J. Payne,
David A. Howey
Abstract:
Battery impedance spectroscopy models are given by fractional order (FO) differential equations. In the discrete-time domain, they give rise to state-space models where the latent process is not Markovian. Parameter estimation for these models is therefore challenging, especially for non-commensurate FO models. In this paper, we propose a Bayesian approach to identify the parameters of generic FO systems. The computational challenge is tackled with particle Markov chain Monte Carlo methods, with an implementation specifically designed for the non-Markovian setting. The approach is then applied to estimate the parameters of a battery non-commensurate FO equivalent circuit model. Extensive simulations are provided to study the practical identifiability of model parameters and their sensitivity to the choice of prior distributions, the number of observations, the magnitude of the input signal and the measurement noise.
Submitted 27 January, 2016;
originally announced January 2016.
-
Sequential Bayesian inference for implicit hidden Markov models and current limitations
Authors:
Pierre E. Jacob
Abstract:
Hidden Markov models can describe time series arising in various fields of science, by treating the data as noisy measurements of an arbitrarily complex Markov process. Sequential Monte Carlo (SMC) methods have become standard tools to estimate the hidden Markov process given the observations and a fixed parameter value. We review some of the recent developments allowing the inclusion of parameter uncertainty as well as model uncertainty. The shortcomings of the currently available methodology are emphasised from an algorithmic complexity perspective. The statistical objects of interest for time series analysis are illustrated on a toy "Lotka-Volterra" model used in population ecology. Some open challenges are discussed regarding the scalability of the reviewed methodology to longer time series, higher-dimensional state spaces and more flexible models.
Submitted 16 May, 2015;
originally announced May 2015.
-
On nonnegative unbiased estimators
Authors:
Pierre E. Jacob,
Alexandre H. Thiery
Abstract:
We study the existence of algorithms generating almost surely nonnegative unbiased estimators. We show that given a nonconstant real-valued function $f$ and a sequence of unbiased estimators of $λ\in\mathbb{R}$, there is no algorithm yielding almost surely nonnegative unbiased estimators of $f(λ)\in\mathbb{R}^+$. The study is motivated by pseudo-marginal Monte Carlo algorithms that rely on such nonnegative unbiased estimators. These methods allow "exact inference" in intractable models, in the sense that integrals with respect to a target distribution can be estimated without any systematic error, even though the associated probability density function cannot be evaluated pointwise. We discuss the consequences of our results on the applicability of pseudo-marginal algorithms and thus on the possibility of exact inference in intractable models. We illustrate our study with particular choices of functions $f$ corresponding to known challenges in statistics, such as exact simulation of diffusions, inference in large datasets and doubly intractable distributions.
Submitted 1 April, 2015; v1 submitted 25 September, 2013;
originally announced September 2013.
-
Path storage in the particle filter
Authors:
Pierre E. Jacob,
Lawrence Murray,
Sylvain Rubenthaler
Abstract:
This article considers the problem of storing the paths generated by a particle filter and more generally by a sequential Monte Carlo algorithm. It provides a theoretical result bounding the expected memory cost by $T + C N \log N$ where $T$ is the time horizon, $N$ is the number of particles and $C$ is a constant, as well as an efficient algorithm to realise this. The theoretical result and the algorithm are illustrated with numerical experiments.
Submitted 29 January, 2014; v1 submitted 11 July, 2013;
originally announced July 2013.
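A simple way to store only what is needed is to keep the ancestral tree of the current particles. Each node carries a child counter, and any node with no surviving descendants is deleted. The sketch assumes a toy filter with Gaussian random-walk states, weights exp(-x²/2) and multinomial resampling, and its names are hypothetical. It illustrates the pruning idea only. It does not reproduce the paper's algorithm or verify its T + C N log N bound.

```python
import numpy as np


class Node:
    """A node of the ancestral tree: a particle value, its parent, and a live-children count."""
    __slots__ = ("value", "parent", "n_children")

    def __init__(self, value, parent=None):
        self.value, self.parent, self.n_children = value, parent, 0
        if parent is not None:
            parent.n_children += 1


def prune(node):
    """Delete a childless node, then any ancestor left without children; return the count."""
    deleted = 0
    while node is not None and node.n_children == 0:
        parent = node.parent
        node.parent = None            # drop the reference so memory can be reclaimed
        deleted += 1
        if parent is not None:
            parent.n_children -= 1
        node = parent
    return deleted


def filter_with_path_storage(rng, n_particles, n_steps):
    """Toy filter that stores only the ancestral tree of the current particles."""
    leaves = [Node(rng.standard_normal()) for _ in range(n_particles)]
    n_stored = n_particles
    for _ in range(n_steps - 1):
        x = np.array([leaf.value for leaf in leaves])
        w = np.exp(-0.5 * x**2)
        ancestors = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        new_leaves = [Node(leaves[a].value + rng.standard_normal(), leaves[a]) for a in ancestors]
        n_stored += n_particles
        for leaf in leaves:           # old leaves that were not resampled are removed
            n_stored -= prune(leaf)
        leaves = new_leaves
    return leaves, n_stored


def trace_path(leaf):
    """Recover the full path of a particle by following parent pointers."""
    path = []
    while leaf is not None:
        path.append(leaf.value)
        leaf = leaf.parent
    return path[::-1]


rng = np.random.default_rng(0)
leaves, n_stored = filter_with_path_storage(rng, n_particles=50, n_steps=100)
print(n_stored, "nodes stored instead of", 50 * 100)
```

The returned count is the number of nodes actually held in memory. Storing every generation would instead require N·T nodes.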
-
Derivative-Free Estimation of the Score Vector and Observed Information Matrix with Application to State-Space Models
Authors:
Arnaud Doucet,
Pierre E. Jacob,
Sylvain Rubenthaler
Abstract:
Ionides, King et al. (see e.g. Inference for nonlinear dynamical systems, PNAS 103) have recently introduced an original approach to perform maximum likelihood parameter estimation in state-space models which only requires being able to simulate the latent Markov model according to its prior distribution. Their methodology relies on an approximation of the score vector for general statistical models based upon an artificial posterior distribution and bypasses the calculation of any derivative. We show here that this score estimator can be derived from a simple application of Stein's lemma and how an additional application of this lemma provides an original derivative-free estimator of the observed information matrix. We establish that these estimators exhibit robustness properties compared to finite difference estimators while their bias and variance scale as well as finite difference type estimators, including simultaneous perturbations (see e.g. Spall, IEEE Trans. on Automatic Control 37), with respect to the dimension of the parameter. For state-space models where sequential Monte Carlo computation is required, these estimators can be further improved. In this specific context, we derive original derivative-free estimators of the score vector and observed information matrix which are computed using sequential Monte Carlo approximations of smoothed additive functionals associated with a modified version of the original state-space model.
Submitted 12 July, 2015; v1 submitted 21 April, 2013;
originally announced April 2013.
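The Stein's lemma argument can be checked numerically for a scalar parameter. If θ ~ N(θ0, τ²), Stein's lemma gives E[(θ - θ0) L(θ)] = τ² E[L'(θ)]. Hence, under the artificial posterior proportional to N(θ; θ0, τ²) L(θ), (E_post[θ] - θ0) / τ² = E[L'(θ)] / E[L(θ)], with both expectations taken under the artificial prior. This ratio tends to the score L'(θ0) / L(θ0) as τ → 0. The sketch approximates the posterior mean by self-normalized importance sampling from the artificial prior. The function name, this sampler and the toy likelihood are assumptions. The observed-information and SMC estimators from the paper are not shown.

```python
import numpy as np


def stein_score_estimate(log_lik, theta0, tau, n=200_000, seed=0):
    """Derivative-free estimate of d/dtheta log L at theta0 (scalar theta).

    Draw theta_i ~ N(theta0, tau^2), weight by L(theta_i), and return
    (weighted mean - theta0) / tau^2, a Monte Carlo approximation of E[L'] / E[L]."""
    rng = np.random.default_rng(seed)
    theta = theta0 + tau * rng.standard_normal(n)
    log_w = log_lik(theta)                 # log_lik must accept a NumPy array
    w = np.exp(log_w - log_w.max())
    post_mean = np.sum(w * theta) / np.sum(w)
    return (post_mean - theta0) / tau**2


# Gaussian toy log-likelihood with "observation" y = 2: exact score at theta0 is y - theta0
log_lik = lambda th: -0.5 * (th - 2.0) ** 2
print(stein_score_estimate(log_lik, theta0=0.5, tau=0.1))   # close to 1.5 / 1.01
```

For this Gaussian log-likelihood, the estimator targets (y - θ0) / (1 + τ²). That is slightly below the exact score y - θ0 because τ > 0.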
-
Parallel resampling in the particle filter
Authors:
Lawrence M. Murray,
Anthony Lee,
Pierre E. Jacob
Abstract:
Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation.
Submitted 11 June, 2015; v1 submitted 17 January, 2013;
originally announced January 2013.
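The contrast between the two families of resamplers fits in a few lines:

- Systematic resampling needs the cumulative sum of all weights, which is a collective operation.
- In the Metropolis resampler, each output index runs its own short Metropolis chain over indices using only ratios of two weights. All indices can therefore be processed independently; here this is vectorized with NumPy rather than run on a GPU.

The function names and the number of steps are assumptions. For a finite number of steps, the Metropolis resampler only approximately samples from the weights. The rejection resampler is not shown.

```python
import numpy as np


def systematic_resample(rng, w):
    """Standard scheme: needs the cumulative sum of all weights (a collective operation)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    cdf = np.cumsum(w) / np.sum(w)
    cdf[-1] = 1.0                                    # guard against round-off
    positions = (rng.uniform() + np.arange(n)) / n   # one shared uniform, n strata
    return np.searchsorted(cdf, positions, side="right")


def metropolis_resample(rng, w, n_steps):
    """Metropolis resampler: chain i starts at index i and uses only pairwise weight ratios."""
    w = np.asarray(w, dtype=float)
    n = w.size
    k = np.arange(n)
    for _ in range(n_steps):
        j = rng.integers(0, n, size=n)               # uniform proposals, one per chain
        u = rng.uniform(size=n)
        accept = u * w[k] <= w[j]                    # accept with probability min(1, w_j / w_k)
        k = np.where(accept, j, k)
    return k


rng = np.random.default_rng(0)
print(systematic_resample(rng, [0.0, 1.0, 0.0, 3.0]))           # [1 3 3 3]
print(metropolis_resample(rng, [0.0, 1.0, 0.0, 3.0], n_steps=50))
```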
-
The Wang-Landau algorithm reaches the flat histogram criterion in finite time
Authors:
Pierre E. Jacob,
Robin J. Ryder
Abstract:
The Wang-Landau algorithm aims at sampling from a probability distribution while penalizing some regions of the state space and favoring others. It is widely used, but its convergence properties are still unknown. We show that some variations of the Wang-Landau algorithm reach the so-called flat histogram criterion in finite time, and that this criterion can never be reached for other variations. For the sake of clarity, the arguments are presented in a simple context (compact spaces, density functions bounded from both sides), and could be extended to more general contexts.
Submitted 15 January, 2014; v1 submitted 18 October, 2011;
originally announced October 2011.
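Here is a minimal sketch of one Wang-Landau variant on a finite state space. It assumes:

- independent uniform proposals, and bins given by a partition of the states;
- a log-penalty θ_b increased by γ at each visit to bin b;
- the flat histogram criterion "every bin count exceeds (1 - flat_tol) times the mean count", after which γ is halved and the histogram reset.

Under these assumptions, θ_b - θ_0 should approach the log of the ratio of bin masses. The names and tuning values are hypothetical. The sketch does not indicate which variants the paper's finite-time results cover.

```python
import numpy as np


def wang_landau(rng, log_pi, bin_of, n_bins, gamma0=1.0, gamma_min=1e-3,
                flat_tol=0.02, max_iter=2_000_000):
    """Wang-Landau on states {0, ..., K-1}; returns penalties theta relative to bin 0."""
    K = len(log_pi)
    theta = np.zeros(n_bins)    # log-penalty of each bin
    hist = np.zeros(n_bins)     # visits to each bin since the last reset
    gamma = gamma0
    x = 0
    for _ in range(max_iter):
        y = int(rng.integers(K))                        # independent uniform proposal
        log_ratio = (log_pi[y] - theta[bin_of[y]]) - (log_pi[x] - theta[bin_of[x]])
        if np.log1p(-rng.uniform()) < log_ratio:        # MH step for pi(x) exp(-theta[bin(x)])
            x = y
        theta[bin_of[x]] += gamma                       # penalise the bin just visited
        hist[bin_of[x]] += 1
        if hist.min() > (1.0 - flat_tol) * hist.mean():   # flat histogram criterion
            gamma /= 2.0
            hist[:] = 0.0
            if gamma < gamma_min:
                return theta - theta[0]
    raise RuntimeError("flat histogram criterion not met within max_iter iterations")


rng = np.random.default_rng(3)
log_pi = np.log(np.arange(1.0, 7.0))      # pi proportional to 1, 2, ..., 6
bin_of = np.array([0, 0, 1, 1, 2, 2])     # bin masses proportional to 3, 7, 11
print(wang_landau(rng, log_pi, bin_of, n_bins=3))   # roughly log([3, 7, 11] / 3)
```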
-
An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration
Authors:
Luke Bornn,
Pierre Jacob,
Pierre Del Moral,
Arnaud Doucet
Abstract:
While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains, a feature that both decreases computational cost and stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.
Submitted 14 June, 2012; v1 submitted 17 September, 2011;
originally announced September 2011.
-
Frontier estimation with local polynomials and high power-transformed data
Authors:
Stéphane Girard,
Pierre Jacob
Abstract:
We present a new method for estimating the frontier of a sample. The estimator is based on a local polynomial regression on the power-transformed data. We assume that the exponent of the transformation goes to infinity while the bandwidth goes to zero. We give conditions on these two parameters to obtain almost complete convergence. The asymptotic conditional bias and variance of the estimator are provided and its good performance is illustrated on some finite sample situations.
Submitted 1 April, 2011;
originally announced April 2011.
-
Frontier estimation via kernel regression on high power-transformed data
Authors:
Stéphane Girard,
Pierre Jacob
Abstract:
We present a new method for estimating the frontier of a multidimensional sample. The estimator is based on a kernel regression on the power-transformed data. We assume that the exponent of the transformation goes to infinity while the bandwidth of the kernel goes to zero. We give conditions on these two parameters to obtain complete convergence and asymptotic normality. The good performance of the estimator is illustrated on some finite sample situations.
Submitted 30 March, 2011;
originally announced March 2011.
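The idea of regressing high powers of the data can be sketched as follows. Suppose that, given X = x, Y is uniform on [0, g(x)]. Then E[Y^p | X = x] = g(x)^p / (p + 1), so ((p + 1) E[Y^p | X = x])^{1/p} = g(x) for every p. For more general bounded positive conditional distributions, E[Y^p | X = x]^{1/p} still tends to the upper endpoint g(x) as p → ∞. The sketch replaces the conditional expectation with a Gaussian-kernel Nadaraya-Watson estimate and works on the log scale. Several choices are assumptions: the kernel, the (p + 1) factor motivated by the uniform case, the values of p and h, and the function name. The estimator analysed in the paper may therefore differ in its details.

```python
import numpy as np


def frontier_estimate(x0, X, Y, p=20.0, h=0.05):
    """Estimate the frontier g(x0) from pairs (X_i, Y_i) with 0 < Y_i <= g(X_i):
    g_hat(x0) = ((p + 1) * sum_i K_h(x0 - X_i) Y_i^p / sum_i K_h(x0 - X_i)) ** (1 / p),
    computed on the log scale so that Y_i^p cannot overflow or underflow."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    log_k = -0.5 * ((x0 - X) / h) ** 2            # log of a Gaussian kernel, up to a constant
    a = log_k + p * np.log(Y)                     # log of K_h(x0 - X_i) * Y_i^p
    log_num = a.max() + np.log(np.exp(a - a.max()).sum())
    log_den = log_k.max() + np.log(np.exp(log_k - log_k.max()).sum())
    return float(np.exp((np.log(p + 1.0) + log_num - log_den) / p))


rng = np.random.default_rng(0)
X = rng.uniform(size=5000)
Y = (1.0 + X) * (1.0 - rng.uniform(size=5000))    # uniform below the frontier g(x) = 1 + x
print(frontier_estimate(0.5, X, Y))                # close to g(0.5) = 1.5
```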
-
Extreme value and Haar series estimates of point process boundaries
Authors:
Stéphane Girard,
Pierre Jacob
Abstract:
We present a new method for estimating the edge of a two-dimensional bounded set, given a finite random set of points drawn from the interior. The estimator is based both on Haar series and extreme values of the point process. We give conditions for various kinds of convergence and we obtain remarkably different possible limit distributions. We propose a method of reducing the negative bias, illustrated by a simulation.
Submitted 30 March, 2011;
originally announced March 2011.
-
Projection estimates of point processes boundaries
Authors:
Stéphane Girard,
Pierre Jacob
Abstract:
We present a method for estimating the edge of a two-dimensional bounded set, given a finite random set of points drawn from the interior. The estimator is based both on projections on $C^1$ bases and on extreme points of the point process. We give conditions on the Dirichlet kernel associated with the $C^1$ bases for various kinds of convergence and asymptotic normality. We propose a method for reducing the negative bias and illustrate it by a simulation.
Submitted 30 March, 2011;
originally announced March 2011.
-
Extreme values and kernel estimates of point processes boundaries
Authors:
Stéphane Girard,
Pierre Jacob
Abstract:
We present a method for estimating the edge of a two-dimensional bounded set, given a finite random set of points drawn from the interior. The estimator is based both on a Parzen-Rosenblatt kernel and extreme values of point processes. We give conditions for various kinds of convergence and asymptotic normality. We propose a method of reducing the negative bias and edge effects, illustrated by a simulation.
Submitted 30 March, 2011;
originally announced March 2011.
-
SMC^2: an efficient algorithm for sequential analysis of state-space models
Authors:
Nicolas Chopin,
Pierre E. Jacob,
Omiros Papaspiliopoulos
Abstract:
We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process y, state process x and fixed parameter theta. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the theta-dimension that samples values of theta, iteratively reweights these values using the likelihood increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles through a resampling step and an MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, for any fixed theta. This motivates the SMC^2 algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the theta-particles target the correct posterior distribution at each iteration t, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, such as dynamically increasing the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.
Submitted 27 January, 2012; v1 submitted 7 January, 2011;
originally announced January 2011.
-
Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"
Authors:
Simon Barthelme,
Magali Beffy,
Nicolas Chopin,
Arnaud Doucet,
Pierre Jacob,
Adam M. Johansen,
Jean-Michel Marin,
Christian P. Robert
Abstract:
This is a collection of discussions of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the Journal of the Royal Statistical Society, Series B.
Submitted 3 November, 2010;
originally announced November 2010.
-
Using parallel computation to improve Independent Metropolis--Hastings based estimation
Authors:
Pierre Jacob,
Christian P. Robert,
Murray H. Smith
Abstract:
In this paper, we consider the implications of the fact that raw parallel computing power can be exploited by a generic Metropolis--Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis--Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, at no extra computing cost, since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao--Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996), Atchade and Perron (2005) and Douc and Robert (2010). We illustrate those improvements on a toy normal example and on a classical probit regression model, but stress that they apply in any case where the independent Metropolis--Hastings algorithm is applicable.
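The property the abstract relies on is that independent proposals can be drawn and evaluated before any accept/reject decision is made. A plain IMH sampler is enough to illustrate it. This is a sketch under stated assumptions:

* The function name imh_from_batch is hypothetical.
* The target N(0, 1) and the proposal N(0, 2^2) are toy choices.
* The batch of target evaluations is written as a list comprehension, which could be replaced by a parallel map.

The sketch does not implement the paper's Rao--Blackwellized estimators.

```python
import math
import random


def imh_from_batch(log_target, log_proposal, proposals, rng):
    """Independent Metropolis-Hastings on a pre-drawn batch of proposals.

    Proposals do not depend on the current state, so every log-weight
    log w(y) = log pi(y) - log q(y) can be computed before the chain is run.
    """
    # Expensive, embarrassingly parallel step: one target evaluation per
    # proposal (a parallel map could be used here instead).
    log_w = [log_target(y) - log_proposal(y) for y in proposals]
    chain = [proposals[0]]  # start the chain at the first proposal
    current = 0             # index of the current state in `proposals`
    for j in range(1, len(proposals)):
        # Accept proposal j with probability min(1, w(y_j) / w(x_current))
        if rng.random() < math.exp(min(0.0, log_w[j] - log_w[current])):
            current = j
        chain.append(proposals[current])
    return chain


SD_Q = 2.0


def log_target(x):
    return -0.5 * x * x  # N(0, 1), up to an additive constant


def log_proposal(x):
    return -0.5 * (x / SD_Q) ** 2  # N(0, SD_Q^2), up to an additive constant


rng = random.Random(0)
proposals = [rng.gauss(0.0, SD_Q) for _ in range(20000)]
chain = imh_from_batch(log_target, log_proposal, proposals, rng)
mean = sum(chain) / len(chain)
second_moment = sum(x * x for x in chain) / len(chain)
```

With these toy choices the weight function is bounded, and mean and second_moment should come out close to 0 and 1, the corresponding moments of the target. The sequential pass only compares precomputed numbers, so its cost is negligible next to the target evaluations.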
Submitted 24 March, 2011; v1 submitted 8 October, 2010;
originally announced October 2010.
-
Free energy Sequential Monte Carlo, application to mixture modelling
Authors:
Nicolas Chopin,
Pierre Jacob
Abstract:
We introduce a new class of Sequential Monte Carlo (SMC) methods, which we call free energy SMC. This class is inspired by free energy methods, which originate from physics and in which one samples from a biased distribution such that a given function $ξ(θ)$ of the state $θ$ is forced to be uniformly distributed over a given interval. From an initial sequence of distributions $(π_t)$ of interest and a particular choice of $ξ(θ)$, a free energy SMC sampler sequentially computes a sequence of biased distributions $(\tilde{π}_t)$ with the following properties: (a) the marginal distribution of $ξ(θ)$ with respect to $\tilde{π}_t$ is approximately uniform over a specified interval, and (b) $\tilde{π}_t$ and $π_t$ have the same conditional distribution given $ξ(θ)$. We apply our methodology to mixture posterior distributions, which are highly multimodal. In the mixture context, forcing certain hyper-parameters to higher values greatly facilitates mode swapping and makes it possible to recover a symmetric output. We illustrate our approach with univariate and bivariate Gaussian mixtures and two real-world datasets.
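Properties (a) and (b) follow from a short calculation on an idealized biased target. The sketch makes an assumption that is not stated in the abstract. It takes the marginal density $p_ξ$ of $ξ(θ)$ under $π_t$ to be known and positive on the interval $I$. Then $-\log p_ξ$ is the free energy up to an additive constant. In practice $p_ξ$ has to be estimated, which is consistent with (a) holding only approximately.

```latex
\begin{aligned}
\tilde{\pi}_t(\theta) &\propto \frac{\pi_t(\theta)\,\mathbb{1}\{\xi(\theta)\in I\}}{p_\xi(\xi(\theta))}, \\
\tilde{p}_\xi(s) &\propto \frac{\mathbb{1}\{s\in I\}}{p_\xi(s)}\,p_\xi(s) = \mathbb{1}\{s\in I\}
  &&\text{(a) uniform on } I, \\
\tilde{\pi}_t(\theta \mid \xi(\theta)=s) &= \pi_t(\theta \mid \xi(\theta)=s), \quad s\in I
  &&\text{(b) same conditional.}
\end{aligned}
```

The factor $1/p_ξ(ξ(θ))$ depends on $θ$ only through $ξ(θ)$. It therefore cancels when conditioning on $ξ(θ)=s$, which gives (b).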
Submitted 15 June, 2010;
originally announced June 2010.
-
Comments on "Particle Markov chain Monte Carlo" by C. Andrieu, A. Doucet, and R. Hollenstein
Authors:
Pierre Jacob,
Nicolas Chopin,
Christian P. Robert,
Havard Rue
Abstract:
This is the compilation of our comments submitted to the Journal of the Royal Statistical Society, Series B, to be published within the discussion of the Read Paper of Andrieu, Doucet and Hollenstein.
Submitted 5 November, 2009;
originally announced November 2009.