-
Permutations accelerate Approximate Bayesian Computation
Authors:
Antoine Luciano,
Charly Andral,
Christian P. Robert,
Robin J. Ryder
Abstract:
Approximate Bayesian Computation (ABC) methods have become essential tools for performing inference when likelihood functions are intractable or computationally prohibitive. However, their scalability remains a major challenge in hierarchical or high-dimensional models. In this paper, we introduce permABC, a new ABC framework designed for settings with both global and local parameters, where observations are grouped into exchangeable compartments.
Building upon the Sequential Monte Carlo ABC (ABC-SMC) framework, permABC exploits the exchangeability of compartments through permutation-based matching, significantly improving computational efficiency.
We then develop two further, complementary sequential strategies: Over Sampling, which facilitates early-stage acceptance by temporarily increasing the number of simulated compartments, and Under Matching, which relaxes the acceptance condition by matching only subsets of the data.
These techniques allow for robust and scalable inference even in high-dimensional regimes. Through synthetic and real-world experiments -- including a hierarchical Susceptible-Infectious-Recovered model of the early COVID-19 epidemic across 94 French departments -- we demonstrate the practical gains in accuracy and efficiency achieved by our approach.
Submitted 8 July, 2025;
originally announced July 2025.
-
The Causal-Noncausal Tail Processes: An Introduction
Authors:
Christian Gouriéroux,
Yang Lu,
Christian-Yann Robert
Abstract:
This paper considers one-dimensional mixed causal/noncausal autoregressive (MAR) processes with heavy tail, usually introduced to model trajectories with patterns including asymmetric peaks and troughs, speculative bubbles, flash crashes, or jumps. We especially focus on the extremal behaviour of these processes when at a given date the process is above a large threshold and emphasize the roles of pure causal and noncausal components of the tail process. We provide the dynamics of the tail process and explain how it can be updated during the life of a speculative bubble. In particular we discuss the prediction of the turning point(s) and introduce pure residual plots as a diagnostic for the bubble episodes.
Submitted 4 June, 2025;
originally announced June 2025.
-
Peer-to-Peer Basis Risk Management for Renewable Production Parametric Insurance
Authors:
Fallou Niakh,
Alicia Bassière,
Michel Denuit,
Christian Robert
Abstract:
The financial viability of renewable energy projects is challenged by the variability and unpredictability of production due to weather fluctuations. This paper proposes a novel risk management framework combining parametric insurance and peer-to-peer (P2P) risk sharing to address production uncertainty in solar electricity generation. We first design a weather-based parametric insurance scheme to protect against forecast errors, recalibrated at the site level to mitigate geographical basis risk. To handle residual mismatches between insurance payouts and actual losses, we introduce a complementary P2P mechanism that redistributes the remaining basis risk among participants. The method leverages physically based simulation models to reconstruct day-ahead forecasts and realized productions, integrating climate data and solar farm characteristics. A second-order theoretical approximation links heterogeneous local models to a shared weather index, making risk sharing operationally feasible. In an empirical application to 50 German solar farms, our approach reduces the volatility of production losses by 55%, demonstrating its potential to stabilize revenues and strengthen the resilience of renewable investments.
Submitted 27 April, 2025; v1 submitted 13 April, 2025;
originally announced April 2025.
-
Generalized Bayesian deep reinforcement learning
Authors:
Shreya Sinha Roy,
Richard G. Everitt,
Christian P. Robert,
Ritabrata Dutta
Abstract:
Bayesian reinforcement learning (BRL) is a method that merges principles from Bayesian statistics and reinforcement learning to make optimal decisions in uncertain environments. As a model-based RL method, it has two key components: (1) inferring the posterior distribution of the model for the data-generating process (DGP) and (2) policy learning using the learned posterior. We propose to model the dynamics of the unknown environment through deep generative models, assuming Markov dependence. In the absence of likelihood functions for these models, we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior. We used sequential Monte Carlo (SMC) samplers to draw samples from this generalized Bayesian posterior distribution. In conjunction, to achieve scalability in the high-dimensional parameter space of the neural networks, we use the gradient-based Markov kernels within SMC. To justify the use of the prequential scoring rule posterior, we prove a Bernstein-von Mises-type theorem. For policy learning, we propose expected Thompson sampling (ETS) to learn the optimal policy by maximising the expected value function with respect to the posterior distribution. This improves upon traditional Thompson sampling (TS) and its extensions, which utilize only one sample drawn from the posterior distribution. This improvement is studied both theoretically and using simulation studies, assuming a discrete action space. Finally, we successfully extended our setup for a challenging problem with a continuous action space without theoretical guarantees.
Submitted 2 June, 2025; v1 submitted 16 December, 2024;
originally announced December 2024.
-
Forecasting with Markovian max-stable fields in space and time: An application to wind gust speeds
Authors:
Ryan Cotsakis,
Erwan Koch,
Christian-Yann Robert
Abstract:
Hourly maxima of 3-second wind gust speeds are prominent indicators of the severity of wind storms, and accurately forecasting them is thus essential for populations, civil authorities and insurance companies. Space-time max-stable models appear as natural candidates for this, but those explored so far are not suited for forecasting and, more generally, the forecasting literature for max-stable fields is limited. To fill this gap, we consider a specific space-time max-stable model, more precisely a max-autoregressive model with advection, that is well-adapted to model and forecast atmospheric variables. We apply it, as well as our related forecasting strategy, to reanalysis 3-second wind gust data for France in 1999, and show good performance compared to a competitor model. On top of demonstrating the practical relevance of our model, we meticulously study its theoretical properties and show the consistency and asymptotic normality of the space-time pairwise likelihood estimator which is used to calibrate the model.
Submitted 23 November, 2024;
originally announced November 2024.
-
On integral priors for multiple comparison in Bayesian model selection
Authors:
Diego Salmerón,
Juan Antonio Cano,
Christian P. Robert
Abstract:
Noninformative priors constructed for estimation purposes are usually not appropriate for model selection and testing. The methodology of integral priors was developed to get prior distributions for Bayesian model selection when comparing two models, modifying initial improper reference priors. We propose a generalization of this methodology to more than two models. Our approach adds an artificial copy of each model under comparison by compactifying the parametric space and creating an ergodic Markov chain across all models that returns the integral priors as marginals of the stationary distribution. Besides the guarantee of their existence and the lack of paradoxes attached to estimation reference priors, an additional advantage of this methodology is that the simulation of this Markov chain is straightforward as it only requires simulations of imaginary training samples for all models and from the corresponding posterior distributions. This renders its implementation automatic and generic, both in the nested and in the non-nested cases. We present some examples, including situations where other methodologies need specific adjustments or do not produce a satisfactory answer.
Submitted 16 June, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
A discussion of the paper "Safe testing" by Grünwald, de Heide, and Koolen
Authors:
Joshua Bon,
Christian P Robert
Abstract:
This is a discussion of the paper "Safe testing" by Grünwald, de Heide, and Koolen, read before The Royal Statistical Society at a meeting organized by the Research Section on Wednesday, 24 January, 2024.
Submitted 9 February, 2024;
originally announced February 2024.
-
Simulating signed mixtures
Authors:
Julien Stoehr,
Christian P. Robert
Abstract:
Simulating mixtures of distributions with signed weights proves a challenge as standard simulation algorithms are inefficient in handling the negative weights. In particular, the natural representation of mixture variates as associated with latent component indicators is no longer available. We propose here an exact accept-reject algorithm in the general case of finite signed mixtures that relies on optimally pairing positive and negative components and designing a stratified sampling scheme on pairs. We analyze the performance of our approach, relative to the inverse cdf approach, since the cdf of the distribution remains available for standard signed mixtures.
Submitted 26 November, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Asymptotics of approximate Bayesian computation when summary statistics converge at heterogeneous rates
Authors:
Caroline Lawless,
Christian P. Robert,
Judith Rousseau,
Robin J. Ryder
Abstract:
We consider the asymptotic properties of Approximate Bayesian Computation (ABC) for the realistic case of summary statistics with heterogeneous rates of convergence. We allow some statistics to converge faster than the ABC tolerance, other statistics to converge slower, and cover the case where some statistics do not converge at all. We give conditions for the ABC posterior to converge, and provide an explicit representation of the shape of the ABC posterior distribution in our general setting; in particular, we show how the shape of the posterior depends on the number of slow statistics. We then quantify the gain brought by the local linear post-processing step.
Submitted 16 November, 2023;
originally announced November 2023.
-
Insufficient Gibbs Sampling
Authors:
Antoine Luciano,
Christian P. Robert,
Robin J. Ryder
Abstract:
In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.
Submitted 22 February, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Sampling using Adaptive Regenerative Processes
Authors:
Hector McKimm,
Andi Q Wang,
Murray Pollock,
Christian P Robert,
Gareth O Roberts
Abstract:
Enriching Brownian motion with regenerations from a fixed regeneration distribution $μ$ at a particular regeneration rate $κ$ results in a Markov process that has a target distribution $π$ as its invariant distribution. For the purpose of Monte Carlo inference, implementing such a scheme requires firstly selection of regeneration distribution $μ$, and secondly computation of a specific constant $C$. Both of these tasks can be very difficult in practice for good performance. We introduce a method for adapting the regeneration distribution, by adding point masses to it. This allows the process to be simulated with as few regenerations as possible and obviates the need to find said constant $C$. Moreover, the choice of fixed $μ$ is replaced with the choice of the initial regeneration distribution, which is considerably less difficult. We establish convergence of this resulting self-reinforcing process and explore its effectiveness at sampling from a number of target distributions. The examples show that adapting the regeneration distribution guards against poor choices of fixed regeneration distribution and can reduce the error of Monte Carlo estimates of expectations of interest, especially when $π$ is skewed.
Submitted 20 February, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Computing Bayes: From Then 'Til Now'
Authors:
Gael M. Martin,
David T. Frazier,
Christian P. Robert
Abstract:
This paper takes the reader on a journey through the history of Bayesian computation, from the 18th century to the present day. Beginning with the one-dimensional integral first confronted by Bayes in 1763, we highlight the key contributions of: Laplace, Metropolis (and, importantly, his co-authors!), Hammersley and Handscomb, and Hastings, all of which set the foundations for the computational revolution in the late 20th century -- led, primarily, by Markov chain Monte Carlo (MCMC) algorithms. A very short outline of 21st century computational methods -- including pseudo-marginal MCMC, Hamiltonian Monte Carlo, sequential Monte Carlo, and the various `approximate' methods -- completes the paper.
Submitted 1 August, 2022;
originally announced August 2022.
-
The Importance Markov Chain
Authors:
Charly Andral,
Randal Douc,
Hugo Marival,
Christian P. Robert
Abstract:
The Importance Markov chain is a novel algorithm bridging the gap between rejection sampling and importance sampling, moving from one to the other through a tuning parameter. Based on a modified sample of an instrumental Markov chain targeting an instrumental distribution (typically via an MCMC kernel), the Importance Markov chain produces an extended Markov chain where the marginal distribution of the first component converges to the target distribution. For example, when targeting a multimodal distribution, the instrumental distribution can be chosen as a tempered version of the target which allows the algorithm to explore its modes more efficiently. We obtain a Law of Large Numbers and a Central Limit Theorem as well as geometric ergodicity for this extended kernel under mild assumptions on the instrumental kernel. Computationally, the algorithm is easy to implement and preexisting libraries can be used to sample from the instrumental distribution.
Submitted 26 February, 2024; v1 submitted 17 July, 2022;
originally announced July 2022.
-
50 shades of Bayesian testing of hypotheses
Authors:
Christian P Robert
Abstract:
Hypothesis testing and model choice are quintessential questions for statistical inference and while the Bayesian paradigm seems ideally suited for answering these questions, it faces difficulties of its own ranging from prior modelling to calibration, to numerical implementation. This c
Submitted 14 June, 2022;
originally announced June 2022.
-
Evidence estimation in finite and infinite mixture models and applications
Authors:
Adrien Hairault,
Christian P. Robert,
Judith Rousseau
Abstract:
Estimating the model evidence - or marginal likelihood of the data - is a notoriously difficult task for finite and infinite mixture models and we reexamine here different Monte Carlo techniques advocated in the recent literature, as well as novel approaches based on Geyer (1994) reverse logistic regression technique, Chib (1995) algorithm, and Sequential Monte Carlo (SMC). Applications are numerous. In particular, testing for the number of components in a finite mixture model or against the fit of a finite mixture model for a given dataset has long been and still is an issue of much interest, albeit yet missing a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric 'strongly identifiable' Dirichlet Process Mixture (DPM) model.
Submitted 11 May, 2022;
originally announced May 2022.
-
Approximating Bayes in the 21st Century
Authors:
Gael M. Martin,
David T. Frazier,
Christian P. Robert
Abstract:
The 21st century has seen an enormous growth in the development and use of approximate Bayesian methods. Such methods produce computational solutions to certain intractable statistical problems that challenge exact methods like Markov chain Monte Carlo: for instance, models with unavailable likelihoods, high-dimensional models, and models featuring large data sets. These approximate methods are the subject of this review. The aim is to help new researchers in particular -- and more generally those interested in adopting a Bayesian approach to empirical work -- distinguish between different approximate techniques; understand the sense in which they are approximate; appreciate when and why particular methods are useful; and see the ways in which they can be combined.
Submitted 20 December, 2021;
originally announced December 2021.
-
Living on the Edge: An Unified Approach to Antithetic Sampling
Authors:
Roberto Casarin,
Radu V. Craiu,
Lorenzo Frattarolo,
Christian P. Robert
Abstract:
We identify recurrent ingredients in the antithetic sampling literature leading to a unified sampling framework. We introduce a new class of antithetic schemes that includes the most used antithetic proposals. This perspective enables the derivation of new properties of the sampling schemes: i) optimality in the Kullback-Leibler sense; ii) closed-form multivariate Kendall's $τ$ and Spearman's $ρ$; iii) ranking in concordance order; and iv) a central limit theorem that characterizes the stochastic behavior of Monte Carlo estimators when the sample size tends to infinity. Finally, we provide applications to Monte Carlo integration and Markov Chain Monte Carlo Bayesian estimation.
Submitted 6 December, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
NEO: Non Equilibrium Sampling on the Orbit of a Deterministic Transform
Authors:
Achille Thin,
Yazid Janati,
Sylvain Le Corff,
Charles Ollion,
Arnaud Doucet,
Alain Durmus,
Eric Moulines,
Christian Robert
Abstract:
Sampling from a complex distribution $π$ and approximating its intractable normalizing constant Z are challenging problems. In this paper, a novel family of importance samplers (IS) and Markov chain Monte Carlo (MCMC) samplers is derived. Given an invertible map T, these schemes combine (with weights) elements from the forward and backward orbits through points sampled from a proposal distribution $ρ$. The map T does not leave the target $π$ invariant, hence the name NEO, standing for Non-Equilibrium Orbits. NEO-IS provides unbiased estimators of the normalizing constant and self-normalized IS estimators of expectations under $π$ while NEO-MCMC combines multiple NEO-IS estimates of the normalizing constant and an iterated sampling-importance resampling mechanism to sample from $π$. For T chosen as a discrete-time integrator of a conformal Hamiltonian system, NEO-IS achieves state-of-the-art performance on difficult benchmarks and NEO-MCMC is able to explore highly multimodal targets. Additionally, we provide detailed theoretical results for both methods. In particular, we show that NEO-MCMC is uniformly geometrically ergodic and establish explicit mixing time estimates under mild conditions.
Submitted 23 August, 2021; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Rao-Blackwellization in the MCMC era
Authors:
Christian P. Robert,
Gareth O. Roberts
Abstract:
Rao-Blackwellization is a notion often occurring in the MCMC literature, with possibly different meanings and connections with the original Rao-Blackwell theorem (Rao, 1945 and Blackwell, 1947), including a reduction of the variance of the resulting Monte Carlo approximations. This survey reviews some of the meanings of the term.
Submitted 4 January, 2021;
originally announced January 2021.
-
Computing Bayes: Bayesian Computation from 1763 to the 21st Century
Authors:
Gael M. Martin,
David T. Frazier,
Christian P. Robert
Abstract:
The Bayesian statistical paradigm uses the language of probability to express uncertainty about the phenomena that generate observed data. Probability distributions thus characterize Bayesian analysis, with the rules of probability used to transform prior probability distributions for all unknowns - parameters, latent variables, models - into posterior distributions, subsequent to the observation of data. Conducting Bayesian analysis requires the evaluation of integrals in which these probability distributions appear. Bayesian computation is all about evaluating such integrals in the typical case where no analytical solution exists. This paper takes the reader on a chronological tour of Bayesian computation over the past two and a half centuries. Beginning with the one-dimensional integral first confronted by Bayes in 1763, through to recent problems in which the unknowns number in the millions, we place all computational problems into a common framework, and describe all computational methods using a common notation. The aim is to help new researchers in particular - and more generally those interested in adopting a Bayesian approach to empirical work - make sense of the plethora of computational techniques that are now on offer; understand when and why different methods are useful; and see the links that do exist, between them all.
Submitted 5 December, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Generalized Poisson Difference Autoregressive Processes
Authors:
Giulia Carallo,
Roberto Casarin,
Christian P. Robert
Abstract:
This paper introduces a new stochastic process with values in the set Z of integers with sign. The increments of the process are Poisson differences and the dynamics have an autoregressive structure. We study the properties of the process and exploit the thinning representation to derive stationarity conditions and the stationary distribution of the process. We provide a Bayesian inference method and an efficient posterior approximation procedure based on Monte Carlo. Numerical illustrations on both simulated and real data show the effectiveness of the proposed inference.
Submitted 11 February, 2020;
originally announced February 2020.
-
Markov Chain Monte Carlo Methods, a survey with some frequent misunderstandings
Authors:
Christian P. Robert,
Wu Changye
Abstract:
In this chapter, we review some of the most standard MCMC tools used in Bayesian computation, along with vignettes on standard misunderstandings of these approaches taken from Q&A's on the forum Cross Validated answered by the first author.
In this chapter, we review some of the most standard MCMC tools used in Bayesian computation, along with vignettes on standard misunderstandings of these approaches taken from Q \&~A's on the forum Cross-validated answered by the first author.
Submitted 17 January, 2020;
originally announced January 2020.
-
Parallelising MCMC via Random Forests
Authors:
Wu Changye,
Christian P. Robert
Abstract:
For Bayesian computation in big data contexts, the divide-and-conquer MCMC concept splits the whole data set into batches, runs MCMC algorithms separately over each batch to produce samples of parameters, and combines them to produce an approximation of the target distribution. In this article, we embed random forests into this framework and use each subposterior/partial-posterior as a proposal distribution to implement importance sampling. Unlike the existing divide-and-conquer MCMC, our methods are based on scaled subposteriors, whose scale factors are not necessarily restricted to being equal to one or to the number of subsets. Through several experiments, we show that our methods work well with models ranging from Gaussian cases to strongly non-Gaussian cases, and include model misspecification.
Submitted 21 November, 2019;
originally announced November 2019.
-
Component-wise approximate Bayesian computation via Gibbs-like steps
Authors:
Grégoire Clarté,
Christian P. Robert,
Robin Ryder,
Julien Stoehr
Abstract:
Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
Submitted 17 September, 2020; v1 submitted 31 May, 2019;
originally announced May 2019.
-
Many perspectives on Deborah Mayo's "Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars"
Authors:
Andrew Gelman,
Brian Haig,
Christian Hennig,
Art Owen,
Robert Cousins,
Stan Young,
Christian Robert,
Corey Yanofsky,
E. J. Wagenmakers,
Ron Kenett,
Daniel Lakeland
Abstract:
The new book by philosopher Deborah Mayo is relevant to data science for topical reasons, as she takes various controversial positions regarding hypothesis testing and statistical practice, and also as an entry point to thinking about the philosophy of statistics. The present article is a slightly expanded version of a series of informal reviews and comments on Mayo's book. We hope this discussion will introduce people to Mayo's ideas along with other perspectives on the topics she addresses.
Submitted 29 May, 2019; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Approximate Bayesian computation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
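As a rough illustration of the idea in this abstract, the sketch below runs plain accept/reject ABC in the one-dimensional case with equal sample sizes, where the 1-Wasserstein distance between two empirical distributions reduces to the mean absolute difference of their order statistics. The Gaussian toy model, the prior, the threshold `eps` and the function names are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1D samples.

    For equal sample sizes this is the mean absolute difference
    between the sorted samples (the order statistics).
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

def abc_rejection(y_obs, n_draws, eps, rng):
    """Accept/reject ABC on a hypothetical toy model (assumption):
    theta ~ N(0, 3^2) a priori and data ~ N(theta, 1)."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.normal(0.0, 3.0)                 # draw from the prior
        z = rng.normal(theta, 1.0, size=y_obs.size)  # synthetic data set
        if wasserstein_1d(y_obs, z) < eps:           # compare full samples
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=100)  # pretend these are the observations
post = abc_rejection(y_obs, n_draws=5000, eps=0.4, rng=rng)
```

No summary statistic is chosen here: the whole synthetic sample is compared with the observed one. Lowering `eps` tightens the approximation at the cost of fewer accepted draws.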
Submitted 9 May, 2019;
originally announced May 2019.
-
Model Selection for Mixture Models - Perspectives and Strategies
Authors:
Gilles Celeux,
Sylvia Fruewirth-Schnatter,
Christian P. Robert
Abstract:
Determining the number G of components in a finite mixture distribution is an important and difficult inference issue. This is a most important question, because statistical inference about the resulting model is highly sensitive to the value of G. Selecting an erroneous value of G may produce a poor density estimate. This is also a most difficult question from a theoretical perspective as it relates to unidentifiability issues of the mixture model. This is further a most relevant question from a practical viewpoint since the meaning of the number of components G is strongly related to the modelling purpose of a mixture distribution. We distinguish in this chapter between selecting G as a density estimation problem in Section 2 and selecting G in a model-based clustering framework in Section 3. Both sections discuss frequentist as well as Bayesian approaches. We present here some of the Bayesian solutions to the different interpretations of picking the "right" number of components in a mixture, before concluding on the ill-posed nature of the question.
Submitted 24 December, 2018;
originally announced December 2018.
-
Computational Solutions for Bayesian Inference in Mixture Models
Authors:
Gilles Celeux,
Kaniav Kamary,
Gertraud Malsiner-Walli,
Jean-Michel Marin,
Christian P. Robert
Abstract:
This chapter surveys the most standard Monte Carlo methods available for simulating from a posterior distribution associated with a mixture and conducts some experiments about the robustness of the Gibbs sampler in high dimensional Gaussian settings. This is a chapter prepared for the forthcoming 'Handbook of Mixture Analysis'.
Submitted 18 December, 2018;
originally announced December 2018.
-
Stochastic derivative estimation for max-stable random fields
Authors:
Erwan Koch,
Christian Y. Robert
Abstract:
We consider expected performances based on max-stable random fields and we are interested in their derivatives with respect to the spatial dependence parameters of those fields. Max-stable fields, such as the Brown--Resnick and Smith fields, are very popular in spatial extremes. We focus on the two most popular unbiased stochastic derivative estimation approaches: the likelihood ratio method (LRM) and the infinitesimal perturbation analysis (IPA). LRM requires the multivariate density of the max-stable field to be explicit, and IPA necessitates the computation of the derivative with respect to the parameters for each simulated value. We propose convenient and tractable conditions ensuring the validity of LRM and IPA in the cases of the Brown--Resnick and Smith field, respectively. Obtaining such conditions is intricate owing to the very structure of max-stable fields. Then we focus on risk and dependence measures, which constitute one of the several frameworks where our theoretical results can be useful. We perform a simulation study which shows that both LRM and IPA perform well in various configurations, and provide a real case study that is valuable for the insurance industry.
Submitted 3 November, 2020; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Faster Hamiltonian Monte Carlo by Learning Leapfrog Scale
Authors:
Changye Wu,
Julien Stoehr,
Christian P. Robert
Abstract:
Hamiltonian Monte Carlo samplers have become standard algorithms for MCMC implementations, as opposed to more basic versions, but they still require some amount of tuning and calibration. Exploiting the U-turn criterion of the NUTS algorithm (Hoffman and Gelman, 2014), we propose a version of HMC that relies on the distribution of the integration time of the associated leapfrog integrator. Using in addition the primal-dual averaging method for tuning the step size of the integrator, we achieve an essentially calibration free version of HMC. When compared with the original NUTS on several benchmarks, this algorithm exhibits a significantly improved efficiency.
Submitted 27 February, 2019; v1 submitted 10 October, 2018;
originally announced October 2018.
-
Rethinking the Effective Sample Size
Authors:
Víctor Elvira,
Luca Martino,
Christian P. Robert
Abstract:
The effective sample size (ESS) is widely used in sample-based simulation methods for assessing the quality of a Monte Carlo approximation of a given distribution and of related integrals. In this paper, we revisit the approximation of the ESS in the specific context of importance sampling (IS). The derivation of this approximation, which we denote by $\widehat{\text{ESS}}$, is partially available in Kong (1992). This approximation has been widely used in the last 25 years due to its simplicity as a practical rule of thumb in a wide variety of importance sampling methods. However, we show that the multiple assumptions and approximations in the derivation of $\widehat{\text{ESS}}$ make it difficult to consider it even a reasonable approximation of the ESS. We extend the discussion of $\widehat{\text{ESS}}$ to the multiple importance sampling (MIS) setting, display numerical examples, and discuss several avenues for developing alternative metrics. This paper does not cover the use of ESS for MCMC algorithms.
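For readers unfamiliar with the rule of thumb discussed in this abstract, the following minimal sketch computes its usual form, one over the sum of squared normalized importance weights. The function name `ess_hat` and the log-weight interface are assumptions made for this example, not notation from the paper.

```python
import numpy as np

def ess_hat(log_weights):
    """Rule-of-thumb ESS: 1 / sum_n (normalized weight_n)^2.

    Takes unnormalized log-weights; the maximum is subtracted before
    exponentiating for numerical stability (this does not change the
    normalized weights).
    """
    log_w = np.asarray(log_weights, dtype=float)
    w = np.exp(log_w - log_w.max())
    w_bar = w / w.sum()          # normalized weights, summing to one
    return 1.0 / np.sum(w_bar ** 2)
```

With equal weights the value equals the number of samples, and it drops to one when a single weight dominates. The paper's point is that this quantity depends on the weights only, so it should be read with caution.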
Submitted 31 March, 2022; v1 submitted 11 September, 2018;
originally announced September 2018.
-
The Coordinate Sampler: A Non-Reversible Gibbs-like MCMC Sampler
Authors:
Changye Wu,
Christian P. Robert
Abstract:
In this article, we derive a novel non-reversible, continuous-time Markov chain Monte Carlo (MCMC) sampler, called Coordinate Sampler, based on a piecewise deterministic Markov process (PDMP), which can be seen as a variant of the Zigzag sampler. In addition to providing a theoretical validation for this new sampling algorithm, we show that the Markov chain it induces is geometrically ergodic for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. Several numerical examples highlight that our coordinate sampler is more efficient than the Zigzag sampler, in terms of effective sample size.
Submitted 11 April, 2019; v1 submitted 10 September, 2018;
originally announced September 2018.
-
Accelerating MCMC Algorithms
Authors:
Christian P. Robert,
Victor Elvira,
Nick Tawn,
Changye Wu
Abstract:
Markov chain Monte Carlo algorithms are used to simulate from complex statistical distributions by way of a local exploration of these distributions. This local feature avoids heavy requests on understanding the nature of the target, but it also potentially induces a lengthy exploration of this target, with a requirement on the number of simulations that grows with the dimension of the problem and with the complexity of the data behind it. Several techniques are available towards accelerating the convergence of these Monte Carlo algorithms, either at the exploration level (as in tempering, Hamiltonian Monte Carlo and partly deterministic methods) or at the exploitation level (with Rao-Blackwellisation and scalable methods).
Submitted 11 April, 2018; v1 submitted 8 April, 2018;
originally announced April 2018.
-
Estimating causal effects of time-dependent exposures on a binary endpoint in a high-dimensional setting
Authors:
Vahé Asvatourian,
Clélia Coutzac,
Nathalie Chaput,
Caroline Robert,
Stefan Michiels,
Emilie Lanoy
Abstract:
Recently, the intervention calculus when the DAG is absent (IDA) method was developed to estimate lower bounds of causal effects from observational high-dimensional data. Originally it was introduced to assess the effect of baseline biomarkers which do not vary over time. However, in many clinical settings, measurements of biomarkers are repeated at fixed time points during treatment exposure and, therefore, this method needs to be extended. The purpose of this paper is then to extend the first step of the IDA, the Peter-Clark (PC) algorithm, to a time-dependent exposure in the context of a binary outcome. We generalised the PC-algorithm to take into account the chronological order of repeated measurements of the exposure and propose to apply the IDA with our new version, the chronologically ordered PC-algorithm (COPC-algorithm). A simulation study was performed before applying the method to estimate causal effects of time-dependent immunological biomarkers on toxicity, death and progression in patients with metastatic melanoma. The simulation study showed that the completed partially directed acyclic graphs (CPDAGs) obtained using the COPC-algorithm were structurally closer to the true CPDAG than CPDAGs obtained using the PC-algorithm. Also, causal effects were more accurate when they were estimated based on CPDAGs obtained using the COPC-algorithm. Moreover, CPDAGs obtained by the COPC-algorithm allowed removing non-chronological arrows, in which a variable measured at a time t points to a variable measured at a time t' where t' < t. Bidirected edges were less present in CPDAGs obtained with the COPC-algorithm, supporting the fact that there was less variability in causal effects estimated from these CPDAGs. The COPC-algorithm provided CPDAGs that keep the chronological structure present in the data, thus allowing estimation of lower bounds of the causal effect of time-dependent biomarkers.
Submitted 29 March, 2018; v1 submitted 28 March, 2018;
originally announced March 2018.
-
Approximating the Likelihood in Approximate Bayesian Computation
Authors:
Christopher C Drovandi,
Clara Grazian,
Kerrie Mengersen,
Christian Robert
Abstract:
This chapter will appear in the forthcoming Handbook of Approximate Bayesian Computation (2018).
The conceptual and methodological framework that underpins approximate Bayesian computation (ABC) is targeted primarily towards problems in which the likelihood is either challenging or missing. ABC uses a simulation-based non-parametric estimate of the likelihood of a summary statistic and assumes that the generation of data from the model is computationally cheap. This chapter reviews two alternative approaches for estimating the intractable likelihood, with the goal of reducing the necessary model simulations to produce an approximate posterior. The first of these is a Bayesian version of the synthetic likelihood (SL), initially developed by Wood (2010), which uses a multivariate normal approximation to the summary statistic likelihood. Using the parametric approximation as opposed to the non-parametric approximation of ABC, it is possible to reduce the number of model simulations required. The second likelihood approximation method we consider in this chapter is based on the empirical likelihood (EL), which is a non-parametric technique and involves maximising a likelihood constructed empirically under a set of moment constraints. Mengersen et al. (2013) adapt the EL framework so that it can be used to form an approximate posterior for problems where ABC can be applied, that is, for models with intractable likelihoods. However, unlike ABC and the Bayesian SL (BSL), the Bayesian EL (BCel) approach can be used to completely avoid model simulations in some cases. The BSL and BCel methods are illustrated on models of varying complexity.
Submitted 18 March, 2018;
originally announced March 2018.
-
Abandon Statistical Significance
Authors:
Blakeley B. McShane,
David Gal,
Andrew Gelman,
Christian Robert,
Jennifer L. Tackett
Abstract:
We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm--and the p-value thresholds intrinsic to it--as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to "ban" p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.
Submitted 8 September, 2018; v1 submitted 21 September, 2017;
originally announced September 2017.
-
Tail approximations for sums of dependent regularly varying random variables under Archimedean copula models
Authors:
Hélène Cossette,
Etienne Marceau,
Quang Huy Nguyen,
Christian Robert
Abstract:
In this paper, we compare two numerical methods for approximating the probability that the sum of dependent regularly varying random variables exceeds a high threshold under Archimedean copula models. The first method is based on conditional Monte Carlo. We present four estimators and show that most of them have bounded relative errors. The second method is based on analytical expressions of the multivariate survival or cumulative distribution functions of the regularly varying random variables and provides sharp and deterministic bounds of the probability of exceedance. We discuss implementation issues and illustrate the accuracy of both procedures through numerical studies.
Submitted 29 August, 2017;
originally announced August 2017.
-
Better together? Statistical learning in models made of modules
Authors:
Pierre E. Jacob,
Lawrence M. Murray,
Chris C. Holmes,
Christian P. Robert
Abstract:
In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference, prediction, or decision problem. In such circumstances, it is convenient to use a graphical model to represent the statistical dependencies, via a set of connected "modules", each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the estimate and update of others, often in unpredictable ways. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of approaches that restrict the information propagation between modules, for example by restricting propagation to only particular directions along the edges of the graph. In this article, we investigate why these modular approaches might be preferable to the full model in misspecified settings. We propose principled criteria to choose between modular and full-model approaches. The question arises in many applied settings, including large stochastic dynamical systems, meta-analysis, epidemiological models, air pollution models, pharmacokinetics-pharmacodynamics, and causal inference with propensity scores.
Submitted 29 August, 2017;
originally announced August 2017.
-
Model Misspecification in ABC: Consequences and Diagnostics
Authors:
David T. Frazier,
Christian P. Robert,
Judith Rousseau
Abstract:
We analyze the behavior of approximate Bayesian computation (ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that when the model is misspecified different versions of ABC can yield substantially different results. Our theoretical results demonstrate that even though the model is misspecified, under regularity conditions, the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, under model misspecification the ABC posterior does not yield credible sets with valid frequentist coverage and has non-standard asymptotic behavior. In addition, we examine the theoretical behavior of the popular local regression adjustment to ABC under model misspecification and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than accept/reject ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.
Submitted 9 July, 2019; v1 submitted 6 August, 2017;
originally announced August 2017.
-
Generalized Bouncy Particle Sampler
Authors:
Changye Wu,
Christian P. Robert
Abstract:
As a special example of a piecewise deterministic Markov process, the bouncy particle sampler (BPS) is a rejection-free, irreversible Markov chain Monte Carlo algorithm that can draw samples from a target distribution efficiently. We generalize the bouncy particle sampler in terms of its transition dynamics. In BPS, the transition dynamic at event time is deterministic, but in the generalized sampler (GBPS), it is random. With the help of this randomness, GBPS can overcome the reducibility problem in BPS without refreshment.
Submitted 18 June, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.
-
Average of Recentered Parallel MCMC for Big Data
Authors:
Changye Wu,
Christian P. Robert
Abstract:
In the big data context, traditional MCMC methods, such as Metropolis-Hastings algorithms and hybrid Monte Carlo, scale poorly because of their need to evaluate the likelihood over the whole data set at each iteration. In order to resurrect MCMC methods, numerous approaches belonging to two categories, divide-and-conquer and subsampling, have been proposed. In this article, we study parallel MCMC and propose a new combination method in the divide-and-conquer framework. In contrast to parallel MCMC methods such as consensus Monte Carlo and the Weierstrass sampler, which sample from subposteriors, our method runs MCMC on rescaled subposteriors, but shares the same computation cost in the parallel stage. We also give the mathematical justification of our method and show its performance in several models. Moreover, even though our new method is proposed in a parametric framework, it can be applied to non-parametric cases without difficulty.
Submitted 18 June, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.
-
Jeffreys priors for mixture estimation: properties and alternatives
Authors:
Clara Grazian,
Christian P. Robert
Abstract:
While Jeffreys priors usually are well-defined for the parameters of mixtures of distributions, they are not available in closed form. Furthermore, they often are improper priors. Hence, they have never been used to draw inference on the mixture parameters. The implementation and the properties of Jeffreys priors in several mixture settings are studied. It is shown that the associated posterior distributions most often are improper. Nevertheless, the Jeffreys prior for the mixture weights conditionally on the parameters of the mixture components will be shown to have the property of conservativeness with respect to the number of components in the case of overfitted mixtures, and it can therefore be used as a default prior in this context.
Submitted 12 December, 2017; v1 submitted 6 June, 2017;
originally announced June 2017.
-
Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig
Authors:
Gilles Celeux,
Jack Jewson,
Julie Josse,
Jean-Michel Marin,
Christian P. Robert
Abstract:
This note is a collection of several discussions of the paper "Beyond subjective and objective in statistics", read by A. Gelman and C. Hennig to the Royal Statistical Society on April 12, 2017, and to appear in the Journal of the Royal Statistical Society, Series A.
Submitted 10 May, 2017;
originally announced May 2017.
-
On parameter estimation with the Wasserstein distance
Authors:
Espen Bernton,
Pierre E. Jacob,
Mathieu Gerber,
Christian P. Robert
Abstract:
Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the approximation of these estimators and illustrate their behavior in several numerical experiments. Two of our examples are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.
Submitted 9 May, 2019; v1 submitted 18 January, 2017;
originally announced January 2017.
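As a rough illustration of the minimum Wasserstein distance idea described in this abstract, the sketch below handles only a toy one-dimensional case. It is not the authors' procedure: the model distribution is replaced by a single synthetic sample, the minimization is a plain grid search, and the Gaussian location model, the grid, and the names `wasserstein_1d` and `minimum_wasserstein_estimate` are all assumptions made for illustration.

```python
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical distributions on the real line.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def minimum_wasserstein_estimate(data, simulate, grid, seed=0):
    """Grid search for the parameter whose synthetic sample is closest to the data."""
    best_theta, best_dist = None, float("inf")
    for theta in grid:
        # Re-seeding gives common random numbers across parameter values,
        # which makes the objective smoother in theta (an illustrative choice).
        rng = random.Random(seed)
        synthetic = simulate(theta, len(data), rng)
        dist = wasserstein_1d(data, synthetic)
        if dist < best_dist:
            best_theta, best_dist = theta, dist
    return best_theta, best_dist


def simulate_gaussian(theta, n, rng):
    """Hypothetical generative model: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


# Toy data generated from N(1.5, 1); the grid covers [-3, 3] in steps of 0.05.
data_rng = random.Random(1)
data = [data_rng.gauss(1.5, 1.0) for _ in range(200)]
grid = [i / 20 for i in range(-60, 61)]
theta_hat, dist_hat = minimum_wasserstein_estimate(data, simulate_gaussian, grid)
print(theta_hat, dist_hat)
```

Because the objective is evaluated on a finite synthetic sample, the returned value approximates the estimator only up to Monte Carlo error.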
-
Some comments about A Bayesian criterion for singular models by M. Drton and M. Plummer
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
These are written comments about the Read Paper A Bayesian criterion for singular models by M. Drton and M. Plummer, read to the Royal Statistical Society on October 5, 2016. The discussion was delivered by Judith Rousseau.
Submitted 8 October, 2016;
originally announced October 2016.
-
Some comments about "Penalising model component complexity" by Simpson et al. (2017)
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
This note discusses the paper "Penalising model component complexity" by Simpson et al. (2017). While we acknowledge the highly novel approach to prior construction and commend the authors for setting new-encompassing principles that will Bayesian modelling, and while we perceive the potential connection with other branches of the literature, we remain uncertain as to what extent the principles exposed in the paper can be developed outside specific models, given their lack of precision. The very notions of model component, base model, and overfitting prior are, for instance, conceptual rather than mathematical, and we thus fear the concept of penalised complexity may not go further than extending first-guess priors into larger families, thus failing to establish reference priors on a novel sound ground.
Submitted 22 September, 2016;
originally announced September 2016.
-
Asymptotic Properties of Approximate Bayesian Computation
Authors:
David T. Frazier,
Gael M. Martin,
Christian P. Robert,
Judith Rousseau
Abstract:
Approximate Bayesian computation allows for statistical analysis in models with intractable likelihoods. In this paper we consider the asymptotic behaviour of the posterior distribution obtained by this method. We give general results on the rate at which the posterior distribution concentrates on sets containing the true parameter, its limiting shape, and the asymptotic distribution of the posterior mean. These results hold under given rates for the tolerance used within the method, mild regularity conditions on the summary statistics, and a condition linked to identification of the true parameters. Implications for practitioners are discussed.
Submitted 8 May, 2018; v1 submitted 23 July, 2016;
originally announced July 2016.
-
ABC random forests for Bayesian parameter inference
Authors:
Louis Raynal,
Jean-Michel Marin,
Pierre Pudlo,
Mathieu Ribatet,
Christian P. Robert,
Arnaud Estoup
Abstract:
This preprint has been reviewed and recommended by Peer Community In Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036). Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. We propose to conduct likelihood-free Bayesian inferences about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest methodology of Breiman (2001) applied in a (non-parametric) regression setting. We advocate the derivation of a new random forest for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in terms of robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and is a good trade-off in terms of quality of point estimator precision and credible interval estimations for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution. All methods designed here have been incorporated in the R package abcrf (version 1.7) available on CRAN.
Submitted 2 November, 2018; v1 submitted 18 May, 2016;
originally announced May 2016.
-
Auxiliary Likelihood-Based Approximate Bayesian Computation in State Space Models
Authors:
Gael M. Martin,
Brendan P. M. McCabe,
David T. Frazier,
Worapree Maneesoonthorn,
Christian P. Robert
Abstract:
A computationally simple approach to inference in state space models is proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation of an intractable likelihood by matching summary statistics for the observed data with statistics computed from data simulated from the true process, based on parameter draws from the prior. Draws that produce a 'match' between observed and simulated summaries are retained, and used to estimate the inaccessible posterior. With no reduction to a low-dimensional set of sufficient statistics being possible in the state space setting, we define the summaries as the maximum of an auxiliary likelihood function, and thereby exploit the asymptotic sufficiency of this estimator for the auxiliary parameter vector. We derive conditions under which this approach - including a computationally efficient version based on the auxiliary score - achieves Bayesian consistency. To reduce the well-documented inaccuracy of ABC in multi-parameter settings, we propose the separate treatment of each parameter dimension using an integrated likelihood technique. Three stochastic volatility models for which exact Bayesian inference is either computationally challenging, or infeasible, are used for illustration. We demonstrate that our approach compares favorably against an extensive set of approximate and exact comparators. An empirical illustration completes the paper.
Submitted 2 December, 2018; v1 submitted 27 April, 2016;
originally announced April 2016.
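The matching step described in this abstract (draw parameters from the prior, simulate data, keep draws whose summaries are close to the observed ones) can be sketched as a basic accept/reject ABC loop. This is a minimal sketch under stated assumptions, not the paper's state space method: the model is a toy Gaussian location model, the summary is the sample mean (which happens to be the maximum likelihood estimate of the location, loosely echoing the auxiliary-likelihood idea), and the prior, tolerance, and the name `abc_rejection` are hypothetical choices.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)              # draw a parameter from the prior
        s_sim = summary(simulator(theta, rng))  # simulate data and summarize it
        if abs(s_sim - s_obs) <= tolerance:     # retain the draw on a 'match'
            accepted.append(theta)
    return accepted


rng = random.Random(0)

# Toy observed data from N(2, 1); in a real application this would be the actual data set.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),                         # assumed flat prior
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],  # assumed model
    summary=statistics.fmean,                                             # assumed summary
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(accepted), statistics.fmean(accepted))
```

The accepted draws approximate the posterior given the summary; shrinking `tolerance` tightens the approximation at the cost of fewer accepted draws.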