-
Estimating a graph's spectrum via random Kirchhoff forests
Authors:
Simon Barthelmé,
Fabienne Castell,
Alexandre Gaudillière,
Clothilde Melot,
Matteo Quattropani,
Nicolas Tremblay
Abstract:
Exact eigendecomposition of large matrices is very expensive, and it is practically impossible to compute exact eigenvalues. Instead, one may set a more modest goal of approaching the empirical distribution of the eigenvalues, recovering the overall shape of the eigenspectrum. Current approaches to spectral estimation typically work with \emph{moments} of the spectral distribution. These moments are first estimated using Monte Carlo trace estimators, then the estimates are combined to approximate the spectral density. In this article we show how \emph{Kirchhoff forests}, which are random forests on graphs, can be used to estimate certain non-linear moments of very large graph Laplacians. We show how to combine these moments into an estimate of the spectral density. If the estimate's desired precision isn't too high, our approach paves the way to the estimation of a graph's spectrum in time sublinear in the number of links.
Submitted 31 March, 2025;
originally announced March 2025.
-
Least squares variational inference
Authors:
Yvann Le Fay,
Nicolas Chopin,
Simon Barthelmé
Abstract:
Variational inference consists in finding the best approximation of a target distribution within a certain family, where `best' means (typically) smallest Kullback-Leibler divergence. We show that, when the approximation family is exponential, the best approximation is the solution of a fixed-point equation. We introduce LSVI (Least-Squares Variational Inference), a Monte Carlo variant of the corresponding fixed-point recursion, where each iteration boils down to ordinary least squares regression and does not require computing gradients. We show that LSVI is equivalent to stochastic mirror descent; we use this insight to derive convergence guarantees. We introduce various ideas to improve LSVI further when the approximation family is Gaussian, leading to an $O(d^3)$ complexity in the dimension $d$ of the target in the full-covariance case, and an $O(d)$ complexity in the mean-field case. We show that LSVI outperforms state-of-the-art methods in a range of examples, while remaining gradient-free.
Submitted 28 February, 2025; v1 submitted 5 February, 2025;
originally announced February 2025.
-
Node Regression on Latent Position Random Graphs via Local Averaging
Authors:
Martin Gjorgjevski,
Nicolas Keriven,
Simon Barthelmé,
Yohann De Castro
Abstract:
Node regression consists in predicting the value of a graph label at a node, given observations at the other nodes. To gain some insight into the performance of various estimators for this task, we perform a theoretical study in a context where the graph is random. Specifically, we assume that the graph is generated by a Latent Position Model, where each node of the graph has a latent position, and the probability that two nodes are connected depends on the distance between the latent positions of the two nodes. In this context, we begin by studying the simplest possible estimator for graph regression, which consists in averaging the value of the label at all neighboring nodes. We show that in Latent Position Models this estimator tends to a Nadaraya-Watson estimator in the latent space, and that its rate of convergence is in fact the same. One issue with this standard estimator is that it averages over a region consisting of all neighbors of a node, and that depending on the graph model this may be too much or too little. An alternative consists in first estimating the true distances between the latent positions, then injecting these estimated distances into a classical Nadaraya-Watson estimator. This enables averaging in regions either smaller or larger than the typical graph neighborhood. We show that this method can achieve standard nonparametric rates in certain instances even when the graph neighborhood is too large or too small.
Submitted 29 October, 2024;
originally announced October 2024.
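The "simplest possible estimator" described in the abstract above, averaging the label over neighboring nodes, can be illustrated with a minimal sketch. The function name `neighbor_average`, the dense adjacency matrix and the boolean `observed` mask are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def neighbor_average(A, y, observed):
    """Predict the label at each node as the mean label of its observed neighbors.

    A        : (n, n) symmetric 0/1 adjacency matrix (hypothetical input format)
    y        : (n,) float labels; entries at unobserved nodes are ignored
    observed : (n,) boolean mask, True where the label is known
    Returns an (n,) array; NaN where a node has no observed neighbor.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    # Zero out columns of unobserved nodes so they do not contribute.
    W = A * observed[None, :].astype(float)
    counts = W.sum(axis=1)
    sums = W @ np.where(observed, y, 0.0)
    out = np.full(y.shape[0], np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

# Small example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([1.0, 3.0, np.nan, 5.0])
observed = np.array([True, True, False, True])
pred = neighbor_average(A, y, observed)
```

Here node 2 has three observed neighbors, so its prediction is their mean; node 3 has only an unobserved neighbor, so no prediction is available there.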
-
A Faster Sampler for Discrete Determinantal Point Processes
Authors:
Simon Barthelmé,
Nicolas Tremblay,
Pierre-Olivier Amblard
Abstract:
Discrete Determinantal Point Processes (DPPs) have a wide array of potential applications for subsampling datasets. They are, however, held back in some cases by the high cost of sampling. In the worst-case scenario, the sampling cost scales as O(n^3) where n is the number of elements of the ground set. A popular workaround to this prohibitive cost is to sample DPPs defined by low-rank kernels. In such cases, the cost of standard sampling algorithms scales as O(np^2 + nm^2) where m is the (average) number of samples of the DPP (usually m << n) and p the rank of the kernel used to define the DPP (m \leq p \leq n). The first term, O(np^2), comes from an SVD-like step. We focus here on the second term of this cost, O(nm^2), and show that it can be brought down to O(nm + m^3 log m) without any loss of exactness in the sampling. In practice, we observe very substantial speedups compared to the classical algorithm as soon as n > 1000. The algorithm described here is a close variant of the standard algorithm for sampling continuous DPPs, and uses rejection sampling. In the specific case of projection DPPs, we also show that any additional sample can be drawn in time O(m^3 log m). Finally, an interesting by-product of the analysis is that a realisation from a DPP is typically contained in a subset of size O(m log m) formed using leverage score i.i.d. sampling.
Submitted 22 February, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Smoothing complex-valued signals on Graphs with Monte-Carlo
Authors:
Hugo Jaquard,
Michaël Fanuel,
Pierre-Olivier Amblard,
Rémi Bardenet,
Simon Barthelmé,
Nicolas Tremblay
Abstract:
We introduce new smoothing estimators for complex signals on graphs, based on a recently studied Determinantal Point Process (DPP). These estimators are built from subsets of edges and nodes drawn according to this DPP, making up trees and unicycles, i.e., connected components containing exactly one cycle. We provide a Julia implementation of these estimators and study their performance when applied to a ranking problem.
Submitted 28 February, 2023; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Estimating the inverse trace using random forests on graphs
Authors:
Simon Barthelmé,
Nicolas Tremblay,
Alexandre Gaudillière,
Luca Avena,
Pierre-Olivier Amblard
Abstract:
Some data analysis problems require the computation of (regularised) inverse traces, i.e. quantities of the form $\mathrm{Tr}\,(q \mathbf{I} + \mathbf{L})^{-1}$. For large matrices, direct methods are infeasible and one must resort to approximations, for example using a conjugate gradient solver combined with Girard's trace estimator (also known as Hutchinson's trace estimator). Here we describe an unbiased estimator of the regularised inverse trace, based on Wilson's algorithm, an algorithm that was initially designed to draw uniform spanning trees in graphs. Our method is fast, easy to implement, and scales to very large matrices. Its main drawback is that it is limited to diagonally dominant matrices $\mathbf{L}$.
Submitted 6 May, 2019;
originally announced May 2019.
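For context, the baseline named in the abstract above (Girard's, or Hutchinson's, trace estimator) can be sketched as follows. This is not the forest-based estimator of the paper. It is a minimal illustration that assumes Rademacher probe vectors and, for brevity, swaps the conjugate gradient solver for a dense `numpy.linalg.solve`; the helper names `path_laplacian` and `hutchinson_inverse_trace` are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of a path graph on n nodes (illustrative test matrix)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def hutchinson_inverse_trace(L, q, num_probes=200, seed=0):
    """Monte Carlo estimate of Tr((q I + L)^{-1}) using Rademacher probes.

    For z with i.i.d. +/-1 entries, E[z^T M^{-1} z] = Tr(M^{-1}), so the
    average of z^T M^{-1} z over independent probes is an unbiased estimate.
    A dense solve stands in for an iterative solver in this sketch.
    """
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    M = q * np.eye(n) + L
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))  # one probe per column
    X = np.linalg.solve(M, Z)                          # X = M^{-1} Z
    return float(np.mean(np.sum(Z * X, axis=0)))

L = path_laplacian(50)
estimate = hutchinson_inverse_trace(L, q=1.0, num_probes=2000, seed=0)
exact = float(np.trace(np.linalg.inv(1.0 * np.eye(50) + L)))
```

On a matrix this small the exact trace is available for comparison; the estimator only becomes worthwhile when forming the inverse is out of reach.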
-
Determinantal Point Processes for Coresets
Authors:
Nicolas Tremblay,
Simon Barthelmé,
Pierre-Olivier Amblard
Abstract:
When faced with a data set too large to be processed all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, and among them "coresets" are especially appealing. A coreset is a (small) weighted sample of the original data that comes with the following guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution (based on the so-called "sensitivity" metric), iid random sampling has turned out to be one of the most successful methods for building coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties in non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPP). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to prove general coreset theorems. We apply our results to both the k-means and the linear regression problems, and give extensive empirical evidence that the small additional computational cost of DPP sampling comes with superior performance over its iid counterpart. Of independent interest, we also provide analytical formulas for the sensitivity in the linear regression and 1-means cases.
Submitted 6 January, 2020; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes
Authors:
Simon Barthelmé,
Pierre-Olivier Amblard,
Nicolas Tremblay
Abstract:
Determinantal Point Processes (DPPs) are popular models for point processes with repulsion. They appear in numerous contexts, from physics to graph theory, and display appealing theoretical properties. On the more practical side of things, since DPPs tend to select sets of points that are some distance apart (repulsion), they have been advocated as a way of producing random subsets with high diversity. DPPs come in two variants: fixed-size and varying-size. A sample from a varying-size DPP is a subset of random cardinality, while in fixed-size "$k$-DPPs" the cardinality is fixed. The latter makes more sense in many applications, but unfortunately their computational properties are less attractive, since, among other things, inclusion probabilities are harder to compute. In this work we show that as the size of the ground set grows, $k$-DPPs and DPPs become equivalent, meaning that their inclusion probabilities converge. As a by-product, we obtain saddlepoint formulas for inclusion probabilities in $k$-DPPs. These turn out to be extremely accurate, and suffer less from numerical difficulties than exact methods do. Our results also suggest that $k$-DPPs and DPPs have equivalent maximum likelihood estimators. Finally, we obtain results on asymptotic approximations of elementary symmetric polynomials which may be of independent interest.
Submitted 21 August, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Optimized Algorithms to Sample Determinantal Point Processes
Authors:
Nicolas Tremblay,
Simon Barthelme,
Pierre-Olivier Amblard
Abstract:
In this technical report, we discuss several sampling algorithms for Determinantal Point Processes (DPPs). DPPs have recently gained broad interest in the machine learning and statistics literature as random point processes with negative correlation, i.e., ones that can generate a "diverse" sample from a set of items. They are parametrized by a matrix $\mathbf{L}$, called an $L$-ensemble, that encodes the correlations between items. The standard sampling algorithm is divided into three phases: (1) eigendecomposition of $\mathbf{L}$, (2) an eigenvector sampling phase where $\mathbf{L}$'s eigenvectors are sampled independently via a Bernoulli variable parametrized by their associated eigenvalue, (3) a Gram-Schmidt-type orthogonalisation procedure of the sampled eigenvectors.
In a naive implementation, the computational cost of the third step is on average $\mathcal{O}(Nμ^3)$ where $μ$ is the average number of samples of the DPP. We give an algorithm which runs in $\mathcal{O}(Nμ^2)$ and is extremely simple to implement. If memory is a constraint, we also describe a dual variant with reduced memory costs. In addition, we discuss implementation details often missing in the literature.
Submitted 23 February, 2018;
originally announced February 2018.
-
Graph sampling with determinantal processes
Authors:
Nicolas Tremblay,
Pierre-Olivier Amblard,
Simon Barthelmé
Abstract:
We present a new random sampling strategy for k-bandlimited signals defined on graphs, based on determinantal point processes (DPP). For small graphs, i.e., in cases where the spectrum of the graph is accessible, we exhibit a DPP sampling scheme that enables perfect recovery of bandlimited signals. For large graphs, i.e., in cases where the graph's spectrum is not accessible, we investigate, both theoretically and empirically, a sub-optimal but much faster DPP based on loop-erased random walks on the graph. Preliminary experiments show promising results, especially in cases where the number of measurements should stay as small as possible and for graphs that have a strong community structure. Our sampling scheme is efficient and can be applied to graphs with up to $10^6$ nodes.
Submitted 5 March, 2017;
originally announced March 2017.
-
Bounding errors of Expectation-Propagation
Authors:
Guillaume P Dehaene,
Simon Barthelmé
Abstract:
Expectation Propagation is a very popular algorithm for variational inference, but comes with few theoretical guarantees. In this article, we prove that the approximation errors made by EP can be bounded. Our bounds have an asymptotic interpretation in the number $n$ of datapoints, which allows us to study EP's convergence with respect to the true posterior. In particular, we show that EP converges at a rate of $\mathcal{O}(n^{-2})$ for the mean, up to an order of magnitude faster than the traditional Gaussian approximation at the mode. We also give similar asymptotic expansions for moments of order 2 to 4, as well as excess Kullback-Leibler cost (defined as the additional KL cost incurred by using EP rather than the ideal Gaussian approximation). All these expansions highlight the superior convergence properties of EP. Our approach for deriving those results is likely applicable to many similar approximate inference methods. In addition, we introduce bounds on the moments of log-concave distributions that may be of independent interest.
Submitted 11 January, 2016;
originally announced January 2016.
-
Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference
Authors:
Simon Barthelmé,
Nicolas Chopin,
Vincent Cottet
Abstract:
ABC algorithms are notoriously expensive in computing time, as they require simulating many complete artificial datasets from the model. We advocate in this paper a "divide and conquer" approach to ABC, where we split the likelihood into n factors, and combine in some way n "local" ABC approximations of each factor. This has two advantages: (a) such an approach is typically much faster than standard ABC and (b) it makes it possible to use local summary statistics (i.e. summary statistics that depend only on the data-points that correspond to a single factor), rather than global summary statistics (that depend on the complete dataset). This greatly alleviates the bias introduced by summary statistics, and even removes it entirely in situations where local summary statistics are simply the identity function.
We focus on EP (Expectation-Propagation), a convenient and powerful way to combine n local approximations into a global approximation. Compared to the EP-ABC approach of Barthelmé and Chopin (2014), we present two variations, one based on the parallel EP algorithm of Cseke and Heskes (2011), which has the advantage of being implementable on a parallel architecture, and one version which bridges the gap between standard EP and parallel EP. We illustrate our approach with an expensive application of ABC, namely inference on spatial extremes.
Submitted 1 December, 2015;
originally announced December 2015.
-
Expectation Propagation in the large-data limit
Authors:
Guillaume Dehaene,
Simon Barthelmé
Abstract:
Expectation Propagation (Minka, 2001) is a widely successful algorithm for variational inference. EP is an iterative algorithm used to approximate complicated distributions, typically to find a Gaussian approximation of posterior distributions. In many applications of this type, EP performs extremely well. Surprisingly, despite its widespread use, there are very few theoretical guarantees on Gaussian EP, and it is quite poorly understood.
In order to analyze EP, we first introduce a variant of EP: averaged-EP (aEP), which operates on a smaller parameter space. We then consider aEP and EP in the limit of infinite data, where the overall contribution of each likelihood term is small and where posteriors are almost Gaussian. In this limit, we prove that the iterations of both aEP and EP are simple: they behave like iterations of Newton's algorithm for finding the mode of a function. We use this limit behavior to prove that EP is asymptotically exact, and to obtain other insights into the dynamic behavior of EP: for example, that it may diverge under poor initialization exactly like Newton's method. EP is a simple algorithm to state, but a difficult one to study. Our results should facilitate further research into the theoretical properties of this important method.
Submitted 31 March, 2016; v1 submitted 27 March, 2015;
originally announced March 2015.
-
The Poisson transform for unnormalised statistical models
Authors:
Simon Barthelmé,
Nicolas Chopin
Abstract:
Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space $Ω$ can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on $Ω$. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which include for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
Submitted 27 November, 2014; v1 submitted 11 June, 2014;
originally announced June 2014.
-
Fast matrix computations for functional additive models
Authors:
Simon Barthelme
Abstract:
It is common in functional data analysis to look at a set of related functions: a set of learning curves, a set of brain signals, a set of spatial maps, etc. One way to express relatedness is through an additive model, whereby each individual function $g_{i}\left(x\right)$ is assumed to be a variation around some shared mean $f(x)$. Gaussian processes provide an elegant way of constructing such additive models, but suffer from computational difficulties arising from the matrix operations that need to be performed. Recently Heersink & Furrer have shown that functional additive models give rise to covariance matrices that have a specific form they called quasi-Kronecker (QK), whose inverses are relatively tractable. We show that under additional assumptions the two-level additive model leads to a class of matrices we call restricted quasi-Kronecker (rQK), which enjoy many interesting properties. In particular, we formulate matrix factorisations whose complexity scales only linearly in the number of functions in the latent field, an enormous improvement over the cubic scaling of naïve approaches. We describe how to leverage the properties of rQK matrices for inference in Latent Gaussian Models.
Submitted 20 February, 2014;
originally announced February 2014.
-
Visualizing the Effects of a Changing Distance on Data Using Continuous Embeddings
Authors:
Gina Gruenhage,
Manfred Opper,
Simon Barthelme
Abstract:
Most Machine Learning (ML) methods, from clustering to classification, rely on a distance function to describe relationships between datapoints. For complex datasets it is hard to avoid making some arbitrary choices when defining a distance function. To compare images, one must choose a spatial scale; to compare signals, a temporal scale. The right scale is hard to pin down and it is preferable when results do not depend too tightly on the exact value one picked. Topological data analysis seeks to address this issue by focusing on the notion of neighbourhood instead of distance. It is shown that in some cases a simpler solution is available. It can be checked how strongly distance relationships depend on a hyperparameter using dimensionality reduction. A variant of dynamical multi-dimensional scaling (MDS) is formulated, which embeds datapoints as curves. The resulting algorithm, cMDS, is based on the Concave-Convex Procedure (CCCP) and provides a simple and efficient way of visualizing changes and invariances in distance patterns as a hyperparameter is varied. A variant to analyze the dependence on multiple hyperparameters is also presented. A cMDS algorithm that is straightforward to implement, use and extend is provided. To illustrate the possibilities of cMDS, it is applied to several real-world data sets.
Submitted 1 July, 2016; v1 submitted 8 November, 2013;
originally announced November 2013.
-
Modelling fixation locations using spatial point processes
Authors:
Simon Barthelmé,
Hans Trukenbrod,
Ralf Engbert,
Felix Wichmann
Abstract:
Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate, and why they fixated where they fixated. To a first approximation, a set of fixations can be viewed as a set of points in space: this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye movement data, helping turn qualitative questions into quantitative ones.
We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarifies their interpretation.
Submitted 22 May, 2013; v1 submitted 10 July, 2012;
originally announced July 2012.
-
Some discussions of D. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"
Authors:
Christophe Andrieu,
Simon Barthelme,
Nicolas Chopin,
Julien Cornebise,
Arnaud Doucet,
Mark Girolami,
Ioannis Kosmidis,
Ajay Jasra,
Anthony Lee,
Jean-Michel Marin,
Pierre Pudlo,
Christian P. Robert,
Mohammed Sedki,
Sumeetpal S. Singh
Abstract:
This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.
Submitted 5 January, 2012;
originally announced January 2012.
-
Expectation-Propagation for Likelihood-Free Inference
Authors:
Simon Barthelmé,
Nicolas Chopin
Abstract:
Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. In the case where such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have been suggested, ABC inference is still far from routine. ABC is often excruciatingly slow due to very low acceptance rates. In addition, ABC requires introducing a vector of "summary statistics", the choice of which is relatively arbitrary, and often requires some trial and error, making the whole process quite laborious for the user.
We introduce in this work the EP-ABC algorithm, which is an adaptation to the likelihood-free context of the variational approximation algorithm known as Expectation Propagation (Minka, 2001). The main advantage of EP-ABC is that it is faster by a few orders of magnitude than standard algorithms, while producing an overall approximation error which is typically negligible. A second advantage of EP-ABC is that it replaces the usual global ABC constraint on the vector of summary statistics computed on the whole dataset, by n local constraints of the form that apply separately to each data-point. As a consequence, it is often possible to do away with summary statistics entirely. In that case, EP-ABC approximates directly the evidence (marginal likelihood) of the model.
Comparisons are performed in three real-world applications which are typical of likelihood-free inference, including one application in neuroscience which is novel, and possibly too challenging for standard ABC techniques.
Submitted 18 July, 2012; v1 submitted 29 July, 2011;
originally announced July 2011.
-
Discussions on "Riemann manifold Langevin and Hamiltonian Monte Carlo methods"
Authors:
Simon Barthelme,
Magali Beffy,
Nicolas Chopin,
Arnaud Doucet,
Pierre Jacob,
Adam M. Johansen,
Jean-Michel Marin,
Christian P. Robert
Abstract:
This is a collection of discussions of "Riemann manifold Langevin and Hamiltonian Monte Carlo methods" by Girolami and Calderhead, to appear in the Journal of the Royal Statistical Society, Series B.
Submitted 3 November, 2010;
originally announced November 2010.
-
A flexible Bayesian method for adaptive measurement in psychophysics
Authors:
Simon Barthelmé,
Pascal Mamassian
Abstract:
In psychophysical experiments, time and the limited goodwill of participants are usually major constraints. This has been the main motivation behind the early development of adaptive methods for the measurement of psychometric thresholds. More recently, methods have been developed to measure whole psychometric functions in an adaptive way. Here we describe a Bayesian method to measure adaptively any aspect of a psychophysical function, taking inspiration from Kontsevich and Tyler's optimal Bayesian measurement method. Our method is implemented in a complete and easy-to-use MATLAB package.
△ Less
Submitted 2 September, 2008;
originally announced September 2008.