-
On the Asymptotics of Importance Weighted Variational Inference
Authors:
Badr-Eddine Chérief-Abdellatif,
Randal Douc,
Arnaud Doucet,
Hugo Marival
Abstract:
For complex latent variable models, the likelihood function is not available in closed form. In this context, a popular method to perform parameter estimation is Importance Weighted Variational Inference. It essentially maximizes the expectation of the logarithm of an importance sampling estimate of the likelihood with respect to both the latent variable model parameters and the importance distribution parameters, the expectation being itself with respect to the importance samples. Despite its great empirical success in machine learning, a theoretical analysis of the limit properties of the resulting estimates is still lacking. We fill this gap by establishing consistency when both the Monte Carlo and the observed data sample sizes go to infinity simultaneously. We also establish asymptotic normality and efficiency under additional conditions relating the growth rates of the Monte Carlo and observed data sample sizes. We distinguish several regimes related to the smoothness of the importance ratio.
Submitted 14 January, 2025;
originally announced January 2025.
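As a rough illustration of the objective described in the abstract above, the sketch below evaluates the logarithm of an importance sampling estimate of the likelihood on a toy Gaussian model. The model (z ~ N(0, 1), x | z ~ N(z + theta, 1)), the Gaussian importance distribution, and the names `log_normal_pdf` and `iw_bound` are assumptions made for this example; none of them is taken from the paper.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x (broadcasts over arrays)."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def iw_bound(x, theta, q_mean, q_std, num_samples, rng):
    """One Monte Carlo draw of log( (1/K) * sum_k w_k ) for a single observation x.

    Toy model (an assumption of this sketch): z ~ N(0, 1), x | z ~ N(z + theta, 1).
    Importance distribution (also an assumption): q(z) = N(q_mean, q_std^2).
    """
    # Draw K importance samples z_k ~ q.
    z = rng.normal(q_mean, q_std, size=num_samples)
    # Log importance weights: log p_theta(x, z_k) - log q(z_k).
    log_w = (
        log_normal_pdf(z, 0.0, 1.0)
        + log_normal_pdf(x, z + theta, 1.0)
        - log_normal_pdf(z, q_mean, q_std)
    )
    # Numerically stable log-mean-exp of the weights.
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))
```

In this toy model the exact posterior is N((x - theta) / 2, 1/2), so choosing it as the importance distribution makes every weight equal to the likelihood and the bound is tight for any number of samples. With a poorer importance distribution, averaging `iw_bound` over repeated draws gives a value below the log-likelihood, and the gap shrinks as `num_samples` grows.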
-
On PAC-Bayesian reconstruction guarantees for VAEs
Authors:
Badr-Eddine Chérief-Abdellatif,
Yuyang Shi,
Arnaud Doucet,
Benjamin Guedj
Abstract:
Despite its wide use and empirical successes, the theoretical understanding and study of the behaviour and performance of the variational autoencoder (VAE) have only emerged in the past few years. We contribute to this recent line of work by analysing the VAE's reconstruction ability for unseen test data, leveraging arguments from the PAC-Bayes theory. We provide generalisation bounds on the theoretical reconstruction error, and provide insights on the regularisation effect of VAE objectives. We illustrate our theoretical results with supporting experiments on classical benchmark datasets.
Submitted 23 February, 2022;
originally announced February 2022.
-
Estimation of copulas via Maximum Mean Discrepancy
Authors:
Pierre Alquier,
Badr-Eddine Chérief-Abdellatif,
Alexis Derumigny,
Jean-David Fermanian
Abstract:
This paper deals with robust inference for parametric copula models. Estimation using Canonical Maximum Likelihood might be unstable, especially in the presence of outliers. We propose to use a procedure based on the Maximum Mean Discrepancy (MMD) principle. We derive non-asymptotic oracle inequalities, consistency and asymptotic normality of this new estimator. In particular, the oracle inequality holds without any assumption on the copula family, and can be applied in the presence of outliers or under misspecification. Moreover, in our MMD framework, the statistical inference of copula models for which there exists no density with respect to the Lebesgue measure on $[0,1]^d$, such as the Marshall-Olkin copula, becomes feasible. A simulation study shows the robustness of our new procedures, especially compared to pseudo-maximum likelihood estimation. An R package implementing the MMD estimator for copula models is available.
Submitted 14 January, 2022; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Finite sample properties of parametric MMD estimation: robustness to misspecification and dependence
Authors:
Badr-Eddine Chérief-Abdellatif,
Pierre Alquier
Abstract:
Many works in statistics aim at designing a universal estimation procedure, that is, an estimator that would converge to the best approximation of the (unknown) data generating distribution in a model, without any assumption on this distribution. This question is of major interest, in particular because the universality property leads to the robustness of the estimator. In this paper, we tackle the problem of universal estimation using a minimum distance estimator presented in Briol et al. (2019) based on the Maximum Mean Discrepancy. We show that the estimator is robust both to dependence and to the presence of outliers in the dataset. Finally, we provide a theoretical study of the stochastic gradient descent algorithm used to compute the estimator, and we support our findings with numerical simulations.
** The proof of Proposition 4.4 in the published version contains a mistake. The mistake is fixed here (and the bound is actually improved by a factor of 2). **
Submitted 13 February, 2025; v1 submitted 11 December, 2019;
originally announced December 2019.
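To make the minimum distance idea in the abstract above concrete, here is a minimal sketch of MMD-based estimation of a Gaussian location parameter. It is an illustration under stated assumptions, not the paper's procedure: the Gaussian kernel, the biased V-statistic estimate, the N(theta, 1) model, and the names `gaussian_kernel`, `mmd_squared` and `mmd_location_estimate` are all choices made for this example, and a plain grid search stands in for the stochastic gradient descent algorithm studied in the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between two 1-D samples a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_location_estimate(data, grid, noise, bandwidth=1.0):
    """Grid search for the location theta of N(theta, 1) closest to data in MMD.

    Model samples are theta + noise, with the same standard normal noise
    reused for every candidate theta (common random numbers).
    """
    distances = [mmd_squared(data, theta + noise, bandwidth) for theta in grid]
    return grid[int(np.argmin(distances))]
```

A typical use would draw `noise` once with `np.random.default_rng(0).normal(size=500)`, build `grid` with `np.linspace`, and pass in a data set containing a few gross outliers. Because the kernel is bounded, distant outliers contribute almost nothing to the cross term, so one would expect the estimate to stay near the bulk of the data while the sample mean is pulled toward the outliers.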
-
MMD-Bayes: Robust Bayesian Estimation via Maximum Mean Discrepancy
Authors:
Badr-Eddine Chérief-Abdellatif,
Pierre Alquier
Abstract:
In some misspecified settings, the posterior distribution in Bayesian statistics may lead to inconsistent estimates. To fix this issue, it has been suggested to replace the likelihood by a pseudo-likelihood, that is, the exponential of a loss function enjoying suitable robustness properties. In this paper, we build a pseudo-likelihood based on the Maximum Mean Discrepancy, defined via an embedding of probability distributions into a reproducing kernel Hilbert space. We show that this MMD-Bayes posterior is consistent and robust to model misspecification. As the posterior obtained in this way might be intractable, we also prove that reasonable variational approximations of this posterior enjoy the same properties. We provide details on a stochastic gradient algorithm to compute these variational approximations. Numerical simulations indeed suggest that our estimator is more robust to misspecification than those based on the likelihood.
Submitted 11 December, 2019; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Convergence Rates of Variational Inference in Sparse Deep Learning
Authors:
Badr-Eddine Chérief-Abdellatif
Abstract:
Variational inference is becoming more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Meanwhile, a few recent works have provided theoretical justification and new insights on deep neural networks for estimating smooth functions in usual settings such as nonparametric regression. In this paper, we show that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference. In particular, we highlight the connection between estimation and approximation theories via the classical bias-variance trade-off and show that it leads to near-minimax rates of convergence for Hölder smooth functions. Additionally, we show that the model selection framework over the neural network architecture via ELBO maximization does not overfit and adaptively achieves the optimal rate of convergence.
Submitted 5 September, 2019; v1 submitted 9 August, 2019;
originally announced August 2019.
-
A Generalization Bound for Online Variational Inference
Authors:
Badr-Eddine Chérief-Abdellatif,
Pierre Alquier,
Mohammad Emtiyaz Khan
Abstract:
Bayesian inference provides an attractive online-learning framework to analyze sequential data, and offers generalization guarantees which hold even with model mismatch and adversaries. Unfortunately, exact Bayesian inference is rarely feasible in practice and approximation methods are usually employed, but do such methods preserve the generalization properties of Bayesian inference? In this paper, we show that this is indeed the case for some variational inference (VI) algorithms. We consider a few existing online, tempered VI algorithms, as well as a new algorithm, and derive their generalization bounds. Our theoretical result relies on the convexity of the variational objective, but we argue that the result should hold more generally and present empirical evidence in support of this. Our work in this paper presents theoretical justifications in favor of online algorithms relying on approximate Bayesian methods.
Submitted 10 December, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Consistency of ELBO maximization for model selection
Authors:
Badr-Eddine Chérief-Abdellatif
Abstract:
The Evidence Lower Bound (ELBO) is a quantity that plays a key role in variational inference. It can also be used as a criterion in model selection. However, though extremely popular in practice in the variational Bayes community, there has never been a general theoretical justification for selecting based on the ELBO. In this paper, we show that the ELBO maximization strategy has strong theoretical guarantees, and is robust to model misspecification, whereas most works rely on the assumption that one model is correctly specified. We illustrate our theoretical results by an application to the selection of the number of principal components in probabilistic PCA.
Submitted 8 April, 2019; v1 submitted 28 October, 2018;
originally announced October 2018.
-
Consistency of Variational Bayes Inference for Estimation and Model Selection in Mixtures
Authors:
Badr-Eddine Chérief-Abdellatif,
Pierre Alquier
Abstract:
Mixture models are widely used in Bayesian statistics and machine learning, in particular in computational biology, natural language processing and many other fields. Variational inference, a technique for approximating intractable posteriors by means of optimization algorithms, is extremely popular in practice when dealing with complex models such as mixtures. The contribution of this paper is two-fold. First, we study the concentration of variational approximations of posteriors, which is still an open problem for general mixtures, and we derive consistency and rates of convergence. We also tackle the problem of model selection for the number of components: we study the approach already used in practice, which consists in maximizing a numerical criterion (the Evidence Lower Bound). We prove that this strategy indeed leads to strong oracle inequalities. We illustrate our theoretical results by applications to Gaussian and multinomial mixtures.
Submitted 12 August, 2018; v1 submitted 14 May, 2018;
originally announced May 2018.