-
Deterministic Fokker-Planck Transport -- With Applications to Sampling, Variational Inference, Kernel Mean Embeddings & Sequential Monte Carlo
Authors:
Ilja Klebanov
Abstract:
The Fokker-Planck equation can be reformulated as a continuity equation, which naturally suggests using the associated velocity field in particle flow methods. While the resulting probability flow ODE offers appealing properties - such as defining a gradient flow of the Kullback-Leibler divergence between the current and target densities with respect to the 2-Wasserstein distance - it relies on evaluating the current probability density, which is intractable in most practical applications. By closely examining the drawbacks of approximating this density via kernel density estimation, we uncover opportunities to turn these limitations into advantages in contexts such as variational inference, kernel mean embeddings, and sequential Monte Carlo.
Submitted 11 October, 2024;
originally announced October 2024.
-
Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions
Authors:
Ilja Klebanov,
T. J. Sullivan
Abstract:
Integration against, and hence sampling from, high-dimensional probability distributions is of essential importance in many application areas and has been an active research area for decades. One approach that has drawn increasing attention in recent years has been the generation of samples from a target distribution $\mathbb{P}_{\mathrm{tar}}$ using transport maps: if $\mathbb{P}_{\mathrm{tar}} = T_\# \mathbb{P}_{\mathrm{ref}}$ is the pushforward of an easily-sampled probability distribution $\mathbb{P}_{\mathrm{ref}}$ under the transport map $T$, then the application of $T$ to $\mathbb{P}_{\mathrm{ref}}$-distributed samples yields $\mathbb{P}_{\mathrm{tar}}$-distributed samples. This paper proposes the application of transport maps not just to random samples, but also to quasi-Monte Carlo points, higher-order nets, and sparse grids in order for the transformed samples to inherit the original convergence rates that are often better than $N^{-1/2}$, $N$ being the number of samples/quadrature nodes. Our main result is the derivation of an explicit transport map for the case that $\mathbb{P}_{\mathrm{tar}}$ is a mixture of simple distributions, e.g. a Gaussian mixture, in which case application of the transport map $T$ requires the solution of an explicit ODE with closed-form right-hand side. Mixture distributions are of particular applicability and interest since many methods proceed by first approximating $\mathbb{P}_{\mathrm{tar}}$ by a mixture and then sampling from that mixture (often using importance reweighting). Hence, this paper allows for the sampling step to provide a better convergence rate than $N^{-1/2}$ for all such methods.
Submitted 19 August, 2023;
originally announced August 2023.
-
The linear conditional expectation in Hilbert space
Authors:
Ilja Klebanov,
Björn Sprungk,
T. J. Sullivan
Abstract:
The linear conditional expectation (LCE) provides a best linear (or rather, affine) estimate of the conditional expectation and hence plays an important rôle in approximate Bayesian inference, especially the Bayes linear approach. This article establishes the analytical properties of the LCE in an infinite-dimensional Hilbert space context. In addition, working in the space of affine Hilbert--Schmidt operators, we establish a regularisation procedure for this LCE. As an important application, we obtain a simple alternative derivation and intuitive justification of the conditional mean embedding formula, a concept widely used in machine learning to perform the conditioning of random variables by embedding them into reproducing kernel Hilbert spaces.
Submitted 7 December, 2020; v1 submitted 27 August, 2020;
originally announced August 2020.
-
A Rigorous Theory of Conditional Mean Embeddings
Authors:
Ilja Klebanov,
Ingmar Schuster,
T. J. Sullivan
Abstract:
Conditional mean embeddings (CMEs) have proven themselves to be a powerful tool in many machine learning applications. They allow the efficient conditioning of probability distributions within the corresponding reproducing kernel Hilbert spaces (RKHSs) by providing a linear-algebraic relation for the kernel mean embeddings of the respective joint and conditional probability distributions. Both centred and uncentred covariance operators have been used to define CMEs in the existing literature. In this paper, we develop a mathematically rigorous theory for both variants, discuss the merits and problems of each, and significantly weaken the conditions for applicability of CMEs. In the course of this, we demonstrate a beautiful connection to Gaussian conditioning in Hilbert spaces.
Submitted 14 April, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Markov Chain Importance Sampling -- a highly efficient estimator for MCMC
Authors:
Ingmar Schuster,
Ilja Klebanov
Abstract:
Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on error per CPU cycle, often to a large extent. As a by-product it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics.
Submitted 6 August, 2020; v1 submitted 18 May, 2018;
originally announced May 2018.
-
Axiomatic Approach to Variable Kernel Density Estimation
Authors:
Ilja Klebanov
Abstract:
Variable kernel density estimation allows the approximation of a probability density by the mean of differently stretched and rotated kernels centered at given sampling points $y_n\in\mathbb{R}^d,\ n=1,\dots,N$. Up to now, the choice of the corresponding bandwidth matrices $h_n$ has relied mainly on asymptotic arguments, like the minimization of the asymptotic mean integrated squared error (AMISE), which work well for large numbers of sampling points. However, in practice, one is often confronted with small to moderately sized sample sets far below the asymptotic regime, which highly restricts the usability of such methods. As an alternative to this asymptotic reasoning we suggest an axiomatic approach which guarantees invariance of the density estimate under linear transformations of the original density (and the sampling points) as well as under splitting of the density into several 'well-separated' parts. In order to still ensure proper asymptotic behavior of the estimate, we postulate the typical dependence $h_n\propto N^{-1/(d+4)}$. Further, we derive a new bandwidth selection rule which satisfies these axioms and performs considerably better than conventional ones in an artificially intricate two-dimensional example as well as in a real life example.
Submitted 4 May, 2018;
originally announced May 2018.
-
Empirical Bayes Methods for Prior Estimation in Systems Medicine
Authors:
Ilja Klebanov,
Alexander Sikorski,
Christof Schütte,
Susanna Röblitz
Abstract:
One of the main goals of mathematical modeling in systems medicine related to medical applications is to obtain patient-specific parameterizations and model predictions. In clinical practice, however, the number of available measurements for single patients is usually limited due to time and cost restrictions. This hampers the process of making patient-specific predictions about the outcome of a treatment. On the other hand, data are often available for many patients, in particular if extensive clinical studies have been performed. Therefore, before applying Bayes' rule separately to the data of each patient (which is typically performed using a non-informative prior), it is meaningful to use empirical Bayes methods in order to construct an informative prior from all available data. We compare the performance of four priors -- a non-informative prior and priors chosen by nonparametric maximum likelihood estimation (NPMLE), by maximum penalized likelihood estimation (MPLE) and by doubly-smoothed maximum likelihood estimation (DS-MLE) -- by applying them to a low-dimensional parameter estimation problem in a toy model as well as to a high-dimensional ODE model of the human menstrual cycle, which represents a typical example from systems biology modeling.
Submitted 14 December, 2016; v1 submitted 5 December, 2016;
originally announced December 2016.
-
Objective Priors in the Empirical Bayes Framework
Authors:
Ilja Klebanov,
Alexander Sikorski,
Christof Schütte,
Susanna Röblitz
Abstract:
When dealing with Bayesian inference the choice of the prior often remains a debatable question. Empirical Bayes methods offer a data-driven solution to this problem by estimating the prior itself from an ensemble of data. In the nonparametric case, the maximum likelihood estimate is known to overfit the data, an issue that is commonly tackled by regularization. However, the majority of regularizations are ad hoc choices which lack invariance under reparametrization of the model and result in inconsistent estimates for equivalent models. We introduce a non-parametric, transformation invariant estimator for the prior distribution. Being defined in terms of the missing information similar to the reference prior, it can be seen as an extension of the latter to the data-driven setting. This implies a natural interpretation as a trade-off between choosing the least informative prior and incorporating the information provided by the data, a symbiosis between the objective and empirical Bayes methodologies.
Submitted 11 May, 2020; v1 submitted 30 November, 2016;
originally announced December 2016.