-
Conditioning Diffusions Using Malliavin Calculus
Authors:
Jakiw Pidstrigach,
Elizabeth Baker,
Carles Domingo-Enrich,
George Deligiannidis,
Nikolas Nüsken
Abstract:
In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and centred around a generalisation of the Tweedie score formula to nonlinear stochastic differential equations, that enables the development of methods robust to such singularities. This allows our approach to handle a broad range of applications, like diffusion bridges, or adding conditional controls to an already trained diffusion model. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques. As a byproduct, we also introduce a novel score matching objective. Our loss functions are formulated such that they could readily be extended to manifold-valued and infinite dimensional diffusions.
Submitted 6 June, 2025; v1 submitted 4 April, 2025;
originally announced April 2025.
-
Stein transport for Bayesian inference
Authors:
Nikolas Nüsken
Abstract:
We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.
Submitted 28 November, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Skew-symmetric schemes for stochastic differential equations with non-Lipschitz drift: an unadjusted Barker algorithm
Authors:
Yuga Iguchi,
Samuel Livingstone,
Nikolas Nüsken,
Giorgos Vasdekis,
Rui-Yang Zhang
Abstract:
We propose a new simple and explicit numerical scheme for time-homogeneous stochastic differential equations. The scheme is based on sampling increments at each time step from a skew-symmetric probability distribution, with the level of skewness determined by the drift and volatility of the underlying process. We show that as the step-size decreases the scheme converges weakly to the diffusion of interest. We then consider the problem of simulating from the limiting distribution of an ergodic diffusion process using the numerical scheme with a fixed step-size. We establish conditions under which the numerical scheme converges to equilibrium at a geometric rate, and quantify the bias between the equilibrium distributions of the scheme and of the true diffusion process. Notably, our results do not require a global Lipschitz assumption on the drift, in contrast to those required for the Euler-Maruyama scheme for long-time simulation at fixed step-sizes. Our weak convergence result relies on an extension of the theory of Milstein & Tretyakov to stochastic differential equations with non-Lipschitz drift, which could also be of independent interest. We support our theoretical results with numerical simulations.
Submitted 7 July, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Measure transport with kernel mean embeddings
Authors:
L. Wang,
N. Nüsken
Abstract:
Kalman filters constitute a scalable and robust methodology for approximate Bayesian inference, matching first and second order moments of the target posterior. To improve the accuracy in nonlinear and non-Gaussian settings, we extend this principle to include more or different characteristics, based on kernel mean embeddings (KMEs) of probability measures into reproducing kernel Hilbert spaces. Focusing on the continuous-time setting, we develop a family of interacting particle systems (termed $\textit{KME-dynamics}$) that bridge between prior and posterior, and that include the Kalman-Bucy filter as a special case. KME-dynamics does not require the score of the target, but rather estimates the score implicitly and intrinsically, and we develop links to score-based generative modeling and importance reweighting. A variant of KME-dynamics has recently been derived from an optimal transport and Fisher-Rao gradient flow perspective by Maurais and Marzouk, and we expose further connections to (kernelised) diffusion maps, leading to a variational formulation of regression type. Finally, we conduct numerical experiments on toy examples and the Lorenz 63 and 96 models, comparing our results against the ensemble Kalman filter and the mapping particle filter (Pulido and van Leeuwen, 2019, J. Comput. Phys.). Our experiments show particular promise for a hybrid modification (called Kalman-adjusted KME-dynamics).
Submitted 2 September, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Coherent set identification via direct low rank maximum likelihood estimation
Authors:
Robert Polzin,
Ilja Klebanov,
Nikolas Nüsken,
Péter Koltai
Abstract:
We analyze connections between two low rank modeling approaches from the last decade for treating dynamical data. The first one is the coherence problem (or coherent set approach), where groups of states are sought that evolve under the action of a stochastic transition matrix in a way maximally distinguishable from other groups. The second one is a low rank factorization approach for stochastic matrices, called Direct Bayesian Model Reduction (DBMR), which estimates the low rank factors directly from observed data. We show that DBMR results in a low rank model that is a projection of the full model, and exploit this insight to infer bounds on a quantitative measure of coherence within the reduced model. Both approaches can be formulated as optimization problems, and we also prove a bound between their respective objectives. On a broader scope, this work relates the two classical loss functions of nonnegative matrix factorization, namely the Frobenius norm and the generalized Kullback-Leibler divergence, and suggests new links between likelihood-based and projection-based estimation of probabilistic models.
Submitted 1 October, 2024; v1 submitted 15 August, 2023;
originally announced August 2023.
-
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
Authors:
Lorenz Richter,
Leon Sallandt,
Nikolas Nüsken
Abstract:
The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.
Submitted 28 July, 2023;
originally announced July 2023.
-
Interpolating between BSDEs and PINNs: deep learning for elliptic and parabolic boundary value problems
Authors:
Nikolas Nüsken,
Lorenz Richter
Abstract:
Solving high-dimensional partial differential equations is a recurrent challenge in economics, science and engineering. In recent years, a great number of computational approaches have been developed, most of them relying on a combination of Monte Carlo sampling and deep learning based approximation. For elliptic and parabolic problems, existing methods can broadly be classified into those resting on reformulations in terms of $\textit{backward stochastic differential equations}$ (BSDEs) and those aiming to minimize a regression-type $L^2$-error ($\textit{physics-informed neural networks}$, PINNs). In this paper, we review the literature and suggest a methodology based on the novel $\textit{diffusion loss}$ that interpolates between BSDEs and PINNs. Our contribution opens the door towards a unified understanding of numerical approaches for high-dimensional PDEs, as well as for implementations that combine the strengths of BSDEs and PINNs. The diffusion loss furthermore bears close similarities to $\textit{(least squares) temporal difference}$ objectives found in reinforcement learning. We also discuss eigenvalue problems and perform extensive numerical studies, including calculations of the ground state for nonlinear Schrödinger operators and committor functions relevant in molecular dynamics.
Submitted 29 January, 2023; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Rough McKean-Vlasov dynamics for robust ensemble Kalman filtering
Authors:
Michele Coghi,
Torstein Nilssen,
Nikolas Nüsken,
Sebastian Reich
Abstract:
Motivated by the challenge of incorporating data into misspecified and multiscale dynamical models, we study a McKean-Vlasov equation that contains the data stream as a common driving rough path. This setting allows us to prove well-posedness as well as continuity with respect to the driver in an appropriate rough-path topology. The latter property is key in our subsequent development of a robust data assimilation methodology: We establish propagation of chaos for the associated interacting particle system, which in turn is suggestive of a numerical scheme that can be viewed as an extension of the ensemble Kalman filter to a rough-path framework. Finally, we discuss a data-driven method based on subsampling to construct suitable rough path lifts and demonstrate the robustness of our scheme in a number of numerical experiments related to parameter estimation problems in multiscale contexts.
Submitted 20 January, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Stein Variational Gradient Descent: many-particle and long-time asymptotics
Authors:
Nikolas Nüsken,
D. R. Michiel Renger
Abstract:
Stein variational gradient descent (SVGD) refers to a class of methods for Bayesian inference based on interacting particle systems. In this paper, we consider the originally proposed deterministic dynamics as well as a stochastic variant, each of which represents one of the two main paradigms in Bayesian computational statistics: variational inference and Markov chain Monte Carlo. As it turns out, these are tightly linked through a correspondence between gradient flow structures and large-deviation principles rooted in statistical physics. To expose this relationship, we develop the cotangent space construction for the Stein geometry, prove its basic properties, and determine the large-deviation functional governing the many-particle limit for the empirical measure. Moreover, we identify the Stein-Fisher information (or kernelised Stein discrepancy) as its leading order contribution in the long-time and many-particle regime in the sense of $\Gamma$-convergence, shedding some light on the finite-particle properties of SVGD. Finally, we establish a comparison principle between the Stein-Fisher information and RKHS-norms that might be of independent interest.
Submitted 25 February, 2021;
originally announced February 2021.
-
Solving high-dimensional parabolic PDEs using the tensor train format
Authors:
Lorenz Richter,
Leon Sallandt,
Nikolas Nüsken
Abstract:
High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science and engineering. However, their numerical treatment poses formidable challenges since traditional grid-based methods tend to be frustrated by the curse of dimensionality. In this paper, we argue that tensor trains provide an appealing approximation framework for parabolic PDEs: the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures enabling both compression and efficient computation. Following this paradigm, we develop novel iterative schemes, involving either explicit and fast or implicit and accurate updates. We demonstrate in a number of examples that our methods achieve a favorable trade-off between accuracy and computational efficiency in comparison with state-of-the-art neural network based approaches.
Submitted 17 July, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
VarGrad: A Low-Variance Gradient Estimator for Variational Inference
Authors:
Lorenz Richter,
Ayman Boustati,
Nikolas Nüsken,
Francisco J. R. Ruiz,
Ömer Deniz Akyildiz
Abstract:
We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call $\textit{VarGrad}$ due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones. We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
Submitted 29 October, 2020; v1 submitted 20 October, 2020;
originally announced October 2020.
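The log-variance loss described in the VarGrad abstract above (the variance of the log-ratio between the variational approximation and the exact posterior) can be illustrated with a small Monte Carlo sketch. This is only an illustration under stated assumptions, not the authors' implementation: the one-dimensional Gaussian target, the Gaussian variational family, sampling from the variational distribution itself, and the helper names `log_gauss`, `log_joint` and `log_variance_loss` are all hypothetical choices made here.

```python
import numpy as np

def log_gauss(z, mean, std):
    """Log-density of a 1D Gaussian N(mean, std^2)."""
    return -0.5 * ((z - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def log_joint(z):
    """Hypothetical unnormalised target: N(1, 0.5^2) up to an additive constant."""
    return log_gauss(z, 1.0, 0.5) + 3.0

def log_variance_loss(log_target, q_mean, q_std, n_samples=1000, seed=0):
    """Sample variance of log q(z) - log p(z) with z drawn from q (an assumption)."""
    rng = np.random.default_rng(seed)
    z = q_mean + q_std * rng.standard_normal(n_samples)
    log_ratio = log_gauss(z, q_mean, q_std) - log_target(z)
    return np.var(log_ratio, ddof=1)

# The additive constant in log_joint has no effect on the variance,
# so the loss vanishes when q matches the target exactly.
loss_exact = log_variance_loss(log_joint, 1.0, 0.5)
loss_off = log_variance_loss(log_joint, 0.0, 1.0)
print(loss_exact, loss_off)
```

Because a variance is unaffected by additive constants, the unknown normalising constant of the posterior drops out of this sketch, which is why the unnormalised `log_joint` suffices.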
-
Solving high-dimensional Hamilton-Jacobi-Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space
Authors:
Nikolas Nüsken,
Lorenz Richter
Abstract:
Optimal control of diffusion processes is intimately connected to the problem of solving certain Hamilton-Jacobi-Bellman equations. Building on recent machine learning inspired approaches towards high-dimensional PDEs, we investigate the potential of $\textit{iterative diffusion optimisation}$ techniques, in particular considering applications in importance sampling and rare event simulation, and focusing on problems without diffusion control, with linearly controlled drift and running costs that depend quadratically on the control. More generally, our methods apply to nonlinear parabolic PDEs with a certain shift invariance. The choice of an appropriate loss function being a central element in the algorithmic design, we develop a principled framework based on divergences between path measures, encompassing various existing methods. Motivated by connections to forward-backward SDEs, we propose and study the novel $\textit{log-variance}$ divergence, showing favourable properties of corresponding Monte Carlo estimators. The promise of the developed approach is exemplified by a range of high-dimensional and metastable numerical examples.
Submitted 29 January, 2023; v1 submitted 11 May, 2020;
originally announced May 2020.
-
Affine invariant interacting Langevin dynamics for Bayesian inference
Authors:
Alfredo Garbuno-Inigo,
Nikolas Nüsken,
Sebastian Reich
Abstract:
We propose a computational method (with acronym ALDI) for sampling from a given target distribution based on first-order (overdamped) Langevin dynamics which satisfies the property of affine invariance. The central idea of ALDI is to run an ensemble of particles with their empirical covariance serving as a preconditioner for their underlying Langevin dynamics. ALDI does not require taking the inverse or square root of the empirical covariance matrix, which enables application to high-dimensional sampling problems. The theoretical properties of ALDI are studied in terms of non-degeneracy and ergodicity. Furthermore, we study its connections to diffusion on Riemannian manifolds and Wasserstein gradient flows.
Bayesian inference serves as a main application area for ALDI. In case of a forward problem with additive Gaussian measurement errors, ALDI allows for a gradient-free approximation in the spirit of the ensemble Kalman filter. A computational comparison between gradient-free and gradient-based ALDI is provided for a PDE constrained Bayesian inverse problem.
Submitted 9 April, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
On the geometry of Stein variational gradient descent
Authors:
A. Duncan,
N. Nuesken,
L. Szpruch
Abstract:
Bayesian inference problems require sampling or approximating high-dimensional probability distributions. The focus of this paper is on the recently introduced Stein variational gradient descent methodology, a class of algorithms that rely on iterated steepest descent steps with respect to a reproducing kernel Hilbert space norm. This construction leads to interacting particle systems, the mean-field limit of which is a gradient flow on the space of probability distributions equipped with a certain geometrical structure. We leverage this viewpoint to shed some light on the convergence properties of the algorithm, in particular addressing the problem of choosing a suitable positive definite kernel function. Our analysis leads us to considering certain nondifferentiable kernels with adjusted tails. We demonstrate significant performance gains of these in various numerical experiments.
Submitted 12 February, 2023; v1 submitted 2 December, 2019;
originally announced December 2019.
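The interacting particle system mentioned in the abstract above can be sketched with the standard SVGD update, in which each particle is moved by a kernel-weighted average of the target scores plus a repulsive kernel-gradient term. This is a minimal sketch under stated assumptions: it uses a Gaussian (RBF) kernel with fixed bandwidth rather than the nondifferentiable kernels with adjusted tails that the paper studies, a one-dimensional Gaussian target N(2, 1), and the hypothetical names `svgd_step` and `score`.

```python
import numpy as np

def svgd_step(x, score, step=0.1, h=1.0):
    """One SVGD update for 1D particles x, using an RBF kernel of bandwidth h."""
    diff = x[:, None] - x[None, :]            # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2.0 * h**2))       # k[j, i] = k(x_j, x_i)
    grad_k = -diff / h**2 * k                 # derivative of k(x_j, x_i) in x_j
    # Average over j: attraction along the score plus repulsion between particles.
    phi = (k * score(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi

def score(x):
    """Gradient of the log-density of the assumed target N(2, 1)."""
    return -(x - 2.0)

rng = np.random.default_rng(0)
particles = rng.standard_normal(50)           # initial ensemble drawn from N(0, 1)
for _ in range(500):
    particles = svgd_step(particles, score)
print(particles.mean(), particles.std())
```

The kernel and bandwidth are exactly the design choices the paper examines, so the fixed RBF kernel here should be read as a placeholder.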
-
Note on Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler by Garbuno-Inigo, Hoffmann, Li and Stuart
Authors:
Nikolas Nüsken,
Sebastian Reich
Abstract:
An interacting system of Langevin dynamics driven particles has been proposed for sampling from a given posterior density by Garbuno-Inigo, Hoffmann, Li and Stuart in Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler (arXiv:1903.08866v2). The proposed formulation is primarily studied from a formal mean-field limit perspective, while the theoretical behaviour under a finite particle size is left as an open problem. In this note we demonstrate that the particle-based covariance interaction term requires a non-trivial correction. We also show that the corrected dynamics samples exactly from the desired posterior provided that the empirical covariance matrix of the particle system remains non-singular and the posterior log-density satisfies the standard Bakry-Émery criterion.
Submitted 28 August, 2019;
originally announced August 2019.
-
State and Parameter Estimation from Observed Signal Increments
Authors:
Nikolas Nüsken,
Sebastian Reich,
Paul J. Rozdeba
Abstract:
The success of the ensemble Kalman filter has triggered a strong interest in expanding its scope beyond classical state estimation problems. In this paper, we focus on continuous-time data assimilation where the model and measurement errors are correlated and both states and parameters need to be identified. Such scenarios arise from noisy and partial observations of Lagrangian particles which move under a stochastic velocity field involving unknown parameters. We take an appropriate class of McKean-Vlasov equations as the starting point to derive ensemble Kalman-Bucy filter algorithms for combined state and parameter estimation. We demonstrate their performance through a series of increasingly complex multi-scale model systems.
Submitted 1 May, 2019; v1 submitted 26 March, 2019;
originally announced March 2019.
-
Constructing sampling schemes via coupling: Markov semigroups and optimal transport
Authors:
N. Nuesken,
G. A. Pavliotis
Abstract:
In this paper we develop a general framework for constructing and analysing coupled Markov chain Monte Carlo samplers, allowing for both (possibly degenerate) diffusion and piecewise deterministic Markov processes. For many performance criteria of interest, including the asymptotic variance, the task of finding efficient couplings can be phrased in terms of problems related to optimal transport theory. We investigate general structural properties, proving a singularity theorem that has both geometric and probabilistic interpretations. Moreover, we show that those problems can often be solved approximately and support our findings with numerical experiments. For the particular objective of estimating the variance of a Bayesian posterior, our analysis suggests using novel techniques in the spirit of antithetic variates. Addressing the convergence to equilibrium of coupled processes we furthermore derive a modified Poincaré inequality.
Submitted 28 June, 2018;
originally announced June 2018.
-
Using Perturbed Underdamped Langevin Dynamics to Efficiently Sample from Probability Distributions
Authors:
A. B. Duncan,
N. Nuesken,
G. A. Pavliotis
Abstract:
In this paper we introduce and analyse Langevin samplers that consist of perturbations of the standard underdamped Langevin dynamics. The perturbed dynamics is such that its invariant measure is the same as that of the unperturbed dynamics. We show that appropriate choices of the perturbations can lead to samplers that have improved properties, at least in terms of reducing the asymptotic variance. We present a detailed analysis of the new Langevin sampler for Gaussian target distributions. Our theoretical results are supported by numerical experiments with non-Gaussian target measures.
Submitted 29 April, 2017;
originally announced May 2017.