-
Long-time accuracy of ensemble Kalman filters for chaotic and machine-learned dynamical systems
Authors:
Daniel Sanz-Alonso,
Nathan Waniorek
Abstract:
Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state is high dimensional, ensemble Kalman filters are often the method of choice. This paper establishes long-time accuracy of ensemble Kalman filters. We introduce conditions on the dynamics and the observations under which the estimation error remains small in the long-time horizon. Our theory covers a wide class of partially-observed chaotic dynamical systems, which includes the Navier-Stokes equations and Lorenz models. In addition, we prove long-time accuracy of ensemble Kalman filters with surrogate dynamics, thus validating the use of machine-learned forecast models in ensemble data assimilation.
Submitted 18 December, 2024;
originally announced December 2024.
-
Inverse Problems and Data Assimilation: A Machine Learning Approach
Authors:
Eviatar Bach,
Ricardo Baptista,
Daniel Sanz-Alonso,
Andrew Stuart
Abstract:
The aim of these notes is to demonstrate the potential for ideas in machine learning to impact on the fields of inverse problems and data assimilation. The perspective is one that is primarily aimed at researchers from inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, we include a succinct mathematical treatment of various topics in machine learning.
Submitted 14 October, 2024;
originally announced October 2024.
-
A First Course in Monte Carlo Methods
Authors:
Daniel Sanz-Alonso,
Omar Al-Ghattas
Abstract:
This is a concise mathematical introduction to Monte Carlo methods, a rich family of algorithms with far-reaching applications in science and engineering. Monte Carlo methods are an exciting subject for mathematical statisticians and computational and applied mathematicians: the design and analysis of modern algorithms are rooted in a broad mathematical toolbox that includes ergodic theory of Markov chains, Hamiltonian dynamical systems, transport maps, stochastic differential equations, information theory, optimization, Riemannian geometry, and gradient flows, among many others. These lecture notes celebrate the breadth of mathematical ideas that have led to tangible advancements in Monte Carlo methods and their applications. To accommodate a diverse audience, the level of mathematical rigor varies from chapter to chapter, giving only an intuitive treatment to the most technically demanding subjects. The aim is not to be comprehensive or encyclopedic, but rather to illustrate some key principles in the design and analysis of Monte Carlo methods through a carefully-crafted choice of topics that emphasizes timeless over timely ideas. Algorithms are presented in a way that is conducive to conceptual understanding and mathematical analysis -- clarity and intuition are favored over state-of-the-art implementations that are harder to comprehend or rely on ad-hoc heuristics. To help readers navigate the expansive landscape of Monte Carlo methods, each algorithm is accompanied by a summary of its pros and cons, and by a discussion of the type of problems for which they are most useful. The presentation is self-contained, and therefore adequate for self-guided learning or as a teaching resource. Each chapter contains a section with bibliographic remarks that will be useful for those interested in conducting research on Monte Carlo methods and their applications.
Submitted 25 May, 2024;
originally announced May 2024.
-
Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet
Authors:
Melissa Adrian,
Daniel Sanz-Alonso,
Rebecca Willett
Abstract:
Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
Submitted 10 February, 2025; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration
Authors:
Hwanwoo Kim,
Daniel Sanz-Alonso
Abstract:
This paper proposes novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models. The new algorithms retain the ease of implementation of the classical GP-UCB algorithm, but the additional random exploration step accelerates their convergence, nearly achieving the optimal convergence rate. Furthermore, to facilitate Bayesian inference with an intractable likelihood, we propose to utilize optimization iterates for maximum a posteriori estimation to build a Gaussian process surrogate model for the unnormalized log-posterior density. We provide bounds for the Hellinger distance between the true and the approximate posterior distributions in terms of the number of design points. We demonstrate the effectiveness of our Bayesian optimization algorithms in non-convex benchmark objective functions, in a machine learning hyperparameter tuning problem, and in a black-box engineering design problem. The effectiveness of our posterior approximation approach is demonstrated in two Bayesian inference problems for parameters of dynamical systems.
Submitted 17 July, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Gaussian Process Regression under Computational and Epistemic Misspecification
Authors:
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
Gaussian process regression is a classical kernel method for function estimation and data interpolation. In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel. This paper investigates the effect of such kernel approximations on the interpolation error. We introduce a unified framework to analyze Gaussian process regression under important classes of computational misspecification: Karhunen-Loève expansions that result in low-rank kernel approximations, multiscale wavelet expansions that induce sparsity in the covariance matrix, and finite element representations that induce sparsity in the precision matrix. Our theory also accounts for epistemic misspecification in the choice of kernel parameters.
Submitted 3 October, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Analysis of a Computational Framework for Bayesian Inverse Problems: Ensemble Kalman Updates and MAP Estimators Under Mesh Refinement
Authors:
Daniel Sanz-Alonso,
Nathan Waniorek
Abstract:
This paper analyzes a popular computational framework to solve infinite-dimensional Bayesian inverse problems, discretizing the prior and the forward model in a finite-dimensional weighted inner product space. We demonstrate the benefit of working on a weighted space by establishing operator-norm bounds for finite element and graph-based discretizations of Matérn-type priors and deconvolution forward models. For linear-Gaussian inverse problems, we develop a general theory to characterize the error in the approximation to the posterior. We also embed the computational framework into ensemble Kalman methods and MAP estimators for nonlinear inverse problems. Our operator-norm bounds for prior discretizations guarantee the scalability and accuracy of these algorithms under mesh refinement.
Submitted 20 February, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
From Optimization to Sampling Through Gradient Flows
Authors:
N. Garcia Trillos,
B. Hosseini,
D. Sanz-Alonso
Abstract:
This article overviews how gradient flows, and discretizations thereof, are useful to design and analyze optimization and sampling algorithms. The interplay between optimization, sampling, and gradient flows is an active research area; our goal is to provide an accessible and lively introduction to some core ideas, emphasizing that gradient flows uncover the conceptual unity behind many optimization and sampling algorithms, and that they give a rich mathematical framework for their rigorous analysis.
Submitted 22 February, 2023;
originally announced February 2023.
-
Reduced-Order Autodifferentiable Ensemble Kalman Filters
Authors:
Yuming Chen,
Daniel Sanz-Alonso,
Rebecca Willett
Abstract:
This paper introduces a computational framework to reconstruct and forecast a partially observed state that evolves according to an unknown or expensive-to-simulate dynamical system. Our reduced-order autodifferentiable ensemble Kalman filters (ROAD-EnKFs) learn a latent low-dimensional surrogate model for the dynamics and a decoder that maps from the latent space to the state space. The learned dynamics and decoder are then used within an ensemble Kalman filter to reconstruct and forecast the state. Numerical experiments show that if the state dynamics exhibit a hidden low-dimensional structure, ROAD-EnKFs achieve higher accuracy at lower computational cost compared to existing methods. If such structure is not expressed in the latent state dynamics, ROAD-EnKFs achieve similar accuracy at lower cost, making them a promising approach for surrogate state reconstruction and forecasting.
Submitted 27 January, 2023;
originally announced January 2023.
-
Optimization on Manifolds via Graph Gaussian Processes
Authors:
Hwanwoo Kim,
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
This paper integrates manifold learning techniques within a Gaussian process upper confidence bound algorithm to optimize an objective function on a manifold. Our approach is motivated by applications where a full representation of the manifold is not available and querying the objective is expensive. We rely on a point cloud of manifold samples to define a graph Gaussian process surrogate model for the objective. Query points are sequentially chosen using the posterior distribution of the surrogate model given all previous queries. We establish regret bounds in terms of the number of queries and the size of the point cloud. Several numerical examples complement the theory and illustrate the performance of our method.
Submitted 8 November, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization
Authors:
Omar Al Ghattas,
Daniel Sanz-Alonso
Abstract:
Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
Submitted 5 October, 2023; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Mathematical Foundations of Graph-Based Bayesian Semi-Supervised Learning
Authors:
Nicolas García Trillos,
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
In recent decades, science and engineering have been revolutionized by a momentous growth in the amount of available data. However, despite the unprecedented ease with which data are now collected and stored, labeling data by supplementing each feature with an informative tag remains challenging. Illustrative tasks where the labeling process requires expert knowledge or is tedious and time-consuming include labeling X-rays with a diagnosis, protein sequences with a protein type, texts by their topic, tweets by their sentiment, or videos by their genre. In these and numerous other examples, only a few features may be manually labeled due to cost and time constraints. How can we best propagate label information from a small number of expensive labeled features to a vast number of unlabeled ones? This is the question addressed by semi-supervised learning (SSL).
This article overviews recent foundational developments on graph-based Bayesian SSL, a probabilistic framework for label propagation using similarities between features. SSL is an active research area and a thorough review of the extant literature is beyond the scope of this article. Our focus will be on topics drawn from our own research that illustrate the wide range of mathematical tools and ideas that underlie the rigorous study of the statistical accuracy and computational efficiency of graph-based Bayesian SSL.
Submitted 3 July, 2022;
originally announced July 2022.
-
Hierarchical Ensemble Kalman Methods with Sparsity-Promoting Generalized Gamma Hyperpriors
Authors:
Hwanwoo Kim,
Daniel Sanz-Alonso,
Alexander Strang
Abstract:
This paper introduces a computational framework to incorporate flexible regularization techniques in ensemble Kalman methods for nonlinear inverse problems. The proposed methodology approximates the maximum a posteriori (MAP) estimate of a hierarchical Bayesian model characterized by a conditionally Gaussian prior and generalized gamma hyperpriors. Suitable choices of hyperparameters yield sparsity-promoting regularization. We propose an iterative algorithm for MAP estimation, which alternates between updating the unknown with an ensemble Kalman method and updating the hyperparameters in the regularization to promote sparsity. The effectiveness of our methodology is demonstrated in several computed examples, including compressed sensing and subsurface flow inverse problems.
Submitted 19 May, 2022;
originally announced May 2022.
-
A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors
Authors:
Shiv Agrawal,
Hwanwoo Kim,
Daniel Sanz-Alonso,
Alexander Strang
Abstract:
Hierarchical models with gamma hyperpriors provide a flexible, sparsity-promoting framework to bridge $L^1$ and $L^2$ regularizations in Bayesian formulations of inverse problems. Despite the Bayesian motivation for these models, existing methodologies are limited to maximum a posteriori estimation. The potential to perform uncertainty quantification has not yet been realized. This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors. The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement. In addition, it lends itself naturally to model selection for the choice of hyperparameters. We illustrate the performance of our methodology in several computed examples, including a deconvolution problem and sparse identification of dynamical systems from time series data.
Submitted 28 November, 2021; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Finite Element Representations of Gaussian Processes: Balancing Numerical and Statistical Accuracy
Authors:
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
The stochastic partial differential equation approach to Gaussian processes (GPs) represents Matérn GP priors in terms of $n$ finite element basis functions and Gaussian coefficients with sparse precision matrix. Such representations enhance the scalability of GP regression and classification to datasets of large size $N$ by setting $n\approx N$ and exploiting sparsity. In this paper we reconsider the standard choice $n \approx N$ through an analysis of the estimation performance. Our theory implies that, under certain smoothness assumptions, one can reduce the computation and memory cost without hindering the estimation accuracy by setting $n \ll N$ in the large $N$ asymptotics. Numerical experiments illustrate the applicability of our theory and the effect of the prior lengthscale in the pre-asymptotic regime.
Submitted 8 April, 2022; v1 submitted 6 September, 2021;
originally announced September 2021.
-
Auto-differentiable Ensemble Kalman Filters
Authors:
Yuming Chen,
Daniel Sanz-Alonso,
Rebecca Willett
Abstract:
Data assimilation is concerned with sequentially estimating a temporally-evolving state. This task, which arises in a wide range of scientific and engineering applications, is particularly challenging when the state is high-dimensional and the state-space dynamics are unknown. This paper introduces a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters (AD-EnKFs) blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, AD-EnKFs leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Numerical results using the Lorenz-96 model show that AD-EnKFs outperform existing methods that use expectation-maximization or particle filters to merge data assimilation and machine learning. In addition, AD-EnKFs are easy to implement and require minimal tuning.
Submitted 19 July, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Graph-based Prior and Forward Models for Inverse Problems on Manifolds with Boundaries
Authors:
John Harlim,
Shixiao Jiang,
Hwanwoo Kim,
Daniel Sanz-Alonso
Abstract:
This paper develops manifold learning techniques for the numerical solution of PDE-constrained Bayesian inverse problems on manifolds with boundaries. We introduce graphical Matérn-type Gaussian field priors that enable flexible modeling near the boundaries, representing boundary values by superposition of harmonic functions with appropriate Dirichlet boundary conditions. We also investigate the graph-based approximation of forward models from PDE parameters to observed quantities. In the construction of graph-based prior and forward models, we leverage the ghost point diffusion map algorithm to approximate second-order elliptic operators with classical boundary conditions. Numerical results validate our graph-based approach and demonstrate the need to design prior covariance models that account for boundary conditions.
Submitted 12 June, 2021;
originally announced June 2021.
-
Bayesian Update with Importance Sampling: Required Sample Size
Authors:
Daniel Sanz-Alonso,
Zijian Wang
Abstract:
Importance sampling is used to approximate Bayes' rule in many computational approaches to Bayesian inverse problems, data assimilation and machine learning. This paper reviews and further investigates the required sample size for importance sampling in terms of the $χ^2$-divergence between target and proposal. We develop general abstract theory and illustrate through numerous examples the roles that dimension, noise-level and other model parameters play in approximating the Bayesian update with importance sampling. Our examples also facilitate a new direct comparison of standard and optimal proposals for particle filtering.
Submitted 22 September, 2020;
originally announced September 2020.
-
Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective
Authors:
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
In this paper we analyze the graph-based approach to semi-supervised learning under a manifold assumption. We adopt a Bayesian perspective and demonstrate that, for a suitable choice of prior constructed with sufficiently many unlabeled data, the posterior contracts around the truth at a rate that is minimax optimal up to a logarithmic factor. Our theory covers both regression and classification.
Submitted 12 June, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
-
The SPDE Approach to Matérn Fields: Graph Representations
Authors:
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
This paper investigates Gaussian Markov random field approximations to nonstationary Gaussian fields using graph representations of stochastic partial differential equations. We establish approximation error guarantees building on the theory of spectral convergence of graph Laplacians. The proposed graph representations provide a generalization of the Matérn model to unstructured point clouds, and facilitate inference and sampling using linear algebra methods for sparse matrices. In addition, they bridge and unify several models in Bayesian inverse problems, spatial statistics and graph-based machine learning. We demonstrate through examples in these three disciplines that the unity revealed by graph representations facilitates the exchange of ideas across them.
Submitted 26 April, 2021; v1 submitted 16 April, 2020;
originally announced April 2020.
-
Data-Driven Forward Discretizations for Bayesian Inversion
Authors:
Daniele Bigoni,
Yuming Chen,
Nicolas Garcia Trillos,
Youssef Marzouk,
Daniel Sanz-Alonso
Abstract:
This paper suggests a framework for the learning of discretizations of expensive forward models in Bayesian inverse problems. The main idea is to incorporate the parameters governing the discretization as part of the unknown to be estimated within the Bayesian machinery. We numerically show that in a variety of inverse problems arising in mechanical engineering, signal processing and the geosciences, the observations contain useful information to guide the choice of discretization.
Submitted 21 August, 2020; v1 submitted 17 March, 2020;
originally announced March 2020.
-
HMC: avoiding rejections by not using leapfrog and some results on the acceptance rate
Authors:
M. P. Calvo,
D. Sanz-Alonso,
J. M. Sanz-Serna
Abstract:
The leapfrog integrator is routinely used within the Hamiltonian Monte Carlo method and its variants. We give strong numerical evidence that alternative, easy to implement algorithms yield fewer rejections with a given computational effort. When the dimensionality of the target distribution is high, the number of accepted proposals may be multiplied by a factor of three or more. This increase in the number of accepted proposals is not achieved by impairing any positive features of the sampling. We also establish new non-asymptotic and asymptotic results on the monotonic relationship between the expected acceptance rate and the expected energy error. These results further validate the derivation of one of the integrators we consider and are of independent interest.
Submitted 2 April, 2021; v1 submitted 6 December, 2019;
originally announced December 2019.
-
Local Regularization of Noisy Point Clouds: Improved Global Geometric Estimates and Data Analysis
Authors:
Nicolas Garcia Trillos,
Daniel Sanz-Alonso,
Ruiyi Yang
Abstract:
Several data analysis techniques employ similarity relationships between data points to uncover the intrinsic dimension and geometric structure of the underlying data-generating mechanism. In this paper we work under the model assumption that the data is made of random perturbations of feature vectors lying on a low-dimensional manifold. We study two questions: how to define the similarity relationship over noisy data points, and what is the resulting impact of the choice of similarity in the extraction of global geometric information from the underlying manifold. We provide concrete mathematical evidence that using a local regularization of the noisy data to define the similarity improves the approximation of the hidden Euclidean distance between unperturbed points. Furthermore, graph-based objects constructed with the locally regularized similarity function satisfy better error bounds in their recovery of global geometric ones. Our theory is supported by numerical experiments that demonstrate that the gain in geometric understanding facilitated by local regularization translates into a gain in classification accuracy in simulated and real data.
Submitted 5 April, 2019;
originally announced April 2019.
-
Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning
Authors:
Nicolas Garcia Trillos,
Zach Kaplan,
Daniel Sanz-Alonso
Abstract:
The aim of this paper is to provide new theoretical and computational understanding of two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. Under this unified light, the optimization schemes for local entropy and heat regularized loss differ only in which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This allows the traditional back-propagation calculation of gradients to be replaced by sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks.
Submitted 28 January, 2019;
originally announced January 2019.
-
Inverse Problems and Data Assimilation
Authors:
Daniel Sanz-Alonso,
Andrew M. Stuart,
Armeen Taeb
Abstract:
We provide a clear and concise introduction to the subjects of inverse problems and data assimilation, and their inter-relations. The first part of our notes covers inverse problems; this refers to the study of how to estimate unknown model parameters from data. The second part of our notes covers data assimilation; this refers to a particular class of inverse problems in which the unknown parameter is the initial condition (and/or state) of a dynamical system, and the data comprises partial and noisy observations of the state. The third and final part of our notes describes the use of data assimilation methods to solve generic inverse problems by introducing an artificial algorithmic time. Our notes cover, among other topics, maximum a posteriori estimation, (stochastic) gradient descent, variational Bayes, Monte Carlo, importance sampling and Markov chain Monte Carlo for inverse problems; and 3DVAR, 4DVAR, extended and ensemble Kalman filters, and particle filters for data assimilation.
Each of parts one and two starts with a chapter on the Bayesian formulation, in which the problem solution is given by a posterior distribution on the unknown parameter. Then the following chapter specializes the Bayesian formulation to a linear-Gaussian setting where explicit characterization of the posterior is possible and insightful. The next two chapters explore methods to extract information from the posterior in nonlinear and non-Gaussian settings using optimization and Gaussian approximations. The final two chapters describe sampling methods that can reproduce the full posterior in the large sample limit. Each chapter closes with a bibliography containing citations to alternative pedagogical literature and to relevant research literature. We also include a set of exercises at the end of parts one and two. Our notes are thus useful for both classroom teaching and self-guided study.
Submitted 14 February, 2023; v1 submitted 15 October, 2018;
originally announced October 2018.
-
On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms
Authors:
Nicolas Garcia Trillos,
Zachary Kaplan,
Thabo Samakhoana,
Daniel Sanz-Alonso
Abstract:
A popular approach to semi-supervised learning proceeds by endowing the input data with a graph structure in order to extract geometric information and incorporate it into a Bayesian framework. We introduce new theory that gives appropriate scalings of graph parameters that provably lead to a well-defined limiting posterior as the size of the unlabeled data set grows. Furthermore, we show that these consistency results have profound algorithmic implications. When consistency holds, carefully designed graph-based Markov chain Monte Carlo algorithms are proved to have a uniform spectral gap, independent of the number of unlabeled inputs. Several numerical experiments corroborate both the statistical consistency and the algorithmic scalability established by the theory.
Submitted 12 January, 2020; v1 submitted 20 October, 2017;
originally announced October 2017.
-
Continuum Limit of Posteriors in Graph Bayesian Inverse Problems
Authors:
Nicolas Garcia Trillos,
Daniel Sanz-Alonso
Abstract:
We consider the problem of recovering a function input of a differential equation formulated on an unknown domain $M$. We assume access to a discrete domain $M_n=\{x_1, \dots, x_n\} \subset M$, and to noisy measurements of the output solution at $p\le n$ of those points. We introduce a graph-based Bayesian inverse problem, and show that the graph-posterior measures over functions in $M_n$ converge, in the large $n$ limit, to a posterior over functions in $M$ that solves a Bayesian inverse problem with known domain.
The proofs rely on the variational formulation of the Bayesian update, and on a new topology for the study of convergence of measures over functions on point clouds to a measure over functions on the continuum. Our framework, techniques, and results may serve to lay the foundations of robust uncertainty quantification of graph-based tasks in machine learning. The ideas are presented in the concrete setting of recovering the initial condition of the heat equation on an unknown manifold.
Submitted 22 June, 2017;
originally announced June 2017.
-
The Bayesian update: variational formulations and gradient flows
Authors:
Nicolas Garcia Trillos,
Daniel Sanz-Alonso
Abstract:
The Bayesian update can be viewed as a variational problem by characterizing the posterior as the minimizer of a functional. The variational viewpoint is far from new and is at the heart of popular methods for posterior approximation. However, some of its consequences seem largely unexplored. We focus on the following one: defining the posterior as the minimizer of a functional gives a natural path towards the posterior by moving in the direction of steepest descent of the functional. This idea is made precise through the theory of gradient flows, which brings new tools to the study of Bayesian models and algorithms. Since the posterior may be characterized as the minimizer of different functionals, several variational formulations may be considered. We study three of them and their three associated gradient flows. We show that, in all cases, the rate of convergence of the flows to the posterior can be bounded by the geodesic convexity of the functional to be minimized. Each gradient flow naturally suggests a nonlinear diffusion with the posterior as invariant distribution. These diffusions may be discretized to build proposals for Markov chain Monte Carlo (MCMC) algorithms. By construction, the diffusions are guaranteed to satisfy a certain optimality condition, and rates of convergence are given by the convexity of the functionals. We use this observation to propose a criterion for the choice of metric in Riemannian MCMC methods.
Submitted 1 November, 2018; v1 submitted 20 May, 2017;
originally announced May 2017.
-
Importance Sampling and Necessary Sample Size: an Information Theory Approach
Authors:
Daniel Sanz-Alonso
Abstract:
Importance sampling approximates expectations with respect to a target measure by using samples from a proposal measure. The performance of the method over large classes of test functions depends heavily on the closeness between both measures. We derive a general bound that needs to hold for importance sampling to be successful, and that relates the $f$-divergence between the target and the proposal to the sample size. The bound is deduced from a new and simple information theory paradigm for the study of importance sampling. As examples of the general theory we give necessary conditions on the sample size in terms of the Kullback-Leibler and $\chi^2$ divergences, and the total variation and Hellinger distances. Our approach is non-asymptotic, and its generality allows us to tell apart the relative merits of these metrics. Unsurprisingly, the non-symmetric divergences give sharper bounds than total variation or Hellinger. Our results extend existing necessary conditions (and complement sufficient ones) on the sample size required for importance sampling.
Submitted 31 August, 2016;
originally announced August 2016.
-
Importance Sampling: Intrinsic Dimension and Computational Cost
Authors:
S. Agapiou,
O. Papaspiliopoulos,
D. Sanz-Alonso,
A. M. Stuart
Abstract:
The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to overview and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering.
Submitted 14 January, 2017; v1 submitted 19 November, 2015;
originally announced November 2015.