Search | arXiv e-print repository

Long-time accuracy of ensemble Kalman filters for chaotic and machine-learned dynamical systems

Authors: Daniel Sanz-Alonso, Nathan Waniorek

Abstract: Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state is high dimensional, ensemble Kalman filters are often the method of choice. This paper establishes long-time accuracy of ensemble Kalman filters. We introduce conditions on the dynamics and the observations under which the estimation error remains small in the long-time horizon. Our theory covers a wide class of partially observed chaotic dynamical systems, which includes the Navier-Stokes equations and Lorenz models. In addition, we prove long-time accuracy of ensemble Kalman filters with surrogate dynamics, thus validating the use of machine-learned forecast models in ensemble data assimilation.

Submitted 18 December, 2024; originally announced December 2024.

Comments: 40 pages, 4 figures

MSC Class: 62F15; 68Q25; 60G35; 62M05

arXiv:2410.10523

Inverse Problems and Data Assimilation: A Machine Learning Approach

Authors: Eviatar Bach, Ricardo Baptista, Daniel Sanz-Alonso, Andrew Stuart

Abstract: The aim of these notes is to demonstrate the potential for ideas in machine learning to impact the fields of inverse problems and data assimilation. The perspective is primarily aimed at researchers in inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, we include a succinct mathematical treatment of various topics in machine learning.

Submitted 14 October, 2024; originally announced October 2024.

Comments: 254 pages

arXiv:2405.16359

A First Course in Monte Carlo Methods

Authors: Daniel Sanz-Alonso, Omar Al-Ghattas

Abstract: This is a concise mathematical introduction to Monte Carlo methods, a rich family of algorithms with far-reaching applications in science and engineering. Monte Carlo methods are an exciting subject for mathematical statisticians and computational and applied mathematicians: the design and analysis of modern algorithms are rooted in a broad mathematical toolbox that includes ergodic theory of Markov chains, Hamiltonian dynamical systems, transport maps, stochastic differential equations, information theory, optimization, Riemannian geometry, and gradient flows, among many others. These lecture notes celebrate the breadth of mathematical ideas that have led to tangible advancements in Monte Carlo methods and their applications. To accommodate a diverse audience, the level of mathematical rigor varies from chapter to chapter, giving only an intuitive treatment to the most technically demanding subjects. The aim is not to be comprehensive or encyclopedic, but rather to illustrate some key principles in the design and analysis of Monte Carlo methods through a carefully crafted choice of topics that emphasizes timeless over timely ideas. Algorithms are presented in a way that is conducive to conceptual understanding and mathematical analysis; clarity and intuition are favored over state-of-the-art implementations that are harder to comprehend or rely on ad hoc heuristics.
To help readers navigate the expansive landscape of Monte Carlo methods, each algorithm is accompanied by a summary of its pros and cons, and by a discussion of the type of problems for which it is most useful. The presentation is self-contained, and therefore adequate for self-guided learning or as a teaching resource. Each chapter contains a section with bibliographic remarks that will be useful for those interested in conducting research on Monte Carlo methods and their applications.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 150 pages, 21 figures

arXiv:2405.13180

Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet

Authors: Melissa Adrian, Daniel Sanz-Alonso, Rebecca Willett

Abstract: Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.

Submitted 10 February, 2025; v1 submitted 21 May, 2024; originally announced May 2024.
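The variational analysis step mentioned in this abstract can be illustrated, for a linear observation operator, by the 3D-Var cost below. This is a minimal sketch under stated assumptions, not the FourCastNet pipeline: the background `xb` merely stands in for a surrogate forecast, and the covariances `B` and `R`, the operator `H`, and the helper names `threedvar_analysis` and `grad_J` are hypothetical.

```python
import numpy as np

def threedvar_analysis(xb, y, H, B, R):
    """Minimize J(x) = 1/2 |x - xb|^2_{B^-1} + 1/2 |y - H x|^2_{R^-1} for linear H."""
    S = H @ B @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ B).T          # gain B H^T S^{-1} (B and S are symmetric)
    return xb + K @ (y - H @ xb)

def grad_J(x, xb, y, H, B, R):
    # Gradient of the 3D-Var cost; it should vanish at the analysis.
    return np.linalg.solve(B, x - xb) - H.T @ np.linalg.solve(R, y - H @ x)

# Hypothetical 4-dimensional state, observing components 0 and 2 only.
idx = np.arange(4)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]))   # background covariance
R = 0.1 * np.eye(2)                                 # observation noise covariance
H = np.eye(4)[[0, 2]]                               # partial observation operator
xb = np.zeros(4)                                    # stands in for a surrogate forecast
y = np.array([1.0, 0.5])
xa = threedvar_analysis(xb, y, H, B, R)
```

Checking that the gradient of the cost vanishes at the closed-form analysis is a cheap sanity test; with a nonlinear observation operator the cost would instead be minimized iteratively.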

arXiv:2401.17037

Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration

Authors: Hwanwoo Kim, Daniel Sanz-Alonso

Abstract: This paper proposes novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models. The new algorithms retain the ease of implementation of the classical GP-UCB algorithm, but the additional random exploration step accelerates their convergence, nearly achieving the optimal convergence rate. Furthermore, to facilitate Bayesian inference with an intractable likelihood, we propose to utilize optimization iterates for maximum a posteriori estimation to build a Gaussian process surrogate model for the unnormalized log-posterior density. We provide bounds for the Hellinger distance between the true and the approximate posterior distributions in terms of the number of design points. We demonstrate the effectiveness of our Bayesian optimization algorithms on non-convex benchmark objective functions, in a machine learning hyperparameter tuning problem, and in a black-box engineering design problem. The effectiveness of our posterior approximation approach is demonstrated in two Bayesian inference problems for parameters of dynamical systems.

Submitted 17 July, 2024; v1 submitted 30 January, 2024; originally announced January 2024.
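A toy version of the idea of pairing each GP-UCB query with a random exploration query is sketched below. It is an illustrative assumption, not the algorithm analyzed in the paper: the kernel, the exploration weight `beta`, the search grid, the toy objective, and the names `gp_posterior` and `ucb_with_random_exploration` are all hypothetical.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    # Squared-exponential kernel on 1-D inputs (an illustrative choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_posterior(x_train, y_train, x_test, jitter=1e-6):
    # Noise-free GP regression; the jitter only stabilizes the Cholesky factor.
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    Ks = rbf(x_test, x_train)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)
    return Ks @ alpha, np.sqrt(var)

def ucb_with_random_exploration(f, n_iter, beta=2.0, seed=0):
    """Hypothetical variant: each round adds one GP-UCB query and one uniform random query."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)
    x = rng.uniform(size=1)
    y = f(x)
    for _ in range(n_iter):
        mean, sd = gp_posterior(x, y, grid)
        x_ucb = grid[np.argmax(mean + beta * sd)]   # classical GP-UCB step
        x_rand = rng.uniform()                      # random exploration step
        new = np.array([x_ucb, x_rand])
        x = np.concatenate([x, new])
        y = np.concatenate([y, f(new)])
    best = np.argmax(y)
    return x[best], y[best]

f = lambda x: -(x - 0.37) ** 2      # toy objective, maximized at x = 0.37
x_best, y_best = ucb_with_random_exploration(f, n_iter=20)
```

The random query costs one extra objective evaluation per round but keeps the surrogate informed away from the current UCB maximizer.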

arXiv:2312.09225

Gaussian Process Regression under Computational and Epistemic Misspecification

Authors: Daniel Sanz-Alonso, Ruiyi Yang

Abstract: Gaussian process regression is a classical kernel method for function estimation and data interpolation. In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel. This paper investigates the effect of such kernel approximations on the interpolation error. We introduce a unified framework to analyze Gaussian process regression under important classes of computational misspecification: Karhunen-Loève expansions that result in low-rank kernel approximations, multiscale wavelet expansions that induce sparsity in the covariance matrix, and finite element representations that induce sparsity in the precision matrix. Our theory also accounts for epistemic misspecification in the choice of kernel parameters.

Submitted 3 October, 2024; v1 submitted 14 December, 2023; originally announced December 2023.
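To see how a low-rank kernel approximation can affect interpolation, the sketch below truncates the eigen-expansion of a Matérn-3/2 Gram matrix (a discrete stand-in for a Karhunen-Loève truncation) and compares GP posterior means at held-out inputs. The kernel, lengthscale, nugget, grid, test function, and the helper name `truncated_kernel` are assumptions chosen for illustration.

```python
import numpy as np

def matern32(a, b, ell=0.3):
    # Matérn-3/2 kernel on 1-D inputs (illustrative choice of kernel).
    r = np.sqrt(3.0) * np.abs(a[:, None] - b[None, :]) / ell
    return (1.0 + r) * np.exp(-r)

def truncated_kernel(K, rank):
    # Keep the `rank` leading eigenpairs of the Gram matrix: a discrete
    # stand-in for truncating a Karhunen-Loeve expansion of the kernel.
    w, V = np.linalg.eigh(K)                 # eigenvalues in ascending order
    w, V = w[-rank:], V[:, -rank:]
    return (V * w) @ V.T

x = np.linspace(0.0, 1.0, 200)
train = np.arange(0, 200, 4)                # 50 training inputs
test = np.setdiff1d(np.arange(200), train)  # 150 held-out inputs
f_true = np.sin(2 * np.pi * x)
K = matern32(x, x)
nugget = 1e-4                               # small regularization for stability

errors = {}
for rank in (5, 20, 200):
    K_r = truncated_kernel(K, rank)
    K_tt = K_r[np.ix_(train, train)] + nugget * np.eye(len(train))
    mean = K_r[np.ix_(test, train)] @ np.linalg.solve(K_tt, f_true[train])
    errors[rank] = np.max(np.abs(mean - f_true[test]))   # sup-norm error on held-out inputs
```

For this smooth target the rank-5 approximation should interpolate noticeably worse than the full kernel; the precise error values depend on the assumed settings.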

arXiv:2304.09933

Analysis of a Computational Framework for Bayesian Inverse Problems: Ensemble Kalman Updates and MAP Estimators Under Mesh Refinement

Authors: Daniel Sanz-Alonso, Nathan Waniorek

Abstract: This paper analyzes a popular computational framework to solve infinite-dimensional Bayesian inverse problems, discretizing the prior and the forward model in a finite-dimensional weighted inner product space. We demonstrate the benefit of working on a weighted space by establishing operator-norm bounds for finite element and graph-based discretizations of Matérn-type priors and deconvolution forward models. For linear-Gaussian inverse problems, we develop a general theory to characterize the error in the approximation to the posterior. We also embed the computational framework into ensemble Kalman methods and MAP estimators for nonlinear inverse problems. Our operator-norm bounds for prior discretizations guarantee the scalability and accuracy of these algorithms under mesh refinement.

Submitted 20 February, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 39 pages, 0 figures

MSC Class: 65M32 (Primary); 68Q25, 35Q62, 62F15 (Secondary)

arXiv:2302.11449

From Optimization to Sampling Through Gradient Flows

Authors: N. Garcia Trillos, B. Hosseini, D. Sanz-Alonso

Abstract: This article overviews how gradient flows, and discretizations thereof, are useful to design and analyze optimization and sampling algorithms. The interplay between optimization, sampling, and gradient flows is an active research area; our goal is to provide an accessible and lively introduction to some core ideas, emphasizing that gradient flows uncover the conceptual unity behind many optimization and sampling algorithms, and that they give a rich mathematical framework for their rigorous analysis.

Submitted 22 February, 2023; originally announced February 2023.

Comments: This article will appear in the Notices of the American Mathematical Society
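A concrete instance of the optimization/sampling pairing: gradient descent is the explicit Euler discretization of the gradient flow $\dot x = -\nabla V(x)$, and the unadjusted Langevin algorithm is the Euler-Maruyama discretization of $dX = -\nabla V(X)\,dt + \sqrt{2}\,dW$, whose invariant density is proportional to $e^{-V}$. The minimal sketch below assumes the quadratic potential $V(x) = x^2/2$; the step size, iteration counts, and function names are illustrative.

```python
import numpy as np

def grad_V(x):
    # Potential V(x) = x^2 / 2, so the target exp(-V) is a standard Gaussian.
    return x

def gradient_descent(x0, h, n_steps):
    # Explicit Euler discretization of the gradient flow dx/dt = -grad V(x).
    x = x0
    for _ in range(n_steps):
        x = x - h * grad_V(x)
    return x

def unadjusted_langevin(x0, h, n_steps, rng):
    # Euler-Maruyama discretization of dX = -grad V(X) dt + sqrt(2) dW,
    # run in parallel for every entry of x0 (one independent chain per entry).
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(1)
x_opt = gradient_descent(5.0, h=0.1, n_steps=200)
samples = unadjusted_langevin(np.full(20_000, 5.0), h=0.1, n_steps=200, rng=rng)
```

The only difference between the two loops is the injected noise. For this potential the Langevin chain's stationary variance is $1/(1 - h/2)$ rather than the target value 1, a small discretization bias that vanishes as $h \to 0$.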

arXiv:2301.11961

Reduced-Order Autodifferentiable Ensemble Kalman Filters

Authors: Yuming Chen, Daniel Sanz-Alonso, Rebecca Willett

Abstract: This paper introduces a computational framework to reconstruct and forecast a partially observed state that evolves according to an unknown or expensive-to-simulate dynamical system. Our reduced-order autodifferentiable ensemble Kalman filters (ROAD-EnKFs) learn a latent low-dimensional surrogate model for the dynamics and a decoder that maps from the latent space to the state space. The learned dynamics and decoder are then used within an ensemble Kalman filter to reconstruct and forecast the state. Numerical experiments show that if the state dynamics exhibit a hidden low-dimensional structure, ROAD-EnKFs achieve higher accuracy at lower computational cost compared to existing methods. If such structure is not expressed in the latent state dynamics, ROAD-EnKFs achieve similar accuracy at lower cost, making them a promising approach for surrogate state reconstruction and forecasting.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2210.10962

Optimization on Manifolds via Graph Gaussian Processes

Authors: Hwanwoo Kim, Daniel Sanz-Alonso, Ruiyi Yang

Abstract: This paper integrates manifold learning techniques within a Gaussian process upper confidence bound algorithm to optimize an objective function on a manifold. Our approach is motivated by applications where a full representation of the manifold is not available and querying the objective is expensive. We rely on a point cloud of manifold samples to define a graph Gaussian process surrogate model for the objective. Query points are sequentially chosen using the posterior distribution of the surrogate model given all previous queries. We establish regret bounds in terms of the number of queries and the size of the point cloud. Several numerical examples complement the theory and illustrate the performance of our method.

Submitted 8 November, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2208.03246

doi 10.1093/imaiai/iaad043

Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization

Authors: Omar Al Ghattas, Daniel Sanz-Alonso

Abstract: Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.

Submitted 5 October, 2023; v1 submitted 5 August, 2022; originally announced August 2022.
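The perturbed-observation variant of the ensemble Kalman update can be sketched in a few lines. This is a minimal illustration assuming a linear observation operator, Gaussian noise, and no localization; the function name `enkf_update` and the two-dimensional test problem are hypothetical.

```python
import numpy as np

def enkf_update(ensemble, y, H, R, rng):
    """Perturbed-observation ensemble Kalman update.

    ensemble: (N, d) prior particles; y: (k,) observation;
    H: (k, d) linear observation operator; R: (k, k) noise covariance.
    """
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)         # prior anomalies
    C = X.T @ X / (N - 1)                         # sample prior covariance
    S = H @ C @ H.T + R                           # innovation covariance
    K = np.linalg.solve(S, H @ C).T               # Kalman gain C H^T S^{-1}
    # Each particle assimilates its own perturbed copy of the data.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

rng = np.random.default_rng(0)
prior = rng.standard_normal((5000, 2))            # prior N(0, I) in two dimensions
H = np.array([[1.0, 0.0]])                        # observe the first coordinate only
R = np.array([[0.1]])
post = enkf_update(prior, np.array([1.0]), H, R, rng)
```

In this linear-Gaussian example the exact posterior for the observed coordinate is $N(1/1.1,\, 0.1/1.1)$, which the ensemble mean and variance should match up to sampling error.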

arXiv:2207.01093

Mathematical Foundations of Graph-Based Bayesian Semi-Supervised Learning

Authors: Nicolas García Trillos, Daniel Sanz-Alonso, Ruiyi Yang

Abstract: In recent decades, science and engineering have been revolutionized by a momentous growth in the amount of available data. However, despite the unprecedented ease with which data are now collected and stored, labeling data by supplementing each feature with an informative tag remains challenging. Illustrative tasks where the labeling process requires expert knowledge or is tedious and time-consuming include labeling X-rays with a diagnosis, protein sequences with a protein type, texts by their topic, tweets by their sentiment, or videos by their genre. In these and numerous other examples, only a few features may be manually labeled due to cost and time constraints. How can we best propagate label information from a small number of expensive labeled features to a vast number of unlabeled ones? This is the question addressed by semi-supervised learning (SSL). This article overviews recent foundational developments on graph-based Bayesian SSL, a probabilistic framework for label propagation using similarities between features. SSL is an active research area, and a thorough review of the extant literature is beyond the scope of this article. Our focus will be on topics drawn from our own research that illustrate the wide range of mathematical tools and ideas that underlie the rigorous study of the statistical accuracy and computational efficiency of graph-based Bayesian SSL.

Submitted 3 July, 2022; originally announced July 2022.

Comments: To appear in Notices of the AMS

arXiv:2205.09322

Hierarchical Ensemble Kalman Methods with Sparsity-Promoting Generalized Gamma Hyperpriors

Authors: Hwanwoo Kim, Daniel Sanz-Alonso, Alexander Strang

Abstract: This paper introduces a computational framework to incorporate flexible regularization techniques in ensemble Kalman methods for nonlinear inverse problems. The proposed methodology approximates the maximum a posteriori (MAP) estimate of a hierarchical Bayesian model characterized by a conditionally Gaussian prior and generalized gamma hyperpriors. Suitable choices of hyperparameters yield sparsity-promoting regularization. We propose an iterative algorithm for MAP estimation, which alternates between updating the unknown with an ensemble Kalman method and updating the hyperparameters in the regularization to promote sparsity. The effectiveness of our methodology is demonstrated in several computed examples, including compressed sensing and subsurface flow inverse problems.

Submitted 19 May, 2022; originally announced May 2022.

arXiv:2111.13329

A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors

Authors: Shiv Agrawal, Hwanwoo Kim, Daniel Sanz-Alonso, Alexander Strang

Abstract: Hierarchical models with gamma hyperpriors provide a flexible, sparsity-promoting framework to bridge $L^1$ and $L^2$ regularizations in Bayesian formulations of inverse problems. Despite the Bayesian motivation for these models, existing methodologies are limited to maximum a posteriori estimation. The potential to perform uncertainty quantification has not yet been realized. This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors. The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement. In addition, it lends itself naturally to model selection for the choice of hyperparameters. We illustrate the performance of our methodology in several computed examples, including a deconvolution problem and sparse identification of dynamical systems from time series data.

Submitted 28 November, 2021; v1 submitted 26 November, 2021; originally announced November 2021.

arXiv:2109.02777

Finite Element Representations of Gaussian Processes: Balancing Numerical and Statistical Accuracy

Authors: Daniel Sanz-Alonso, Ruiyi Yang

Abstract: The stochastic partial differential equation approach to Gaussian processes (GPs) represents Matérn GP priors in terms of $n$ finite element basis functions and Gaussian coefficients with sparse precision matrix. Such representations enhance the scalability of GP regression and classification to datasets of large size $N$ by setting $n\approx N$ and exploiting sparsity. In this paper we reconsider the standard choice $n \approx N$ through an analysis of the estimation performance. Our theory implies that, under certain smoothness assumptions, one can reduce the computation and memory cost without hindering the estimation accuracy by setting $n \ll N$ in the large $N$ asymptotics. Numerical experiments illustrate the applicability of our theory and the effect of the prior lengthscale in the pre-asymptotic regime.

Submitted 8 April, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

arXiv:2107.07687

Auto-differentiable Ensemble Kalman Filters

Authors: Yuming Chen, Daniel Sanz-Alonso, Rebecca Willett

Abstract: Data assimilation is concerned with sequentially estimating a temporally evolving state. This task, which arises in a wide range of scientific and engineering applications, is particularly challenging when the state is high-dimensional and the state-space dynamics are unknown. This paper introduces a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters (AD-EnKFs) blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, AD-EnKFs leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Numerical results using the Lorenz-96 model show that AD-EnKFs outperform existing methods that use expectation-maximization or particle filters to merge data assimilation and machine learning. In addition, AD-EnKFs are easy to implement and require minimal tuning.

Submitted 19 July, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

arXiv:2106.06787

doi 10.1088/1361-6420/ac3994

Graph-based Prior and Forward Models for Inverse Problems on Manifolds with Boundaries

Authors: John Harlim, Shixiao Jiang, Hwanwoo Kim, Daniel Sanz-Alonso

Abstract: This paper develops manifold learning techniques for the numerical solution of PDE-constrained Bayesian inverse problems on manifolds with boundaries. We introduce graphical Matérn-type Gaussian field priors that enable flexible modeling near the boundaries, representing boundary values by superposition of harmonic functions with appropriate Dirichlet boundary conditions. We also investigate the graph-based approximation of forward models from PDE parameters to observed quantities. In the construction of graph-based prior and forward models, we leverage the ghost point diffusion map algorithm to approximate second-order elliptic operators with classical boundary conditions. Numerical results validate our graph-based approach and demonstrate the need to design prior covariance models that account for boundary conditions.

Submitted 12 June, 2021; originally announced June 2021.

arXiv:2009.10831

doi 10.3390/e23010022

Bayesian Update with Importance Sampling: Required Sample Size

Authors: Daniel Sanz-Alonso, Zijian Wang

Abstract: Importance sampling is used to approximate Bayes' rule in many computational approaches to Bayesian inverse problems, data assimilation and machine learning. This paper reviews and further investigates the required sample size for importance sampling in terms of the $\chi^2$-divergence between target and proposal. We develop general abstract theory and illustrate through numerous examples the roles that dimension, noise level and other model parameters play in approximating the Bayesian update with importance sampling. Our examples also facilitate a new direct comparison of standard and optimal proposals for particle filtering.

Submitted 22 September, 2020; originally announced September 2020.

MSC Class: 62-08; 62F15; 65C05
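A common empirical companion to a $\chi^2$-based sample-size analysis is the effective sample size $\mathrm{ESS} = 1/\sum_i \bar w_i^2$ of the normalized weights; for large $N$, $N/\mathrm{ESS}$ approximates $1 + \chi^2(\text{target}\,\|\,\text{proposal})$. The sketch below rests on illustrative assumptions (a scalar standard Gaussian prior used as proposal, a Gaussian likelihood with noise level `gamma`, and a hypothetical helper `self_normalized_is`) and shows how a smaller noise level shrinks the ESS.

```python
import numpy as np

def self_normalized_is(log_lik, proposal_samples):
    # With the prior as proposal, unnormalized weights equal the likelihood.
    logw = log_lik(proposal_samples)
    w = np.exp(logw - logw.max())   # shift by the max for numerical stability
    w /= w.sum()                    # self-normalize
    ess = 1.0 / np.sum(w**2)        # effective sample size, between 1 and N
    return w, ess

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)     # proposal = prior N(0, 1)
y = 1.0                             # hypothetical observation of x + N(0, gamma^2) noise
results = {}
for gamma in (1.0, 0.1, 0.01):
    w, ess = self_normalized_is(lambda s: -0.5 * (y - s) ** 2 / gamma**2, x)
    # Posterior mean estimate; the exact value is y / (1 + gamma^2).
    results[gamma] = (np.sum(w * x), ess)
```

As the noise level decreases the posterior concentrates relative to the prior, the $\chi^2$-divergence grows, and far fewer of the $N$ samples carry appreciable weight.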

arXiv:2008.11809

Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective

Authors: Daniel Sanz-Alonso, Ruiyi Yang

Abstract: In this paper we analyze the graph-based approach to semi-supervised learning under a manifold assumption. We adopt a Bayesian perspective and demonstrate that, for a suitable choice of prior constructed with sufficiently many unlabeled data, the posterior contracts around the truth at a rate that is minimax optimal up to a logarithmic factor. Our theory covers both regression and classification.

Submitted 12 June, 2021; v1 submitted 26 August, 2020; originally announced August 2020.

arXiv:2004.08000

The SPDE Approach to Matérn Fields: Graph Representations

Authors: Daniel Sanz-Alonso, Ruiyi Yang

Abstract: This paper investigates Gaussian Markov random field approximations to nonstationary Gaussian fields using graph representations of stochastic partial differential equations. We establish approximation error guarantees building on the theory of spectral convergence of graph Laplacians. The proposed graph representations provide a generalization of the Matérn model to unstructured point clouds, and facilitate inference and sampling using linear algebra methods for sparse matrices. In addition, they bridge and unify several models in Bayesian inverse problems, spatial statistics and graph-based machine learning. We demonstrate through examples in these three disciplines that the unity revealed by graph representations facilitates the exchange of ideas across them.

Submitted 26 April, 2021; v1 submitted 16 April, 2020; originally announced April 2020.
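A minimal sketch of a graph representation of this kind: build a Gaussian-weighted graph Laplacian $L$ on an unstructured point cloud, use $(\tau I + L)^s$ as the precision of a Matérn-type Gaussian prior, and compute a linear-Gaussian posterior mean from a few noisy point values. The point cloud (samples on a circle), the weights, the parameters `eps`, `tau`, `s`, and the helper names are assumptions for illustration. Dense matrices are used for brevity, whereas sparse linear algebra is what makes the approach scale.

```python
import numpy as np

def graph_laplacian(points, eps):
    # Gaussian weights on an epsilon-neighborhood graph; unnormalized L = D - W.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * eps**2)) * (d2 < (3 * eps) ** 2)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def graph_matern_precision(L, tau, s):
    # Graph analog of the SPDE Matérn precision (tau I + L)^s for integer s.
    return np.linalg.matrix_power(tau * np.eye(L.shape[0]) + L, s)

rng = np.random.default_rng(0)
n, m, sigma2 = 300, 20, 1e-2
theta = np.sort(rng.uniform(0.0, 2 * np.pi, n))
points = np.column_stack([np.cos(theta), np.sin(theta)])  # unstructured cloud on a circle
truth = np.cos(2 * theta)

Q = graph_matern_precision(graph_laplacian(points, eps=0.1), tau=0.1, s=2)
labeled = rng.choice(n, size=m, replace=False)
y = truth[labeled] + np.sqrt(sigma2) * rng.standard_normal(m)

# Linear-Gaussian posterior mean: (Q + H^T H / sigma2)^{-1} H^T y / sigma2.
H = np.zeros((m, n))
H[np.arange(m), labeled] = 1.0
post_mean = np.linalg.solve(Q + H.T @ H / sigma2, H.T @ y / sigma2)
```

Because $\tau > 0$ and $L$ is positive semidefinite, $(\tau I + L)^s$ is positive definite, so the prior is well defined even if the graph were disconnected.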

arXiv:2003.07991

doi 10.1088/1361-6420/abb2fa

Data-Driven Forward Discretizations for Bayesian Inversion

Authors: Daniele Bigoni, Yuming Chen, Nicolas Garcia Trillos, Youssef Marzouk, Daniel Sanz-Alonso

Abstract: This paper suggests a framework for the learning of discretizations of expensive forward models in Bayesian inverse problems. The main idea is to incorporate the parameters governing the discretization as part of the unknown to be estimated within the Bayesian machinery. We numerically show that in a variety of inverse problems arising in mechanical engineering, signal processing and the geosciences, the observations contain useful information to guide the choice of discretization.

Submitted 21 August, 2020; v1 submitted 17 March, 2020; originally announced March 2020.
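The idea of estimating the discretization alongside the physical parameter can be illustrated on a toy problem. The setup is assumed for illustration only and is not one of the paper's examples: a harmonic oscillator y'' = −u y integrated by explicit Euler with an unknown number of steps per unit time, flat priors on a grid, and Gaussian observation noise. Explicit Euler inflates the oscillation amplitude by √(1 + u h²) per step. No choice of u can undo that inflation, so the data carry information about the discretization. All names and grid values are hypothetical.

```python
import numpy as np

def euler_oscillator(u, steps_per_unit, T):
    """Explicit Euler for y'' = -u y, y(0) = 1, y'(0) = 0, vectorized over an array of u.

    Returns y at times 1, 2, ..., T with shape (len(u), T).
    """
    h = 1.0 / steps_per_unit
    y, v = np.ones_like(u), np.zeros_like(u)
    out = []
    for k in range(1, steps_per_unit * T + 1):
        y, v = y + h * v, v - h * u * y          # both updates use the old (y, v)
        if k % steps_per_unit == 0:
            out.append(y)
    return np.stack(out, axis=1)

rng = np.random.default_rng(1)
T, noise_std, u_true = 10, 0.05, 1.0
y_obs = np.cos(np.sqrt(u_true) * np.arange(1, T + 1)) + noise_std * rng.standard_normal(T)

u_grid = np.linspace(0.5, 2.0, 151)          # candidate physical parameters
levels = [2, 4, 8, 16, 32, 64]               # candidate discretizations (steps per unit time)
log_post = np.stack(
    [-0.5 * np.sum(((y_obs - euler_oscillator(u_grid, m, T)) / noise_std) ** 2, axis=1) for m in levels],
    axis=1,
)                                            # flat priors on both unknowns, shape (151, 6)
post = np.exp(log_post - log_post.max())
post /= post.sum()
marginal_levels = post.sum(axis=0)           # posterior mass of each discretization level
```

In a realistic setting the prior on the discretization would also penalize computational cost. The flat prior here isolates what the data alone say.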

arXiv:1912.03253 [pdf, other]

HMC: avoiding rejections by not using leapfrog and some results on the acceptance rate

Authors: M. P. Calvo, D. Sanz-Alonso, J. M. Sanz-Serna

Abstract: The leapfrog integrator is routinely used within the Hamiltonian Monte Carlo method and its variants. We give strong numerical evidence that alternative, easy to implement algorithms yield fewer rejections with a given computational effort. When the dimensionality of the target distribution is high, the number of accepted proposals may be multiplied by a factor of three or more. This increase in the number of accepted proposals is not achieved by impairing any positive features of the sampling. We also establish new non-asymptotic and asymptotic results on the monotonic relationship between the expected acceptance rate and the expected energy error. These results further validate the derivation of one of the integrators we consider and are of independent interest.

Submitted 2 April, 2021; v1 submitted 6 December, 2019; originally announced December 2019.

Comments: 37 pages, 8 figures
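The comparison can be made concrete with a sketch that runs HMC on a standard Gaussian target using either leapfrog or a palindromic two-stage splitting integrator with a free parameter b. The abstract does not name the alternative integrators, so the two-stage family is an assumption. With b = 1/4 it reduces exactly to two leapfrog steps of size h/2. Giving leapfrog half the step size and twice the steps therefore compares both at roughly equal gradient cost. The value b ≈ 0.21178 is one proposed in the HMC splitting literature and should be treated as a tunable choice. Function names are illustrative.

```python
import numpy as np

def grad_U(q):
    return q  # standard Gaussian target: U(q) = |q|^2 / 2

def U(q):
    return 0.5 * q @ q

def leapfrog(q, p, h):
    p = p - 0.5 * h * grad_U(q)
    q = q + h * p
    p = p - 0.5 * h * grad_U(q)
    return q, p

def two_stage(q, p, h, b):
    """Kick-drift-kick-drift-kick splitting; b = 1/4 equals two leapfrog steps of size h/2."""
    p = p - b * h * grad_U(q)
    q = q + 0.5 * h * p
    p = p - (1.0 - 2.0 * b) * h * grad_U(q)
    q = q + 0.5 * h * p
    p = p - b * h * grad_U(q)
    return q, p

def hmc_acceptance_rate(step, dim, n_iter, n_steps, rng):
    """Run HMC with a one-step integrator `step(q, p)` and return the fraction of accepted proposals.

    Gradients are recomputed at every kick for clarity; an efficient version would reuse them.
    """
    q = rng.standard_normal(dim)
    accepted = 0
    for _ in range(n_iter):
        p0 = rng.standard_normal(dim)
        q_new, p_new = q, p0
        for _ in range(n_steps):
            q_new, p_new = step(q_new, p_new)
        energy_error = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p0 @ p0)
        if np.log(rng.uniform()) < -energy_error:  # Metropolis test on exp(-energy error)
            q, accepted = q_new, accepted + 1
    return accepted / n_iter

rng = np.random.default_rng(1)
h, n_steps, dim = 0.5, 10, 256
rate_lf = hmc_acceptance_rate(lambda q, p: leapfrog(q, p, h / 2), dim, 500, 2 * n_steps, rng)
rate_2s = hmc_acceptance_rate(lambda q, p: two_stage(q, p, h, b=0.21178), dim, 500, n_steps, rng)
```

Varying `dim` and `h` while recording `rate_lf` and `rate_2s` is one way to explore the dimension dependence mentioned in the abstract.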

arXiv:1904.03335 [pdf, other]

Local Regularization of Noisy Point Clouds: Improved Global Geometric Estimates and Data Analysis

Authors: Nicolas Garcia Trillos, Daniel Sanz-Alonso, Ruiyi Yang

Abstract: Several data analysis techniques employ similarity relationships between data points to uncover the intrinsic dimension and geometric structure of the underlying data-generating mechanism. In this paper we work under the model assumption that the data is made of random perturbations of feature vectors lying on a low-dimensional manifold. We study two questions: how to define the similarity relationship over noisy data points, and what is the resulting impact of the choice of similarity in the extraction of global geometric information from the underlying manifold. We provide concrete mathematical evidence that using a local regularization of the noisy data to define the similarity improves the approximation of the hidden Euclidean distance between unperturbed points. Furthermore, graph-based objects constructed with the locally regularized similarity function satisfy better error bounds in their recovery of global geometric ones. Our theory is supported by numerical experiments that demonstrate that the gain in geometric understanding facilitated by local regularization translates into a gain in classification accuracy in simulated and real data.

Submitted 5 April, 2019; originally announced April 2019.
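A minimal sketch of local regularization follows. The abstract does not specify the regularizer, so this sketch assumes one: each noisy point is replaced by the average of all noisy points within a Euclidean ball of radius δ, and pairwise distances are then computed from the regularized points. The noisy-circle data, the radius and the function names are illustrative choices.

```python
import numpy as np

def local_average(points, delta):
    """Replace each point by the mean of all points within distance delta (itself included)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    weights = (dist <= delta).astype(float)   # neighborhood indicator; the diagonal is always 1
    return (weights @ points) / weights.sum(axis=1, keepdims=True)

def pairwise_distances(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, size=500)
clean = np.column_stack([np.cos(theta), np.sin(theta)])  # feature vectors on a 1-d manifold
noisy = clean + 0.1 * rng.standard_normal(clean.shape)   # random perturbations
regularized = local_average(noisy, delta=0.3)

err_noisy = np.abs(pairwise_distances(noisy) - pairwise_distances(clean)).mean()
err_reg = np.abs(pairwise_distances(regularized) - pairwise_distances(clean)).mean()
```

Comparing `err_noisy` and `err_reg` over a range of δ exposes a trade-off: larger balls average out more noise but also pull points toward the inside of the curved manifold.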

arXiv:1901.10082 [pdf, other]

doi 10.3390/e21050511

Variational Characterizations of Local Entropy and Heat Regularization in Deep Learning

Authors: Nicolas Garcia Trillos, Zach Kaplan, Daniel Sanz-Alonso

Abstract: The aim of this paper is to provide new theoretical and computational understanding of two loss regularizations employed in deep learning, known as local entropy and heat regularization. For both regularized losses we introduce variational characterizations that naturally suggest a two-step scheme for their optimization, based on the iterative shift of a probability density and the calculation of a best Gaussian approximation in Kullback-Leibler divergence. Under this unified light, the optimization schemes for local entropy and heat regularized loss differ only in which argument of the Kullback-Leibler divergence is used to find the best Gaussian approximation. Local entropy corresponds to minimizing over the second argument, and the solution is given by moment matching. This allows one to replace traditional back-propagation calculation of gradients by sampling algorithms, opening an avenue for gradient-free, parallelizable training of neural networks.

Submitted 28 January, 2019; originally announced January 2019.
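The moment-matching step for local entropy can be sketched under simplifying assumptions that are not from the paper: a Gaussian family with fixed isotropic covariance γI, a toy quadratic loss, and self-normalized importance sampling from N(x_k, γI). The shifted density is ρ_k(y) ∝ exp(−f(y)) N(y; x_k, γI). Minimizing KL(ρ_k ‖ N(m, γI)) over its second argument gives m = E_ρ_k[y], so the update is x_{k+1} = E_ρ_k[y]. Only loss evaluations are needed, and they can be computed in parallel. The names below are illustrative.

```python
import numpy as np

def local_entropy_step(x, loss, gamma, n_samples, rng):
    """One moment-matching update: x_new = mean of rho(y) ∝ exp(-loss(y)) N(y; x, gamma I)."""
    y = x + np.sqrt(gamma) * rng.standard_normal((n_samples, x.size))  # samples from N(x, gamma I)
    log_w = -loss(y)                                                    # log importance weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w @ y                                                        # self-normalized mean estimate

def loss(y):
    return 0.5 * np.sum((y - 3.0) ** 2, axis=1)  # toy quadratic loss with minimizer at 3

rng = np.random.default_rng(3)
x = np.zeros(2)
for _ in range(30):
    x = local_entropy_step(x, loss, gamma=1.0, n_samples=5000, rng=rng)
```

For this quadratic loss the exact update is x_{k+1} = (x_k + 3γ)/(1 + γ), so the iterates contract toward 3 at rate 1/(1 + γ). The Monte Carlo estimate adds noise around that recursion.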

arXiv:1810.06191 [pdf, other]

Inverse Problems and Data Assimilation

Authors: Daniel Sanz-Alonso, Andrew M. Stuart, Armeen Taeb

Abstract: We provide a clear and concise introduction to the subjects of inverse problems and data assimilation, and their inter-relations. The first part of our notes covers inverse problems; this refers to the study of how to estimate unknown model parameters from data. The second part of our notes covers data assimilation; this refers to a particular class of inverse problems in which the unknown parameter is the initial condition (and/or state) of a dynamical system, and the data comprises partial and noisy observations of the state. The third and final part of our notes describes the use of data assimilation methods to solve generic inverse problems by introducing an artificial algorithmic time. Our notes cover, among other topics, maximum a posteriori estimation, (stochastic) gradient descent, variational Bayes, Monte Carlo, importance sampling and Markov chain Monte Carlo for inverse problems; and 3DVAR, 4DVAR, extended and ensemble Kalman filters, and particle filters for data assimilation. Each of parts one and two starts with a chapter on the Bayesian formulation, in which the problem solution is given by a posterior distribution on the unknown parameter. Then the following chapter specializes the Bayesian formulation to a linear-Gaussian setting where explicit characterization of the posterior is possible and insightful. The next two chapters explore methods to extract information from the posterior in nonlinear and non-Gaussian settings using optimization and Gaussian approximations. The final two chapters describe sampling methods that can reproduce the full posterior in the large sample limit. Each chapter closes with a bibliography containing citations to alternative pedagogical literature and to relevant research literature. We also include a set of exercises at the end of parts one and two. Our notes are thus useful for both classroom teaching and self-guided study.

Submitted 14 February, 2023; v1 submitted 15 October, 2018; originally announced October 2018.
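Among the data assimilation methods listed, the ensemble Kalman filter analysis step is compact enough to sketch. The version below is the perturbed-observation (stochastic) variant, one of several in common use. The linear observation operator H, the Gaussian noise covariance R, the toy prior and the function name are assumptions for illustration.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """Perturbed-observation EnKF update of an (N, d) ensemble given y = H x + noise, noise ~ N(0, R)."""
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    C = anomalies.T @ anomalies / (n_members - 1)      # sample forecast covariance
    S = H @ C @ H.T + R                                 # innovation covariance
    K = np.linalg.solve(S, H @ C).T                     # Kalman gain C H^T S^{-1}
    y_perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_members)
    return ensemble + (y_perturbed - ensemble @ H.T) @ K.T

rng = np.random.default_rng(4)
prior_ensemble = rng.standard_normal((20000, 2))        # samples from N(0, I)
H = np.array([[1.0, 0.0]])                              # observe the first component only
R = np.array([[0.1]])
posterior_ensemble = enkf_analysis(prior_ensemble, np.array([1.0]), H, R, rng)
```

In this linear-Gaussian case the exact posterior of the first component has mean 1/1.1 and variance 0.1/1.1. The ensemble statistics should approach these values as the ensemble grows, which makes the example a useful sanity check.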

arXiv:1710.07702 [pdf, other]

On the Consistency of Graph-based Bayesian Learning and the Scalability of Sampling Algorithms

Authors: Nicolas Garcia Trillos, Zachary Kaplan, Thabo Samakhoana, Daniel Sanz-Alonso

Abstract: A popular approach to semi-supervised learning proceeds by endowing the input data with a graph structure in order to extract geometric information and incorporate it into a Bayesian framework. We introduce new theory that gives appropriate scalings of graph parameters that provably lead to a well-defined limiting posterior as the size of the unlabeled data set grows. Furthermore, we show that these consistency results have profound algorithmic implications. When consistency holds, carefully designed graph-based Markov chain Monte Carlo algorithms are proved to have a uniform spectral gap, independent of the number of unlabeled inputs. Several numerical experiments corroborate both the statistical consistency and the algorithmic scalability established by the theory.

Submitted 12 January, 2020; v1 submitted 20 October, 2017; originally announced October 2017.
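As an illustration of graph-based MCMC whose proposal is built from the graph prior, the sketch below uses the preconditioned Crank–Nicolson (pCN) proposal. The Gaussian prior is N(0, (τ²I + L)^{-s}), defined through the graph Laplacian, and the likelihood is Gaussian on a few labeled nodes. Whether this matches the algorithms analyzed in the paper is an assumption. pCN is a common choice for Gaussian priors because its acceptance ratio involves only the likelihood potential Φ. The labeled set, labels and parameters are hypothetical.

```python
import numpy as np

def graph_laplacian(points, eps):
    """Unnormalized Laplacian of an unweighted eps-neighborhood graph."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = (d2 < eps**2).astype(float)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def pcn(potential, prior_sqrt, beta, n_iter, rng):
    """pCN: v = sqrt(1 - beta^2) u + beta xi, xi ~ prior; accept with prob. min(1, exp(Phi(u) - Phi(v)))."""
    n = prior_sqrt.shape[0]
    u = prior_sqrt @ rng.standard_normal(n)
    phi_u, accepted, samples = potential(u), 0, []
    for _ in range(n_iter):
        v = np.sqrt(1.0 - beta**2) * u + beta * (prior_sqrt @ rng.standard_normal(n))
        phi_v = potential(v)
        if np.log(rng.uniform()) < phi_u - phi_v:
            u, phi_u, accepted = v, phi_v, accepted + 1
        samples.append(u)
    return np.array(samples), accepted / n_iter

rng = np.random.default_rng(5)
points = rng.uniform(size=(150, 2))
L = graph_laplacian(points, eps=0.2)
tau, s = 1.0, 2
evals, evecs = np.linalg.eigh(L)
prior_sqrt = evecs @ np.diag((tau**2 + evals) ** (-s / 2))  # factor A with A A^T = (tau^2 I + L)^{-s}
labeled = np.arange(10)                                     # hypothetical labeled nodes
labels = np.where(points[labeled, 0] > 0.5, 1.0, -1.0)      # hypothetical labels from x-coordinate
potential = lambda u: 0.5 * np.sum((u[labeled] - labels) ** 2) / 0.3**2
samples, acc = pcn(potential, prior_sqrt, beta=0.2, n_iter=2000, rng=rng)
```

The proposal leaves the prior invariant for any β in (0, 1]. This is the kind of property that allows mixing to be analyzed independently of the number of unlabeled nodes.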

arXiv:1706.07193 [pdf, ps, other]

Continuum Limit of Posteriors in Graph Bayesian Inverse Problems

Authors: Nicolas Garcia Trillos, Daniel Sanz-Alonso

Abstract: We consider the problem of recovering a function input of a differential equation formulated on an unknown domain $M$. We assume access to a discrete domain $M_n=\{x_1, \dots, x_n\} \subset M$, and to noisy measurements of the output solution at $p\le n$ of those points. We introduce a graph-based Bayesian inverse problem, and show that the graph-posterior measures over functions in $M_n$ converge, in the large $n$ limit, to a posterior over functions in $M$ that solves a Bayesian inverse problem with known domain. The proofs rely on the variational formulation of the Bayesian update, and on a new topology for the study of convergence of measures over functions on point clouds to a measure over functions on the continuum. Our framework, techniques, and results may serve to lay the foundations of robust uncertainty quantification of graph-based tasks in machine learning. The ideas are presented in the concrete setting of recovering the initial condition of the heat equation on an unknown manifold.

Submitted 22 June, 2017; originally announced June 2017.
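The concrete setting at the end of the abstract has a simple discrete analogue. The Laplace–Beltrami operator is replaced by a graph Laplacian L on the point cloud M_n, and the forward map sends an initial condition u to P e^{-tL} u, where P extracts the p observed nodes. With a Gaussian prior and Gaussian noise the graph posterior is Gaussian with closed-form mean and covariance. The prior N(0, (I + L)^{-2}), the ε-graph, the sign convention (L positive semidefinite) and all parameter values are assumptions for this sketch, not the paper's scalings.

```python
import numpy as np

def graph_laplacian(points, eps):
    """Unnormalized Laplacian of an unweighted eps-neighborhood graph."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = (d2 < eps**2).astype(float)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(6)
n, p, t, noise_std = 120, 30, 0.05, 0.05
angles = rng.uniform(0, 2 * np.pi, size=n)
points = np.column_stack([np.cos(angles), np.sin(angles)])   # point cloud M_n on the unit circle
L = graph_laplacian(points, eps=0.3)

evals, evecs = np.linalg.eigh(L)
heat = evecs @ np.diag(np.exp(-t * evals)) @ evecs.T   # graph heat semigroup e^{-tL}
P = np.eye(n)[:p]                                      # observe the first p <= n nodes
G = P @ heat                                           # forward map u -> P e^{-tL} u

C = evecs @ np.diag((1.0 + evals) ** -2.0) @ evecs.T   # prior covariance (I + L)^{-2}
u_true = np.cos(3 * angles)                            # hypothetical initial condition
y = G @ u_true + noise_std * rng.standard_normal(p)

S = G @ C @ G.T + noise_std**2 * np.eye(p)
K = np.linalg.solve(S, G @ C).T                        # gain C G^T S^{-1}
post_mean = K @ y
post_cov = C - K @ G @ C
```

Repeating the computation for increasing n, with a suitably scaled ε, is a numerical counterpart of the large-n limit studied in the paper.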

arXiv:1705.07382 [pdf, ps, other]

The Bayesian update: variational formulations and gradient flows

Authors: Nicolas Garcia Trillos, Daniel Sanz-Alonso

Abstract: The Bayesian update can be viewed as a variational problem by characterizing the posterior as the minimizer of a functional. The variational viewpoint is far from new and is at the heart of popular methods for posterior approximation. However, some of its consequences seem largely unexplored. We focus on the following one: defining the posterior as the minimizer of a functional gives a natural path towards the posterior by moving in the direction of steepest descent of the functional. This idea is made precise through the theory of gradient flows, allowing one to bring new tools to the study of Bayesian models and algorithms. Since the posterior may be characterized as the minimizer of different functionals, several variational formulations may be considered. We study three of them and their associated gradient flows. We show that, in all cases, the rate of convergence of the flows to the posterior can be bounded by the geodesic convexity of the functional to be minimized. Each gradient flow naturally suggests a nonlinear diffusion with the posterior as invariant distribution. These diffusions may be discretized to build proposals for Markov chain Monte Carlo (MCMC) algorithms. By construction, the diffusions are guaranteed to satisfy a certain optimality condition, and rates of convergence are given by the convexity of the functionals. We use this observation to propose a criterion for the choice of metric in Riemannian MCMC methods.

Submitted 1 November, 2018; v1 submitted 20 May, 2017; originally announced May 2017.
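A standard example of the diffusion-to-MCMC route described above starts from the Wasserstein gradient flow of the Kullback–Leibler divergence to the posterior π. Its particle counterpart is the Langevin diffusion dX_t = ∇log π(X_t) dt + √2 dW_t, which has π as invariant distribution. An Euler–Maruyama step gives the proposal, and a Metropolis–Hastings correction gives the Metropolis-adjusted Langevin algorithm (MALA). The sketch applies that textbook construction to an assumed conjugate Gaussian toy posterior; it is not claimed to be one of the specific diffusions studied in the paper.

```python
import numpy as np

def mala(log_target, grad_log_target, x0, step, n_iter, rng):
    """Metropolis-adjusted Langevin algorithm targeting exp(log_target)."""
    def log_q(x_to, x_from):
        # Log density (up to a constant) of the proposal N(x_from + step * grad, 2 * step * I).
        mean = x_from + step * grad_log_target(x_from)
        return -np.sum((x_to - mean) ** 2) / (4.0 * step)

    x, samples, accepted = np.asarray(x0, dtype=float), [], 0
    for _ in range(n_iter):
        prop = x + step * grad_log_target(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        log_alpha = (log_target(prop) + log_q(x, prop)) - (log_target(x) + log_q(prop, x))
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = prop, accepted + 1
        samples.append(x)
    return np.array(samples), accepted / n_iter

# Assumed toy posterior: prior N(0, 1), one observation y = u + noise with noise variance 0.25.
y_obs, noise_var = 1.0, 0.25
log_post = lambda u: -0.5 * np.sum(u**2) - 0.5 * np.sum((y_obs - u) ** 2) / noise_var
grad_log_post = lambda u: -u + (y_obs - u) / noise_var

rng = np.random.default_rng(7)
samples, acc = mala(log_post, grad_log_post, x0=np.zeros(1), step=0.1, n_iter=20000, rng=rng)
```

The exact posterior here is N(0.8, 0.2), so the sample mean after burn-in gives a quick correctness check. A Riemannian variant would precondition the drift and noise by a metric, which is where the abstract's criterion for choosing the metric would enter.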

arXiv:1608.08814 [pdf, ps, other]

Importance Sampling and Necessary Sample Size: an Information Theory Approach

Authors: Daniel Sanz-Alonso

Abstract: Importance sampling approximates expectations with respect to a target measure by using samples from a proposal measure. The performance of the method over large classes of test functions depends heavily on the closeness between both measures. We derive a general bound that needs to hold for importance sampling to be successful, and that relates the $f$-divergence between the target and the proposal to the sample size. The bound is deduced from a new and simple information theory paradigm for the study of importance sampling. As examples of the general theory we give necessary conditions on the sample size in terms of the Kullback-Leibler and $\chi^2$ divergences, and the total variation and Hellinger distances. Our approach is non-asymptotic, and its generality allows one to tell apart the relative merits of these metrics. Unsurprisingly, the non-symmetric divergences give sharper bounds than total variation or Hellinger. Our results extend existing necessary conditions (and complement sufficient ones) on the sample size required for importance sampling.

Submitted 31 August, 2016; originally announced August 2016.
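The KL and χ² divergences can be computed in closed form for an assumed toy pair: target N(m, 1) and proposal N(0, 1). Then KL = m²/2 and χ² = e^{m²} − 1. The normalized importance weight is w(x) = exp(mx − m²/2), and its second moment under the proposal equals 1 + χ². The sketch checks that identity by Monte Carlo and reports the effective sample size (Σw)²/Σw², a common diagnostic. It does not implement the paper's bound, and the function names are illustrative.

```python
import numpy as np

def kl_gaussian_shift(m):
    return 0.5 * m**2                    # KL(N(m, 1) || N(0, 1))

def chi2_gaussian_shift(m):
    return np.expm1(m**2)                # chi^2(N(m, 1) || N(0, 1)) = exp(m^2) - 1

def importance_weights(m, n, rng):
    x = rng.standard_normal(n)                  # samples from the proposal N(0, 1)
    return x, np.exp(m * x - 0.5 * m**2)        # density ratio target / proposal

rng = np.random.default_rng(8)
m, n = 0.5, 200_000
x, w = importance_weights(m, n, rng)
second_moment = np.mean(w**2)                   # Monte Carlo estimate of 1 + chi^2
ess = w.sum() ** 2 / np.sum(w**2)               # effective sample size
estimate = np.sum(w * x) / np.sum(w)            # self-normalized estimate of E_target[x] = m
```

Rerunning with larger m shows the weight second moment growing like e^{m²}, while KL grows only like m²/2.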

arXiv:1511.06196 [pdf, ps, other]

Importance Sampling: Intrinsic Dimension and Computational Cost

Authors: S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, A. M. Stuart

Abstract: The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee accurate approximations. Intuitively, some notion of distance between the target and the proposal should determine the computational cost of the method. A major challenge is to quantify this distance in terms of parameters or statistics that are pertinent for the practitioner. The subject has attracted substantial interest from within a variety of communities. The objective of this paper is to overview and unify the resulting literature by creating an overarching framework. A general theory is presented, with a focus on the use of importance sampling in Bayesian inverse problems and filtering.

Submitted 14 January, 2017; v1 submitted 19 November, 2015; originally announced November 2015.

Comments: Statistical Science
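The distance between target and proposal can be explored numerically in a Bayesian inverse problem. Here the prior serves as proposal and the posterior as target, so the unnormalized weights are the likelihood g. The sketch estimates ρ = E_prior[g²] / E_prior[g]², which equals 1 + χ²(posterior ‖ prior), as the dimension d grows. The toy model is an assumption: identity forward map, prior N(0, I_d), noise N(0, γ²I_d) with γ = 1, and data y fixed at the all-ones vector. The function name is illustrative.

```python
import numpy as np

def rho_estimate(d, gamma, n, rng):
    """Monte Carlo estimate of rho = E[g^2] / E[g]^2 with u ~ N(0, I_d) and data y = (1, ..., 1)."""
    y = np.ones(d)
    u = rng.standard_normal((n, d))                              # proposal samples from the prior
    log_g = -0.5 * np.sum((y - u) ** 2, axis=1) / gamma**2       # log-likelihood (unnormalized weights)
    g = np.exp(log_g - log_g.max())                              # rescaling leaves rho unchanged
    return np.mean(g**2) / np.mean(g) ** 2

rng = np.random.default_rng(9)
rhos = {d: rho_estimate(d, gamma=1.0, n=100_000, rng=rng) for d in (1, 2, 4, 8)}
```

For this model with γ = 1 a direct Gaussian calculation gives ρ = (2/√3)^d e^{|y|²/6}, which for the all-ones y is (2e^{1/6}/√3)^d ≈ 1.36^d. The growth is geometric in d, consistent with the abstract's point that the cost of importance sampling is governed by how far the target is from the proposal.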

Showing 1–30 of 30 results for author: Sanz-Alonso, D