A Mathematical Perspective On Contrastive Learning

Authors: Ricardo Baptista, Andrew M. Stuart, Son Tran

Abstract: Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each modality, that align representations within a common latent space. In this work, we focus on the bimodal setting and interpret contrastive learning as the optimization of (parameterized) encoders that define conditional probability distributions, for each modality conditioned on the other, consistent with the available data. This provides a framework for multimodal algorithms such as crossmodal retrieval, which identifies the mode of one of these conditional distributions, and crossmodal classification, which is similar to retrieval but includes a fine-tuning step to make it task specific. The framework we adopt also gives rise to crossmodal generative models. This probabilistic perspective suggests two natural generalizations of contrastive learning: the introduction of novel probabilistic loss functions, and the use of alternative metrics for measuring alignment in the common latent space. We study these generalizations of the classical approach in the multivariate Gaussian setting. In this context we view the latent space identification as a low-rank matrix approximation problem. This allows us to characterize the capabilities of loss functions and alignment metrics to approximate natural statistics, such as conditional means and covariances; doing so yields novel variants on contrastive learning algorithms for specific mode-seeking and for generative tasks. The framework we introduce is also studied through numerical experiments on multivariate Gaussians, the labeled MNIST dataset, and on a data assimilation application arising in oceanography.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 44 pages, 15 figures
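
The probabilistic reading described in the abstract above can be illustrated with the classical symmetric contrastive loss. The sketch below is not taken from the paper: it assumes the embeddings have already been produced by the two encoders, uses cosine similarity with a temperature (both assumed choices), and treats each row or column of the softmax over a batch as a discrete conditional distribution of one modality given the other. The names `contrastive_loss` and `retrieve` are hypothetical.

```python
import numpy as np

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(u, v, temperature=0.1):
    """Symmetric contrastive loss for n paired embeddings u[i] <-> v[i].

    u, v: arrays of shape (n, d), assumed to be already encoded into the
    common latent space (the encoders themselves are not modelled here).
    """
    # Normalise so that alignment is measured by cosine similarity.
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = u @ v.T / temperature            # logits[i, j] = <u_i, v_j> / tau
    # Row i: discrete conditional over the batch of v's given u_i.
    log_p_v_given_u = log_softmax(logits, axis=1)
    # Column j: discrete conditional over the batch of u's given v_j.
    log_p_u_given_v = log_softmax(logits, axis=0)
    idx = np.arange(len(u))
    # Average negative log-likelihood of the true pairs, in both directions.
    return -0.5 * (log_p_v_given_u[idx, idx].mean()
                   + log_p_u_given_v[idx, idx].mean())

def retrieve(u_query, v_bank):
    # Crossmodal retrieval as mode-seeking: return the index of the
    # candidate in v_bank with the highest similarity to the query.
    u = u_query / np.linalg.norm(u_query)
    v = v_bank / np.linalg.norm(v_bank, axis=1, keepdims=True)
    return int(np.argmax(v @ u))
```

Minimizing this loss over encoder parameters makes the matched pair the most probable element of each conditional, which is why retrieval reduces to an argmax over similarities.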

arXiv:2505.19841 [pdf, ps, other]

Efficient Deconvolution in Populational Inverse Problems

Authors: Arnaud Vadeboncoeur, Mark Girolami, Andrew M. Stuart

Abstract: This work is focussed on the inversion task of inferring the distribution over parameters of interest leading to multiple sets of observations. The potential to solve such distributional inversion problems is driven by increasing availability of data, but a major roadblock is blind deconvolution, arising when the observational noise distribution is unknown. However, when data originates from collections of physical systems, a population, it is possible to leverage this information to perform deconvolution. To this end, we propose a methodology leveraging large data sets of observations, collected from different instantiations of the same physical processes, to simultaneously deconvolve the data-corrupting noise distribution, and to identify the distribution over model parameters defining the physical processes. A parameter-dependent mathematical model of the physical process is employed. A loss function characterizing the match between the observed data and the output of the mathematical model is defined; it is minimized as a function of both the parameter inputs to the model of the physics and the parameterized observational noise. This coupled problem is addressed with a modified gradient descent algorithm that leverages specific structure in the noise model. Furthermore, a new active learning scheme is proposed, based on adaptive empirical measures, to train a surrogate model to be accurate in parameter regions of interest; this approach accelerates computation and enables automatic differentiation of black-box, potentially nondifferentiable, code computing parameter-to-solution maps. The proposed methodology is demonstrated on porous medium flow, damped elastodynamics, and simplified models of atmospheric dynamics.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2503.16154 [pdf, ps, other]

Statistical accuracy of the ensemble Kalman filter in the near-linear setting

Authors: E. Calvello, J. A. Carrillo, F. Hoffmann, P. Monmarché, A. M. Stuart, U. Vaes

Abstract: Estimating the state of a dynamical system from partial and noisy observations is a ubiquitous problem in a large number of applications, such as probabilistic weather forecasting and prediction of epidemics. Particle filters are a widely adopted approach to the problem and provide provably accurate approximations of the statistics of the state, but they perform poorly in high dimensions because of weight collapse. The ensemble Kalman filter does not suffer from this issue, as it relies on an interacting particle system with equal weights. Despite its wide adoption in the geophysical sciences, mathematical analysis of the accuracy of this filter is predominantly confined to the setting of linear dynamical models and linear observation operators, and analysis beyond the linear Gaussian setting is still in its infancy. In this short note, we provide an accessible overview of recent work in which the authors take first steps to analyze the accuracy of the filter beyond the linear Gaussian setting.

Submitted 20 March, 2025; originally announced March 2025.

MSC Class: 60G35; 62F15; 65C35; 70F45; 93E11

arXiv:2501.17110 [pdf, other]

Solving Roughly Forced Nonlinear PDEs via Misspecified Kernel Methods and Neural Networks

Authors: Ricardo Baptista, Edoardo Calvello, Matthieu Darcy, Houman Owhadi, Andrew M. Stuart, Xianjin Yang

Abstract: We consider the use of Gaussian Processes (GPs) or Neural Networks (NNs) to numerically approximate the solutions to nonlinear partial differential equations (PDEs) with rough forcing or source terms, which commonly arise as pathwise solutions to stochastic PDEs. Kernel methods have recently been generalized to solve nonlinear PDEs by approximating their solutions as the maximum a posteriori estimator of GPs that are conditioned to satisfy the PDE at a finite set of collocation points. The convergence and error guarantees of these methods, however, rely on the PDE being defined in a classical sense and its solution possessing sufficient regularity to belong to the associated reproducing kernel Hilbert space. We propose a generalization of these methods to handle roughly forced nonlinear PDEs while preserving convergence guarantees with an oversmoothing GP kernel that is misspecified relative to the true solution's regularity. This is achieved by conditioning a regular GP to satisfy the PDE with a modified source term in a weak sense (when integrated against a finite number of test functions). This is equivalent to replacing the empirical $L^2$-loss on the PDE constraint by an empirical negative-Sobolev norm. We further show that this loss function can be used to extend physics-informed neural networks (PINNs) to stochastic equations, thereby resulting in a new NN-based variant termed Negative Sobolev Norm-PINN (NeS-PINN).

Submitted 29 January, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

Comments: 41 pages, 7 figures

arXiv:2501.15785 [pdf, other]

Memorization and Regularization in Generative Diffusion Models

Authors: Ricardo Baptista, Agnimitra Dasgupta, Nikola B. Kovachki, Assad Oberai, Andrew M. Stuart

Abstract: Diffusion models have emerged as a powerful framework for generative modeling. At the heart of the methodology is score matching: learning gradients of families of log-densities for noisy versions of the data distribution at different scales. When the loss function adopted in score matching is evaluated using empirical data, rather than the population loss, the minimizer corresponds to the score of a time-dependent Gaussian mixture. However, use of this analytically tractable minimizer leads to data memorization: in both unconditioned and conditioned settings, the generative model returns the training samples. This paper contains an analysis of the dynamical mechanism underlying memorization. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer; and, in so doing, lays the foundations for a principled understanding of how to regularize. Numerical experiments investigate the properties of: (i) Tikhonov regularization; (ii) regularization designed to promote asymptotic consistency; and (iii) regularizations induced by under-parameterization of a neural network or by early stopping when training a neural network. These experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.

Submitted 18 March, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

Comments: 59 pages, 20 figures
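
The Gaussian-mixture minimizer mentioned in the abstract above can be written down in closed form. The sketch below is an illustration and not the paper's code: it assumes a variance-exploding noising of the form x_t = x_0 + sigma * noise, for which the noised empirical distribution is an equal-weight mixture of Gaussians centred at the training points; other noise schedules would rescale the centres. The function names are hypothetical.

```python
import numpy as np

def empirical_score(x, data, sigma):
    """Score of the mixture (1/N) * sum_n N(x; x_n, sigma^2 I) at a point x.

    x: point of shape (d,); data: training set of shape (N, d).
    Assumes the noising x_t = x_0 + sigma * noise (an assumption made
    for this sketch).
    """
    diffs = data - x                                    # rows are x_n - x
    log_w = -np.sum(diffs**2, axis=1) / (2.0 * sigma**2)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                        # softmax weights over training points
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

def log_density(x, data, sigma):
    # Log of the same mixture, up to an additive constant independent of x.
    log_w = -np.sum((data - x)**2, axis=1) / (2.0 * sigma**2)
    return log_w.max() + np.log(np.exp(log_w - log_w.max()).sum())
```

As sigma tends to zero the softmax weights concentrate on the training point nearest to x, so the score points straight at that sample. This is one way to see why sampling with the exact empirical minimizer returns training data.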

arXiv:2409.09800 [pdf, ps, other]

Accuracy of the Ensemble Kalman Filter in the Near-Linear Setting

Authors: Edoardo Calvello, Pierre Monmarché, Andrew M. Stuart, Urbain Vaes

Abstract: The filtering distribution captures the statistics of the state of a dynamical system from partial and noisy observations. Classical particle filters provably approximate this distribution in quite general settings; however they behave poorly for high dimensional problems, suffering weight collapse. This issue is circumvented by the ensemble Kalman filter which is an equal-weight interacting particle system. However, this finite particle system is only proven to approximate the true filter in the linear Gaussian case. In practice, however, it is applied in much broader settings; as a result, establishing its approximation properties more generally is important. There has been recent progress in the theoretical analysis of the algorithm, establishing stability and error estimates in non-Gaussian settings, but the assumptions on the dynamics and observation models rule out the unbounded vector fields that arise in practice and the analysis applies only to the mean field limit of the ensemble Kalman filter. The present work establishes error bounds between the filtering distribution and the finite particle ensemble Kalman filter when the dynamics and observation vector fields may be unbounded, allowing linear growth.

Submitted 6 February, 2025; v1 submitted 15 September, 2024; originally announced September 2024.
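
For readers unfamiliar with the equal-weight interacting particle system referred to in the abstract above, the following is a minimal sketch of one analysis step of the stochastic (perturbed-observation) ensemble Kalman filter. It is a textbook variant under simplifying assumptions, namely a linear observation matrix `H` and Gaussian observation noise with covariance `Gamma`, and it is not the specific formulation analyzed in the paper. The function name `enkf_analysis` is hypothetical.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, Gamma, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.

    ensemble: forecast particles, shape (J, d)
    y: observation, shape (k,);  H: observation matrix, shape (k, d)
    Gamma: observation noise covariance, shape (k, k)
    rng: a numpy.random.Generator
    """
    J = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)            # centred particles
    C = A.T @ A / (J - 1)                           # empirical covariance, (d, d)
    # Kalman gain built from the empirical covariance; this is where
    # the particles interact.
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + Gamma)
    # Each particle is updated against its own noisy copy of the data.
    noise = rng.multivariate_normal(np.zeros(len(y)), Gamma, size=J)
    innovations = (y + noise) - ensemble @ H.T      # (J, k)
    # Particles are moved, not reweighted, so all weights stay equal.
    return ensemble + innovations @ K.T
```

In the scalar linear Gaussian case with prior N(0, 1), unit observation noise and y = 2, the updated ensemble should have mean close to 1 and variance close to 0.5 for large J, matching the exact Kalman update.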

arXiv:2408.06526 [pdf, other]

doi 10.1137/24M1648703

Operator Learning Using Random Features: A Tool for Scientific Computing

Authors: Nicholas H. Nelsen, Andrew M. Stuart

Abstract: Supervised operator learning centers on the use of training data, in the form of input-output pairs, to estimate maps between infinite-dimensional spaces. It is emerging as a powerful tool to complement traditional scientific computing, which may often be framed in terms of operators mapping between spaces of functions. Building on the classical random features methodology for scalar regression, this paper introduces the function-valued random features method. This leads to a supervised operator learning architecture that is practical for nonlinear problems yet is structured enough to facilitate efficient training through the optimization of a convex, quadratic cost. Due to the quadratic structure, the trained model is equipped with convergence guarantees and error and complexity bounds, properties that are not readily available for most other operator learning architectures. At its core, the proposed approach builds a linear combination of random operators. This turns out to be a low-rank approximation of an operator-valued kernel ridge regression algorithm, and hence the method also has strong connections to Gaussian process regression. The paper designs function-valued random features that are tailored to the structure of two nonlinear operator learning benchmark problems arising from parametric partial differential equations. Numerical results demonstrate the scalability, discretization invariance, and transferability of the function-valued random features method.

Submitted 12 August, 2024; originally announced August 2024.

Comments: 36 pages, 1 table, 9 figures. SIGEST version of SIAM J. Sci. Comput. Vol. 43 No. 5 (2021) pp. A3212-A3243, hence text overlap with arXiv:2005.10224

MSC Class: 68T05; 65D40; 62J07; 62M45; 68W20; 35R60

Journal ref: SIAM Review Vol. 66 No. 3 (2024) pp. 535-571
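
The abstract above builds on the classical random features method for scalar regression. A minimal sketch of that scalar starting point is given below; it is not the function-valued method of the paper. The choice of cosine features with Gaussian frequencies and uniform phases, and the names `fit_random_features` and `predict`, are assumptions made for illustration.

```python
import numpy as np

def fit_random_features(x, y, n_features, lam, rng, scale=1.0):
    """Ridge regression on random Fourier features (scalar input and output).

    The model is f(x) = sum_j alpha_j * cos(w_j * x + b_j), where the
    pairs (w_j, b_j) are drawn once at random and then frozen; only the
    coefficients alpha are trained.
    """
    w = rng.normal(scale=scale, size=n_features)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.cos(np.outer(x, w) + b)                 # feature matrix, (n, m)
    # Minimise |Phi a - y|^2 + lam * |a|^2. The cost is a convex
    # quadratic in a, so the minimiser solves the linear system below.
    alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)
    return w, b, alpha

def predict(x, w, b, alpha):
    # Evaluate the trained linear combination of random features.
    return np.cos(np.outer(x, w) + b) @ alpha
```

Because the trained quantity enters linearly, training is a single linear solve rather than an iterative nonconvex optimization, which is the structural property the abstract highlights.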

arXiv:2408.01362 [pdf, other]

Autoencoders in Function Space

Authors: Justin Bunker, Mark Girolami, Hefin Lambley, Andrew M. Stuart, T. J. Sullivan

Abstract: Autoencoders have found widespread application in both their original deterministic form and in their variational formulation (VAEs). In scientific applications and in image processing it is often of interest to consider data that are viewed as functions; while discretisation (of differential equations arising in the sciences) or pixellation (of images) renders problems finite dimensional in practice, conceiving first of algorithms that operate on functions, and only then discretising or pixellating, leads to better algorithms that smoothly operate between resolutions. In this paper function-space versions of the autoencoder (FAE) and variational autoencoder (FVAE) are introduced, analysed, and deployed. Well-definedness of the objective governing VAEs is a subtle issue, particularly in function space, limiting applicability. For the FVAE objective to be well defined requires compatibility of the data distribution with the chosen generative model; this can be achieved, for example, when the data arise from a stochastic differential equation, but is generally restrictive. The FAE objective, on the other hand, is well defined in many situations where FVAE fails to be. Pairing the FVAE and FAE objectives with neural operator architectures that can be evaluated on any mesh enables new applications of autoencoders to inpainting, superresolution, and generative modelling of scientific data.

Submitted 5 January, 2025; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: 53 pages, 24 figures

MSC Class: 62G07 (Primary) 65M99; 68T07 (Secondary)

ACM Class: I.2.6

arXiv:2406.17263 [pdf, other]

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: In this paper, we study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications. The computational challenges we address with the proposed methodology are: (i) the need for repeated evaluations of expensive forward models; (ii) the potential existence of multiple modes; and (iii) the fact that a gradient of, or an adjoint solver for, the forward model might not be feasible. While existing Bayesian inference methods meet some of these challenges individually, we propose a framework that tackles all three systematically. Our approach builds upon the Fisher-Rao gradient flow in probability space, yielding a dynamical system for probability densities that converges towards the target distribution at a uniform exponential rate. This rapid convergence is advantageous for the computational burden outlined in (i). We apply Gaussian mixture approximations with operator splitting techniques to simulate the flow numerically; the resulting approximation can capture multiple modes thus addressing (ii). Furthermore, we employ the Kalman methodology to facilitate a derivative-free update of these Gaussian components and their respective weights, addressing the issue in (iii). The proposed methodology results in an efficient derivative-free sampler flexible enough to handle multi-modal distributions: Gaussian Mixture Kalman Inversion (GMKI). The effectiveness of GMKI is demonstrated both theoretically and numerically in several experiments with multimodal target distributions, including proof-of-concept and two-dimensional examples, as well as a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.

Submitted 11 October, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 42 pages, 10 figures

arXiv:2406.06486 [pdf, other]

Continuum Attention for Neural Operators

Authors: Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart

Abstract: Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.17955 [pdf, other]

Efficient Prior Calibration From Indirect Data

Authors: O. Deniz Akyildiz, Mark Girolami, Andrew M. Stuart, Arnaud Vadeboncoeur

Abstract: Bayesian inversion is central to the quantification of uncertainty within problems arising from numerous applications in science and engineering. To formulate the approach, four ingredients are required: a forward model mapping the unknown parameter to an element of a solution space, often the solution space for a differential equation; an observation operator mapping an element of the solution space to the data space; a noise model describing how noise pollutes the observations; and a prior model describing knowledge about the unknown parameter before the data is acquired. This paper is concerned with learning the prior model from data; in particular, learning the prior from multiple realizations of indirect data obtained through the noisy observation process. The prior is represented, using a generative model, as the pushforward of a Gaussian in a latent space; the pushforward map is learned by minimizing an appropriate loss function. A metric that is well-defined under empirical approximation is used to define the loss function for the pushforward map to make an implementable methodology. Furthermore, an efficient residual-based neural operator approximation of the forward model is proposed and it is shown that this may be learned concurrently with the pushforward map, using a bilevel optimization formulation of the problem; this use of neural operator approximation has the potential to make prior learning from indirect data more computationally efficient, especially when the observation process is expensive, non-smooth or not known. The ideas are illustrated with the Darcy flow inverse problem of finding permeability from piezometric head measurements.

Submitted 14 May, 2025; v1 submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.13149 [pdf, other]

Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$ where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F \circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.02221 [pdf, other]

Discretization Error of Fourier Neural Operators

Authors: Samuel Lanthaler, Andrew M. Stuart, Margaret Trautner

Abstract: Operator learning is a variant of machine learning that is designed to approximate maps between function spaces from data. The Fourier Neural Operator (FNO) is a common model architecture used for operator learning. The FNO combines pointwise linear and nonlinear operations in physical space with pointwise linear operations in Fourier space, leading to a parameterized map acting between function spaces. Although FNOs formally involve convolutions of functions on a continuum, in practice the computations are performed on a discretized grid, allowing efficient implementation via the FFT. In this paper, the aliasing error that results from such a discretization is quantified and algebraic rates of convergence in terms of the grid resolution are obtained as a function of the regularity of the input. Numerical experiments that validate the theory and describe model stability are performed.

Submitted 3 May, 2024; originally announced May 2024.

MSC Class: 41A35 (Primary) 65T50; 68T07 (Secondary)
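
The structure described in the abstract above, pointwise operations in physical space combined with pointwise linear operations in Fourier space, can be sketched for a single-channel periodic 1-D signal as follows. This is a simplified illustration and not the paper's implementation: a real FNO uses multiple channels and learned weights, and the `tanh` activation, the scalar weight `w`, the bias `b`, and the function names are all assumed placeholders.

```python
import numpy as np

def spectral_conv_1d(v, multipliers):
    """Pointwise linear operation in Fourier space on a periodic 1-D grid.

    v: real samples on a uniform grid, shape (n,)
    multipliers: complex weights for the lowest k_max Fourier modes,
                 shape (k_max,), with k_max <= n // 2 + 1
    """
    v_hat = np.fft.rfft(v)
    out_hat = np.zeros_like(v_hat)
    k_max = len(multipliers)
    out_hat[:k_max] = multipliers * v_hat[:k_max]    # higher modes are truncated
    return np.fft.irfft(out_hat, n=len(v))

def fourier_layer(v, multipliers, w, b):
    # Spectral convolution plus a pointwise affine map in physical space,
    # followed by a pointwise nonlinearity (tanh is a placeholder choice).
    return np.tanh(spectral_conv_1d(v, multipliers) + w * v + b)
```

For a band-limited input whose modes all lie below `k_max`, the layer output sampled on a coarse grid agrees with the output on a finer grid at the shared points. For inputs with content above the grid's resolvable frequencies the two differ, and that discrepancy is the aliasing error the abstract refers to.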

arXiv:2403.14934 [pdf, other]

A Stochastic Model-Based Control Methodology for Glycemic Management in the Intensive Care Unit

Authors: Melike Sirlanci, George Hripcsak, Cecilia C. Low Wang, J. N. Stroh, Yanran Wang, Tellen D. Bennett, Andrew M. Stuart, David J. Albers

Abstract: Intensive care unit (ICU) patients exhibit erratic blood glucose (BG) fluctuations, including hypoglycemic and hyperglycemic episodes, and require exogenous insulin delivery to keep their BG in healthy ranges. Glycemic control via glycemic management (GM) is associated with reduced mortality and morbidity in the ICU, but GM increases the cognitive load on clinicians. The availability of robust, accurate, and actionable clinical decision support (CDS) tools reduces this burden and assists in the decision-making process to improve health outcomes. Clinicians currently follow GM protocol flow charts for patient intravenous insulin delivery rate computations. We present a mechanistic model-based control algorithm that predicts the optimal intravenous insulin rate to keep BG within a target range; the goal is to develop this approach for eventual use within CDS systems. In this control framework, we employed a stochastic model representing BG dynamics in the ICU setting and used the linear quadratic Gaussian control methodology to develop a controller. We designed two experiments, one using virtual (simulated) patients and one using a real-world retrospective dataset. Using these, we evaluate the safety and efficacy of this model-based glycemic control methodology. The presented controller avoids hypoglycemia and hyperglycemia in virtual patients, maintaining BG levels in the target range more consistently than two existing GM protocols. Moreover, this methodology could theoretically prevent a large proportion of hypoglycemic and hyperglycemic events recorded in a real-world retrospective dataset.

Submitted 3 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 26 pages, 4 figures, 5 tables

MSC Class: 49-11

ACM Class: I.6.3

arXiv:2402.15715 [pdf, other]

Operator Learning: Algorithms and Analysis

Authors: Nikola B. Kovachki, Samuel Lanthaler, Andrew M. Stuart

Abstract: Operator learning refers to the application of ideas from machine learning to approximate (typically nonlinear) operators mapping between Banach spaces of functions. Such operators often arise from physical models expressed in terms of partial differential equations (PDEs). In this context, such approximate operators hold great potential as efficient surrogate models to complement traditional numerical methods in many-query tasks. Being data-driven, they also enable model discovery when a mathematical description in terms of a PDE is not available. This review focuses primarily on neural operators, built on the success of deep neural networks in the approximation of functions defined on finite dimensional Euclidean spaces. Empirically, neural operators have shown success in a variety of applications, but our theoretical understanding remains incomplete. This review article summarizes recent progress and the current state of our theoretical understanding of neural operators, focusing on an approximation theoretic point of view.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.01593 [pdf, ps, other]

Statistical Accuracy of Approximate Filtering Methods

Authors: J. A. Carrillo, F. Hoffmann, A. M. Stuart, U. Vaes

Abstract: Estimating the statistics of the state of a dynamical system, from partial and noisy observations, is both mathematically challenging and finds wide application. Furthermore, the applications are of great societal importance, including problems such as probabilistic weather forecasting and prediction of epidemics. Particle filters provide a well-founded approach to the problem, leading to provably accurate approximations of the statistics. However, these methods perform poorly in high dimensions. In 1994 the idea of ensemble Kalman filtering was introduced by Evensen, leading to a methodology that has been widely adopted in the geophysical sciences and also finds application to quite general inverse problems. However, ensemble Kalman filters have defied rigorous analysis of their statistical accuracy, except in the linear Gaussian setting. In this article we describe recent work which takes first steps to analyze the statistical accuracy of ensemble Kalman filters beyond the linear Gaussian setting. The subject is inherently technical, as it involves the evolution of probability measures according to a nonlinear and nonautonomous dynamical system; and the approximation of this evolution. It can nonetheless be presented in a fairly accessible fashion, understandable with basic knowledge of dynamical systems, numerical analysis and probability.

Submitted 31 May, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: To appear in ICIAM proceedings

MSC Class: 60G35; 62F15; 65C35; 70F45; 93E11

arXiv:2310.14555 [pdf, other]

Modeling groundwater levels in California's Central Valley by hierarchical Gaussian process and neural network regression

Authors: Anshuman Pradhan, Kyra H. Adams, Venkat Chandrasekaran, Zhen Liu, John T. Reager, Andrew M. Stuart, Michael J. Turmon

Abstract: Modeling groundwater levels continuously across California's Central Valley (CV) hydrological system is challenging due to low-quality well data which is sparsely and noisily sampled across time and space. The lack of consistent well data makes it difficult to evaluate the impact of 2017 and 2019 wet years on CV groundwater following a severe drought during 2012-2015. A novel machine learning method is formulated for modeling groundwater levels by learning from a 3D lithological texture model of the CV aquifer. The proposed formulation performs multivariate regression by combining Gaussian processes (GP) and deep neural networks (DNN). The hierarchical modeling approach constitutes training the DNN to learn a lithologically informed latent space where non-parametric regression with GP is performed. We demonstrate the efficacy of GP-DNN regression for modeling non-stationary features in the well data with fast and reliable uncertainty quantification, as validated to be statistically consistent with the empirical data distribution from 90 blind wells across CV. We show how the model predictions may be used to supplement hydrological understanding of aquifer responses in basins with irregular well data. Our results indicate that on average the 2017 and 2019 wet years in California were largely ineffective in replenishing the groundwater loss caused during previous drought years.

Submitted 11 October, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.03597 [pdf, other]

Sampling via Gradient Flows in the Space of Probability Measures

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods.
Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.

Submitted 9 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Related and text overlap with arXiv:2302.11024

arXiv:2306.15924 [pdf, ps, other]

The Parametric Complexity of Operator Learning

Authors: Samuel Lanthaler, Andrew M. Stuart

Abstract: Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a "curse of parametric complexity", which is an infinite-dimensional analogue of the well-known curse of dimensionality encountered in high-dimensional approximation problems. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that this general curse can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of parametric complexity related to the infinite-dimensional input and output function spaces.

Submitted 9 March, 2025; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.12006 [pdf, other]

Learning Homogenization for Elliptic Operators

Authors: Kaushik Bhattacharya, Nikola Kovachki, Aakila Rajan, Andrew M. Stuart, Margaret Trautner

Abstract: Multiscale partial differential equations (PDEs) arise in various applications, and several schemes have been developed to solve them efficiently. Homogenization theory is a powerful methodology that eliminates the small-scale dependence, resulting in simplified equations that are computationally tractable while accurately predicting the macroscopic response. In the field of continuum mechanics, homogenization is crucial for deriving constitutive laws that incorporate microscale physics in order to formulate balance laws for the macroscopic quantities of interest. However, obtaining homogenized constitutive laws is often challenging as they do not in general have an analytic form and can exhibit phenomena not present on the microscale. In response, data-driven learning of the constitutive law has been proposed as appropriate for this task. However, a major challenge in data-driven learning approaches for this problem has remained unexplored: the impact of discontinuities and corner interfaces in the underlying material. These discontinuities in the coefficients affect the smoothness of the solutions of the underlying equations. Given the prevalence of discontinuous materials in continuum mechanics applications, it is important to address the challenge of learning in this context; in particular, to develop underpinning theory that establishes the reliability of data-driven methods in this scientific domain.
The paper addresses this unexplored challenge by investigating the learnability of homogenized constitutive laws for elliptic operators in the presence of such complexities. Approximation theory is presented, and numerical experiments are performed which validate the theory in the context of learning the solution operator defined by the cell problem arising in homogenization for elliptic PDEs.

Submitted 4 January, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

MSC Class: 35B27; 35J47; 74H15

arXiv:2305.04962 [pdf, other]

Error Analysis of Kernel/GP Methods for Nonlinear and Parametric PDEs

Authors: Pau Batlle, Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: We introduce a priori Sobolev-space error estimates for the solution of nonlinear, and possibly parametric, PDEs using Gaussian process and kernel based methods. The primary assumptions are: (1) a continuous embedding of the reproducing kernel Hilbert space of the kernel into a Sobolev space of sufficient regularity; and (2) the stability of the differential operator and the solution map of the PDE between corresponding Sobolev spaces. The proof is articulated around Sobolev norm error estimates for kernel interpolants and relies on the minimizing norm property of the solution. The error estimates demonstrate dimension-benign convergence rates if the solution space of the PDE is smooth enough. We illustrate these points with applications to high-dimensional nonlinear elliptic PDEs and parametric PDEs. Although some recent machine learning methods have been presented as breaking the curse of dimensionality in solving high-dimensional PDEs, our analysis suggests a more nuanced picture: there is a trade-off between the regularity of the solution and the presence of the curse of dimensionality. Therefore, our results are in line with the understanding that the curse is absent when the solution is regular enough.

Submitted 8 May, 2023; originally announced May 2023.

MSC Class: 60G15; 65M75; 65N75; 65N35; 47B34; 41A15; 35R30; 34B15

arXiv:2304.13221 [pdf, other]

Nonlocality and Nonlinearity Implies Universality in Operator Learning

Authors: Samuel Lanthaler, Zongyi Li, Andrew M. Stuart

Abstract: Neural operator architectures approximate operators between infinite-dimensional Banach spaces of functions. They are gaining increased attention in computational science and engineering, due to their potential both to accelerate traditional numerical methods and to enable data-driven discovery. As the field is in its infancy, basic questions about minimal requirements for universal approximation remain open. It is clear that any general approximation of operators between spaces of functions must be both nonlocal and nonlinear. In this paper we describe how these two attributes may be combined in a simple way to deduce universal approximation. In so doing we unify the analysis of a wide range of neural operator architectures and open up consideration of new ones. A popular variant of neural operators is the Fourier neural operator (FNO). Previous analysis proving universal operator approximation theorems for FNOs resorts to use of an unbounded number of Fourier modes, relying on intuition from traditional analysis of spectral methods. The present work challenges this point of view: (i) the work reduces FNO to its core essence, resulting in a minimal architecture termed the "averaging neural operator" (ANO); and (ii) analysis of the ANO shows that even this minimal ANO architecture benefits from universal approximation. This result is obtained based on only a spatial average as its only nonlocal ingredient (corresponding to retaining only a single Fourier mode in the special case of the FNO).
The analysis paves the way for a more systematic exploration of nonlocality, both through the development of new operator learning architectures and the analysis of existing and new architectures. Numerical results are presented which give insight into complexity issues related to the roles of channel width (embedding dimension) and number of Fourier modes.

Submitted 14 June, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2302.11024 [pdf, other]

Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance

Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not.
We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance.

Submitted 10 September, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 82 pages, 8 figures (Welcome any feedback!)

arXiv:2212.13239 [pdf, ps, other]

The Mean Field Ensemble Kalman Filter: Near-Gaussian Setting

Authors: J. A. Carrillo, F. Hoffmann, A. M. Stuart, U. Vaes

Abstract: The ensemble Kalman filter is widely used in applications because, for high dimensional filtering problems, it has a robustness that is not shared for example by the particle filter; in particular it does not suffer from weight collapse. However, there is no theory which quantifies its accuracy as an approximation of the true filtering distribution, except in the Gaussian setting. To address this issue we provide the first analysis of the accuracy of the ensemble Kalman filter beyond the Gaussian setting. We prove two types of results: the first type comprise a stability estimate controlling the error made by the ensemble Kalman filter in terms of the difference between the true filtering distribution and a nearby Gaussian; and the second type use this stability result to show that, in a neighbourhood of Gaussian problems, the ensemble Kalman filter makes a small error, in comparison with the true filtering distribution. Our analysis is developed for the mean field ensemble Kalman filter. We rewrite the update equations for this filter, and for the true filtering distribution, in terms of maps on probability measures. We introduce a weighted total variation metric to estimate the distance between the two filters and we prove various stability estimates for the maps defining the evolution of the two filters, in this metric. Using these stability estimates we prove results of the first and second types, in the weighted total variation metric.
We also provide a generalization of these results to the Gaussian projected filter, which can be viewed as a mean field description of the unscented Kalman filter.

Submitted 27 August, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

MSC Class: 62F15; 65C35; 93E11; 70F45

arXiv:2210.17443 [pdf, other]

doi 10.1016/j.jmps.2023.105329

Learning macroscopic internal variables and history dependence from microscopic models

Authors: Burigede Liu, Eric Ocegueda, Margaret Trautner, Andrew M. Stuart, Kaushik Bhattacharya

Abstract: This paper concerns the study of history dependent phenomena in heterogeneous materials in a two-scale setting where the material is specified at a fine microscopic scale of heterogeneities that is much smaller than the coarse macroscopic scale of application. We specifically study a polycrystalline medium where each grain is governed by crystal plasticity while the solid is subjected to macroscopic dynamic loads. The theory of homogenization allows us to solve the macroscale problem directly with a constitutive relation that is defined implicitly by the solution of the microscale problem. However, the homogenization leads to a highly complex history dependence at the macroscale, one that can be quite different from that at the microscale. In this paper, we examine the use of machine-learning, and especially deep neural networks, to harness data generated by repeatedly solving the finer scale model to: (i) gain insights into the history dependence and the macroscopic internal variables that govern the overall response; and (ii) to create a computationally efficient surrogate of its solution operator, that can directly be used at the coarser scale with no further modeling. We do so by introducing a recurrent neural operator (RNO), and show that: (i) the architecture and the learned internal variables can provide insight into the physics of the macroscopic problem; and (ii) that the RNO can provide multiscale, specifically FE2, accuracy at a cost comparable to a conventional empirical constitutive relation.

Submitted 30 April, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2209.11371 [pdf, other]

Ensemble Kalman Methods: A Mean Field Perspective

Authors: Edoardo Calvello, Sebastian Reich, Andrew M. Stuart

Abstract: Ensemble Kalman methods are widely used for state estimation in the geophysical sciences. Their success stems from the fact that they take an underlying (possibly noisy) dynamical system as a black box to provide a systematic, derivative-free methodology for incorporating noisy, partial and possibly indirect observations to update estimates of the state; furthermore the ensemble approach allows for sensitivities and uncertainties to be calculated. The methodology was introduced in 1994 in the context of ocean state estimation. Soon thereafter it was adopted by the numerical weather prediction community and is now a key component of the best weather prediction systems worldwide. Furthermore the methodology is starting to be widely adopted for numerous problems in the geophysical sciences and is being developed as the basis for general purpose derivative-free inversion methods that show great promise. Despite this empirical success, analysis of the accuracy of ensemble Kalman methods, in terms of their capabilities as both state estimators and quantifiers of uncertainty, is lagging. The purpose of this paper is to provide a unifying mean field based framework for the derivation and analysis of ensemble Kalman methods. Both state estimation and parameter estimation problems (inverse problems) are considered, and formulations in both discrete and continuous time are employed.
For state estimation problems, both the control and filtering approaches are considered; analogously for parameter estimation problems, the optimization and Bayesian perspectives are both studied. The mean field perspective provides an elegant framework, suitable for analysis; furthermore, a variety of methods used in practice can be derived from mean field systems by using interacting particle system approximations. The approach taken also unifies a wide-ranging literature in the field and suggests open problems.

Submitted 7 October, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2208.04506 [pdf, other]

Second Order Ensemble Langevin Method for Sampling and Inverse Problems

Authors: Ziming Liu, Andrew M. Stuart, Yixuan Wang

Abstract: We propose a sampling method based on an ensemble approximation of second order Langevin dynamics. The log target density is appended with a quadratic term in an auxiliary momentum variable and damped-driven Hamiltonian dynamics introduced; the resulting stochastic differential equation is invariant to the Gibbs measure, with marginal on the position coordinates given by the target. A preconditioner based on covariance under the law of the dynamics does not change this invariance property, and is introduced to accelerate convergence to the Gibbs measure. The resulting mean-field dynamics may be approximated by an ensemble method; this results in a gradient-free and affine-invariant stochastic dynamical system. Numerical results demonstrate its potential as the basis for a numerical sampler in Bayesian inverse problems.

Submitted 24 October, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2205.14139 [pdf, other]

Learning Markovian Homogenized Models in Viscoelasticity

Authors: Kaushik Bhattacharya, Burigede Liu, Andrew M. Stuart, Margaret Trautner

Abstract: Fully resolving dynamics of materials with rapidly-varying features involves expensive fine-scale computations which need to be conducted on macroscopic scales. The theory of homogenization provides an approach to derive effective macroscopic equations which eliminates the small scales by exploiting scale separation. An accurate homogenized model avoids the computationally-expensive task of numerically solving the underlying balance laws at a fine scale, thereby rendering a numerical solution of the balance laws more computationally tractable. In complex settings, homogenization only defines the constitutive model implicitly, and machine learning can be used to learn the constitutive model explicitly from localized fine-scale simulations. In the case of one-dimensional viscoelasticity, the linearity of the model allows for a complete analysis. We establish that the homogenized constitutive model may be approximated by a recurrent neural network (RNN) that captures the memory. The memory is encapsulated in the evolution of an appropriate finite set of internal variables, discovered through the learning process and dependent on the history of the strain. Simulations are presented which validate the theory. Guidance for the learning of more complex models, such as arise in plasticity, by similar techniques, is given.

Submitted 4 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

arXiv:2204.04386 [pdf, other]

Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems

Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

Abstract: We consider Bayesian inference for large scale inverse problems, where computational challenges arise from the need for repeated evaluations of an expensive forward model. This renders most Markov chain Monte Carlo approaches infeasible, since they typically require $O(10^4)$ model runs, or more. Moreover, the forward model is often given as a black box or is impractical to differentiate. Therefore derivative-free algorithms are highly desirable. We propose a framework, which is built on Kalman methodology, to efficiently perform Bayesian inference in such inverse problems. The basic method is based on an approximation of the filtering distribution of a novel mean-field dynamical system into which the inverse problem is embedded as an observation operator. Theoretical properties of the mean-field model are established for linear inverse problems, demonstrating that the desired Bayesian posterior is given by the steady state of the law of the filtering distribution of the mean-field dynamical system, and proving exponential convergence to it. This suggests that, for nonlinear problems which are close to Gaussian, sequentially computing this law provides the basis for efficient iterative methods to approximate the Bayesian posterior. Ensemble methods are applied to obtain interacting particle system approximations of the filtering distribution of the mean-field model; and practical strategies to further reduce the computational and memory cost of the methodology are presented, including low-rank approximation and a bi-fidelity approach.
The effectiveness of the framework is demonstrated in several numerical experiments, including proof-of-concept linear/nonlinear examples and two large-scale applications: learning of permeability parameters in subsurface flow; and learning subgrid-scale parameters in a global climate model from time-averaged statistics.

Submitted 11 August, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

Comments: 44 pages, 15 figures

arXiv:2203.13181 [pdf, other]

The Cost-Accuracy Trade-Off In Operator Learning With Neural Networks

Authors: Maarten V. de Hoop, Daniel Zhengyu Huang, Elizabeth Qian, Andrew M. Stuart

Abstract: The term 'surrogate modeling' in computational science and engineering refers to the development of computationally efficient approximations for expensive simulations, such as those arising from numerical solution of partial differential equations (PDEs). Surrogate modeling is an enabling methodology for many-query computations in science and engineering, which include iterative methods in optimization and sampling methods in uncertainty quantification. Over the last few years, several approaches to surrogate modeling for PDEs using neural networks have emerged, motivated by successes in using neural networks to approximate nonlinear maps in other areas. In principle, the relative merits of these different approaches can be evaluated by understanding, for each one, the cost required to achieve a given level of accuracy. However, the absence of a complete theory of approximation error for these approaches makes it difficult to assess this cost-accuracy trade-off. The purpose of the paper is to provide a careful numerical study of this issue, comparing a variety of different neural network architectures for operator approximation across a range of problems arising from PDE models in continuum mechanics.

Submitted 11 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: 48 pages, 19 figures

arXiv:2201.06998 [pdf, other]

doi 10.1029/2022MS002997

Ensemble-Based Experimental Design for Targeting Data Acquisition to Inform Climate Models

Authors: Oliver R. A. Dunbar, Michael F. Howland, Tapio Schneider, Andrew M. Stuart

Abstract: Data required to calibrate uncertain GCM parameterizations are often only available in limited regions or time periods, for example, observational data from field campaigns, or data generated in local high-resolution simulations. This raises the question of where and when to acquire additional data to be maximally informative about parameterizations in a GCM. Here we construct a new ensemble-based parallel algorithm to automatically target data acquisition to regions and times that maximize the uncertainty reduction, or information gain, about GCM parameters. The algorithm uses a Bayesian framework that exploits a quantified distribution of GCM parameters as a measure of uncertainty. This distribution is informed by time-averaged climate statistics restricted to local regions and times. The algorithm is embedded in the recently developed calibrate-emulate-sample (CES) framework, which performs efficient model calibration and uncertainty quantification with only $\mathcal{O}(10^2)$ model evaluations, compared with $\mathcal{O}(10^5)$ evaluations typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of local data. In this perfect-model setting, we calibrate parameters and quantify uncertainties in a quasi-equilibrium convection scheme in the GCM. We consider targeted data that are (i) localized in space for statistically stationary simulations, and (ii) localized in space and time for seasonally varying simulations. In these proof-of-concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain typically, but not always, results from regions near the intertropical convergence zone (ITCZ).

Submitted 27 June, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

arXiv:2108.12515 [pdf, other]

doi 10.1137/21M1442942

Convergence Rates for Learning Linear Operators from Noisy Data

Authors: Maarten V. de Hoop, Nikola B. Kovachki, Nicholas H. Nelsen, Andrew M. Stuart

Abstract: This paper studies the learning of linear operators between infinite-dimensional Hilbert spaces. The training data comprises pairs of random input vectors in a Hilbert space and their noisy images under an unknown self-adjoint linear operator. Assuming that the operator is diagonalizable in a known basis, this work solves the equivalent inverse problem of estimating the operator's eigenvalues given the data. Adopting a Bayesian approach, the theoretical analysis establishes posterior contraction rates in the infinite data limit with Gaussian priors that are not directly linked to the forward map of the inverse problem. The main results also include learning-theoretic generalization error guarantees for a wide range of distribution shifts. These convergence rates quantify the effects of data smoothness and true eigenvalue decay or growth, for compact or unbounded operators, respectively, on sample complexity. Numerical evidence supports the theory in diagonal and non-diagonal settings.

Submitted 2 November, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: To appear in SIAM/ASA Journal on Uncertainty Quantification (JUQ); 34 pages, 5 figures, 2 tables

MSC Class: 62G20; 62C10; 68T05; 47A62

Journal ref: SIAM/ASA J. Uncertainty Quantification Vol. 11 No. 2 (2023) pp. 480-513

arXiv:2107.06658 [pdf, other]

A Framework for Machine Learning of Model Error in Dynamical Systems

Authors: Matthew E. Levine, Andrew M. Stuart

Abstract: The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless. First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified. Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.

Submitted 17 August, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

arXiv:2106.02519 [pdf, other]

Consensus Based Sampling

Authors: J. A. Carrillo, F. Hoffmann, A. M. Stuart, U. Vaes

Abstract: We propose a novel method for sampling and optimization tasks based on a stochastic interacting particle system. We explain how this method can be used for the following two goals: (i) generating approximate samples from a given target distribution; (ii) optimizing a given objective function. The approach is derivative-free and affine invariant, and is therefore well-suited for solving inverse problems defined by complex forward models: (i) allows generation of samples from the Bayesian posterior and (ii) allows determination of the maximum a posteriori estimator. We investigate the properties of the proposed family of methods in terms of various parameter choices, both analytically and by means of numerical simulations. The analysis and numerical simulation establish that the method has potential for general purpose optimization tasks over Euclidean space; contraction properties of the algorithm are established under suitable conditions, and computational experiments demonstrate wide basins of attraction for various specific problems. The analysis and experiments also demonstrate the potential for the sampling methodology in regimes in which the target distribution is unimodal and close to Gaussian; indeed we prove that the method recovers a Laplace approximation to the measure in certain parametric regimes and provide numerical evidence that this Laplace approximation attracts a large set of initial conditions in a number of examples.

Submitted 4 November, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

MSC Class: 62F15; 65C35; 65N21; 35G25

arXiv:2104.03384 [pdf, other]

Ensemble Inference Methods for Models With Noisy and Expensive Likelihoods

Authors: Oliver R. A. Dunbar, Andrew B. Duncan, Andrew M. Stuart, Marie-Therese Wolfram

Abstract: The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations. This paper is concerned with the use of interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large scale parametric structure of interest. A motivating example from climate science is presented, and ensemble Kalman methods (which do not use the derivative of the parameter-to-data map) are shown, empirically, to perform well. Multiscale analysis is then used to analyze the behaviour of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favourably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, whilst ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favourably.

Submitted 22 January, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

MSC Class: 65C05; 65C40; 60J22

arXiv:2103.12959 [pdf, other]

Solving and Learning Nonlinear PDEs with Gaussian Processes

Authors: Yifan Chen, Bamdad Hosseini, Houman Owhadi, Andrew M Stuart

Abstract: We introduce a simple, rigorous, and unified framework for solving nonlinear partial differential equations (PDEs), and for solving inverse problems (IPs) involving the identification of parameters in PDEs, using the framework of Gaussian processes. The proposed approach: (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and IPs; (2) has guaranteed convergence for a very general class of PDEs, and comes equipped with a path to compute error bounds for specific PDE approximations; (3) inherits the state-of-the-art computational complexity of linear solvers for dense kernel matrices. The main idea of our method is to approximate the solution of a given PDE as the maximum a posteriori (MAP) estimator of a Gaussian process conditioned on solving the PDE at a finite number of collocation points. Although this optimization problem is infinite-dimensional, it can be reduced to a finite-dimensional one by introducing additional variables corresponding to the values of the derivatives of the solution at collocation points; this generalizes the representer theorem arising in Gaussian process regression. The reduced optimization problem has the form of a quadratic objective function subject to nonlinear constraints; it is solved with a variant of the Gauss--Newton method. The resulting algorithm (a) can be interpreted as solving successive linearizations of the nonlinear PDE, and (b) in practice is found to converge in a small number of iterations (2 to 10), for a wide range of PDEs. Most traditional approaches to IPs interleave parameter updates with numerical solution of the PDE; our algorithm solves for both parameter and PDE solution simultaneously. Experiments on nonlinear elliptic PDEs, Burgers' equation, a regularized Eikonal equation, and an IP for permeability identification in Darcy flow illustrate the efficacy and scope of our framework.

Submitted 10 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: 41 pages

MSC Class: 60G15; 65M75; 65N75; 65N35; 47B34; 41A15; 35R30; 34B15

arXiv:2102.01580 [pdf, other]

Iterated Kalman Methodology For Inverse Problems

Authors: Daniel Zhengyu Huang, Tapio Schneider, Andrew M. Stuart

Abstract: This paper is focused on the optimization approach to the solution of inverse problems. We introduce a stochastic dynamical system in which the parameter-to-data map is embedded, with the goal of employing techniques from nonlinear Kalman filtering to estimate the parameter given the data. The extended Kalman filter (which we refer to as ExKI in the context of inverse problems) can be effective for some inverse problems approached this way, but is impractical when the forward map is not readily differentiable and is given as a black box, and also for high dimensional parameter spaces because of the need to propagate large covariance matrices. Application of ensemble Kalman filters, for example use of the ensemble Kalman inversion (EKI) algorithm, has emerged as a useful tool which overcomes both of these issues: it is derivative free and works with a low-rank covariance approximation formed from the ensemble. In this paper, we work with the ExKI, EKI, and a variant on EKI which we term unscented Kalman inversion (UKI). The paper contains two main contributions. Firstly, we identify a novel stochastic dynamical system in which the parameter-to-data map is embedded. We present theory in the linear case to show exponential convergence of the mean of the filtering distribution to the solution of a regularized least squares problem. This is in contrast to previous work in which the EKI has been employed where the dynamical system used leads to algebraic convergence to an unregularized problem. Secondly, we show that the application of the UKI to this novel stochastic dynamical system yields improved inversion results, in comparison with the application of EKI to the same novel stochastic dynamical system.

Submitted 28 April, 2022; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: 56 pages, 24 figures
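
The abstract above describes ensemble Kalman inversion (EKI) as derivative free and built on a low-rank covariance formed from the ensemble. As an illustration only, the following is a minimal sketch of a basic EKI update, assuming a deterministic variant without perturbed observations; it is not the novel stochastic dynamical system or the UKI proposed in the paper. The function name `eki_step`, the linear test map `A`, and all numerical values are hypothetical choices made for this example.

```python
import numpy as np

def eki_step(thetas, forward, y, gamma):
    """One basic EKI update (illustrative sketch, no perturbed observations).

    thetas : (J, d) array, one parameter vector per ensemble member
    forward: callable mapping a (d,) parameter to a (k,) prediction (black box)
    y      : (k,) data vector
    gamma  : (k, k) observational noise covariance
    """
    g = np.array([forward(t) for t in thetas])      # (J, k) forward evaluations
    dtheta = thetas - thetas.mean(axis=0)           # parameter deviations
    dg = g - g.mean(axis=0)                         # prediction deviations
    n_members = thetas.shape[0]
    c_tg = dtheta.T @ dg / (n_members - 1)          # (d, k) cross-covariance
    c_gg = dg.T @ dg / (n_members - 1)              # (k, k) prediction covariance
    # Kalman-type gain applied to each member's data misfit; no derivatives
    # of the forward map are required, only ensemble statistics.
    x = np.linalg.solve(c_gg + gamma, (y - g).T)    # (k, J)
    return thetas + (c_tg @ x).T

# Hypothetical linear test problem with noise-free data.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta_true = np.array([1.0, -2.0])
y = A @ theta_true
gamma = 0.01 * np.eye(3)

ensemble = rng.standard_normal((50, 2))
initial_error = np.linalg.norm(ensemble.mean(axis=0) - theta_true)
for _ in range(20):
    ensemble = eki_step(ensemble, lambda t: A @ t, y, gamma)
final_error = np.linalg.norm(ensemble.mean(axis=0) - theta_true)
```

In this sketch the forward map is only ever evaluated, never differentiated, and the update lives in the span of the ensemble deviations, which is the low-rank structure the abstract refers to.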

arXiv:2102.00540 [pdf, other]

Derivative-free Bayesian Inversion Using Multiscale Dynamics

Authors: G. A. Pavliotis, A. M. Stuart, U. Vaes

Abstract: Inverse problems are ubiquitous because they formalize the integration of data with mathematical models. In many scientific applications the forward model is expensive to evaluate, and adjoint computations are difficult to employ; in this setting derivative-free methods which involve a small number of forward model evaluations are an attractive proposition. Ensemble Kalman based interacting particle systems (and variants such as consensus based and unscented Kalman approaches) have proven empirically successful in this context, but suffer from the fact that they cannot be systematically refined to return the true solution, except in the setting of linear forward models. In this paper, we propose a new derivative-free approach to Bayesian inversion, which may be employed for posterior sampling or for maximum a posteriori estimation, and may be systematically refined. The method relies on a fast/slow system of stochastic differential equations for the local approximation of the gradient of the log-likelihood appearing in a Langevin diffusion. Furthermore the method may be preconditioned by use of information from ensemble Kalman based methods (and variants), providing a methodology which leverages the documented advantages of those methods, whilst also being provably refineable. We define the methodology, highlighting its flexibility and many variants, provide a theoretical analysis of the proposed approach, and demonstrate its efficacy by means of numerical experiments.

Submitted 4 November, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

MSC Class: 62F15; 65C35; 65C30; 65N21

arXiv:2012.13262 [pdf, other]

doi 10.1029/2020MS002454

Calibration and Uncertainty Quantification of Convective Parameters in an Idealized GCM

Authors: Oliver R. A. Dunbar, Alfredo Garbuno-Inigo, Tapio Schneider, Andrew M. Stuart

Abstract: Parameters in climate models are usually calibrated manually, exploiting only small subsets of the available data. This precludes both optimal calibration and quantification of uncertainties. Traditional Bayesian calibration methods that allow uncertainty quantification are too expensive for climate models; they are also not robust in the presence of internal climate variability. For example, Markov chain Monte Carlo (MCMC) methods typically require $O(10^5)$ model runs and are sensitive to internal variability noise, rendering them infeasible for climate models. Here we demonstrate an approach to model calibration and uncertainty quantification that requires only $O(10^2)$ model runs and can accommodate internal climate variability. The approach consists of three stages: (i) a calibration stage uses variants of ensemble Kalman inversion to calibrate a model by minimizing mismatches between model and data statistics; (ii) an emulation stage emulates the parameter-to-data map with Gaussian processes (GP), using the model runs in the calibration stage for training; (iii) a sampling stage approximates the Bayesian posterior distributions by sampling the GP emulator with MCMC. We demonstrate the feasibility and computational efficiency of this calibrate-emulate-sample (CES) approach in a perfect-model setting. Using an idealized general circulation model, we estimate parameters in a simple convection scheme from synthetic data generated with the model. The CES approach generates probability distributions of the parameters that are good approximations of the Bayesian posteriors, at a fraction of the computational cost usually required to obtain them. Sampling from this approximate posterior allows the generation of climate predictions with quantified parametric uncertainties.

Submitted 19 August, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

arXiv:2009.13457 [pdf, other]

Drift Estimation of Multiscale Diffusions Based on Filtered Data

Authors: Assyr Abdulle, Giacomo Garegnani, Grigorios A. Pavliotis, Andrew M. Stuart, Andrea Zanoni

Abstract: We study the problem of drift estimation for two-scale continuous time series. We set ourselves in the framework of overdamped Langevin equations, for which a single-scale surrogate homogenized equation exists. In this setting, estimating the drift coefficient of the homogenized equation requires pre-processing of the data, often in the form of subsampling; this is because the two-scale equation and the homogenized single-scale equation are incompatible at small scales, generating mutually singular measures on the path space. We avoid subsampling and work instead with filtered data, found by application of an appropriate kernel function, and compute maximum likelihood estimators based on the filtered process. We show that the estimators we propose are asymptotically unbiased and demonstrate numerically the advantages of our method with respect to subsampling. Finally, we show how our filtered data methodology can be combined with Bayesian techniques and provide a full uncertainty quantification of the inference procedure.

Submitted 6 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

arXiv:2007.12809 [pdf, other]

doi 10.1088/1361-6420/ac1e80

Posterior Consistency of Semi-Supervised Regression on Graphs

Authors: Andrea L. Bertozzi, Bamdad Hosseini, Hao Li, Kevin Miller, Andrew M. Stuart

Abstract: Graph-based semi-supervised regression (SSR) is the problem of estimating the value of a function on a weighted graph from its values (labels) on a small subset of the vertices. This paper is concerned with the consistency of SSR in the context of classification, in the setting where the labels have small noise and the underlying graph weighting is consistent with well-clustered nodes. We present a Bayesian formulation of SSR in which the weighted graph defines a Gaussian prior, using a graph Laplacian, and the labeled data defines a likelihood. We analyze the rate of contraction of the posterior measure around the ground truth in terms of parameters that quantify the small label error and inherent clustering in the graph. We obtain bounds on the rates of contraction and illustrate their sharpness through numerical experiments. The analysis also gives insight into the choice of hyperparameters that enter the definition of the prior.

Submitted 24 March, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

arXiv:2007.06175 [pdf, other]

Ensemble Kalman Inversion for Sparse Learning of Dynamical Systems from Time-Averaged Data

Authors: Tapio Schneider, Andrew M. Stuart, Jin-Long Wu

Abstract: Enforcing sparse structure within learning has led to significant advances in the field of data-driven discovery of dynamical systems. However, such methods require access not only to time-series of the state of the dynamical system, but also to the time derivative. In many applications, the data are available only in the form of time-averages such as moments and autocorrelation functions. We propose a sparse learning methodology to discover the vector fields defining a (possibly stochastic or partial) differential equation, using only time-averaged statistics. Such a formulation of sparse learning naturally leads to a nonlinear inverse problem to which we apply the methodology of ensemble Kalman inversion (EKI). EKI is chosen because it may be formulated in terms of the iterative solution of quadratic optimization problems; sparsity is then easily imposed. We then apply the EKI-based sparse learning methodology to various examples governed by stochastic differential equations (a noisy Lorenz 63 system), ordinary differential equations (Lorenz 96 system and coalescence equations), and a partial differential equation (the Kuramoto-Sivashinsky equation). The results demonstrate that time-averaged statistics can be used for data-driven discovery of differential equations using sparse EKI. The proposed sparse learning methodology extends the scope of data-driven discovery of differential equations to previously challenging applications and data-acquisition scenarios.

Submitted 20 October, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

Comments: 51 pages, 30 figures

arXiv:2005.11375 [pdf, other]

Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation

Authors: Yifan Chen, Houman Owhadi, Andrew M. Stuart

Abstract: Gaussian process regression has proven very powerful in statistics, machine learning and inverse problems. A crucial aspect of the success of this methodology, in a wide range of applications to complex and real-world problems, is hierarchical modeling and learning of hyperparameters. The purpose of this paper is to study two paradigms of learning hierarchical parameters: one is from the probabilistic Bayesian perspective, in particular, the empirical Bayes approach that has been largely used in Bayesian statistics; the other is from the deterministic and approximation theoretic view, and in particular the kernel flow algorithm that was proposed recently in the machine learning literature. Analysis of their consistency in the large data limit, as well as explicit identification of their implicit bias in parameter learning, are established in this paper for a Matérn-like model on the torus. A particular technical challenge we overcome is the learning of the regularity parameter in the Matérn-like field, for which consistency results have been very scarce in the spatial statistics literature. Moreover, we conduct extensive numerical experiments beyond the Matérn-like model, comparing the two algorithms further. These experiments demonstrate learning of other hierarchical parameters, such as amplitude and lengthscale; they also illustrate the setting of model misspecification in which the kernel flow approach could show superior performance to the more traditional empirical Bayes approach.

Submitted 16 March, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: to appear in Mathematics of Computation

MSC Class: 65F12 62C10 41A05 35Q62

arXiv:2005.10224 [pdf, other]

doi 10.1137/20M133957X

The Random Feature Model for Input-Output Maps between Banach Spaces

Authors: Nicholas H. Nelsen, Andrew M. Stuart

Abstract: Well known to the machine learning community, the random feature model is a parametric approximation to kernel interpolation or regression methods. It is typically used to approximate functions mapping a finite-dimensional input space to the real line. In this paper, we instead propose a methodology for use of the random feature model as a data-driven surrogate for operators that map an input Banach space to an output Banach space. Although the methodology is quite general, we consider operators defined by partial differential equations (PDEs); here, the inputs and outputs are themselves functions, with the input parameters being functions required to specify the problem, such as initial data or coefficients, and the outputs being solutions of the problem. Upon discretization, the model inherits several desirable attributes from this infinite-dimensional viewpoint, including mesh-invariant approximation error with respect to the true PDE solution map and the capability to be trained at one mesh resolution and then deployed at different mesh resolutions. We view the random feature model as a non-intrusive data-driven emulator, provide a mathematical framework for its interpretation, and demonstrate its ability to efficiently and accurately approximate the nonlinear parameter-to-solution maps of two prototypical PDEs arising in physical science and engineering applications: viscous Burgers' equation and a variable coefficient elliptic equation.

Submitted 5 June, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: To appear in SIAM Journal on Scientific Computing; 32 pages, 9 figures

MSC Class: 65D15; 65D40; 62M45; 35R60

Journal ref: SIAM J. Sci. Comput. Vol. 43 No. 5 (2021) pp. A3212-A3243

arXiv:2005.03180

Model Reduction and Neural Networks for Parametric PDEs

Authors: Kaushik Bhattacharya, Bamdad Hosseini, Nikola B. Kovachki, Andrew M. Stuart

Abstract: We develop a general framework for data-driven approximation of input-output maps between infinite-dimensional spaces. The proposed approach is motivated by the recent successes of neural networks and deep learning, in combination with ideas from model reduction. This combination results in a neural network approximation which, in principle, is defined on infinite-dimensional spaces and, in practice, is robust to the dimension of finite-dimensional approximations of these spaces required for computation. For a class of input-output maps, and suitably chosen probability measures on the inputs, we prove convergence of the proposed approximation methodology. We also include numerical experiments which demonstrate the effectiveness of the method, showing convergence and robustness of the approximation scheme with respect to the size of the discretization, and compare it with existing algorithms from the literature; our examples include the mapping from coefficient to solution in a divergence form elliptic partial differential equation (PDE) problem, and the solution operator for viscous Burgers' equation.

Submitted 17 June, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 39 pages, 13 figures

MSC Class: 65N75; 62M45; 68T05; 60H30; 60H15

arXiv:2004.08376

Learning Stochastic Closures Using Ensemble Kalman Inversion

Authors: Tapio Schneider, Andrew M. Stuart, Jin-Long Wu

Abstract: Although the governing equations of many systems, when derived from first principles, may be viewed as known, it is often too expensive to numerically simulate all the interactions they describe. Therefore researchers often seek simpler descriptions that describe complex phenomena without numerically resolving all the interacting components. Stochastic differential equations (SDEs) arise naturally as models in this context. The growth in data acquisition, both through experiment and through simulations, provides an opportunity for the systematic derivation of SDE models in many disciplines. However, inconsistencies between SDEs and real data at short time scales often cause problems, when standard statistical methodology is applied to parameter estimation. The incompatibility between SDEs and real data can be addressed by deriving sufficient statistics from the time-series data and learning parameters of SDEs based on these. Following this approach, we formulate the fitting of SDEs to sufficient statistics from real data as an inverse problem and demonstrate that this inverse problem can be solved by using ensemble Kalman inversion (EKI). Furthermore, we create a framework for non-parametric learning of drift and diffusion terms by introducing hierarchical, refinable parameterizations of unknown functions, using Gaussian process regression. We demonstrate the proposed methodology for the fitting of SDE models, first in a simulation study with a noisy Lorenz '63 model, and then in other applications, including dimension reduction in deterministic chaotic systems arising in the atmospheric sciences, large-scale pattern modeling in climate dynamics, and simplified models for key observables arising in molecular dynamics. The results confirm that the proposed methodology provides a robust and systematic approach to fitting SDE models to real data.

Submitted 30 April, 2021; v1 submitted 17 April, 2020; originally announced April 2020.

Comments: 35 pages, 26 figures

arXiv:2001.03689

doi 10.1016/j.jcp.2020.109716

Calibrate, Emulate, Sample

Authors: Emmet Cleary, Alfredo Garbuno-Inigo, Shiwei Lan, Tapio Schneider, Andrew M Stuart

Abstract: Many parameter estimation problems arising in applications are best cast in the framework of Bayesian inversion. This allows not only for an estimate of the parameters, but also for the quantification of uncertainties in the estimates. Often in such problems the parameter-to-data map is very expensive to evaluate, and computing derivatives of the map, or derivative-adjoints, may not be feasible. Additionally, in many applications only noisy evaluations of the map may be available. We propose an approach to Bayesian inversion in such settings that builds on the derivative-free optimization capabilities of ensemble Kalman inversion methods. The overarching approach is to first use ensemble Kalman sampling (EKS) to calibrate the unknown parameters to fit the data; second, to use the output of the EKS to emulate the parameter-to-data map; third, to sample from an approximate Bayesian posterior distribution in which the parameter-to-data map is replaced by its emulator. This results in a principled approach to approximate Bayesian inference that requires only a small number of evaluations of the (possibly noisy approximation of the) parameter-to-data map. It does not require derivatives of this map, but instead leverages the documented power of ensemble Kalman methods. Furthermore, the EKS has the desirable property that it evolves the parameter ensembles towards the regions in which the bulk of the parameter posterior mass is located, thereby locating them well for the emulation phase of the methodology. In essence, the EKS methodology provides a cheap solution to the design problem of where to place points in parameter space to efficiently train an emulator of the parameter-to-data map for the purposes of Bayesian inversion.

Submitted 10 January, 2020; originally announced January 2020.

arXiv:1910.14193

A Simple Modeling Framework For Prediction In The Human Glucose-Insulin System

Authors: M. Sirlanci, M. E. Levine, C. C. Low Wang, D. J. Albers, A. M. Stuart

Abstract: In this paper, we build a new, simple, and interpretable mathematical model to estimate and forecast physiology related to the human glucose-insulin system, constrained by available data. By constructing a simple yet flexible model class with interpretable parameters, this general model can be specialized to work in different settings, such as type 2 diabetes mellitus (T2DM) and intensive care unit (ICU); different choices of appropriate model functions describing uptake of nutrition and removal of glucose differentiate between the models. In both cases, the available data is sparse and collected in clinical settings, major factors that have constrained our model choice to the simple form adopted. The model has the form of a linear stochastic differential equation (SDE) to describe the evolution of the BG level. The model includes a term quantifying glucose removal from the bloodstream through the regulation system of the human body and two other terms representing the effect of nutrition and externally delivered insulin. The stochastic fluctuations encapsulate model error necessitated by the simple model form and enable flexible incorporation of data. The model parameters must be learned in a patient-specific fashion, leading to personalized models. We present experimental results on patient-specific parameter estimation and future BG level forecasting in T2DM and ICU settings. The resulting model leads to the prediction of the BG level as an expected value accompanied by a band around this value which accounts for uncertainties in the prediction. Such predictions, then, have the potential for use as part of control systems that are robust to model imperfections and noisy data. Finally, the model's predictive capability is compared with two different models built explicitly for T2DM and ICU contexts.

Submitted 20 September, 2022; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: 41 pages, 8 figures, 4 tables

MSC Class: 92

arXiv:1909.06389

Spectral Analysis Of Weighted Laplacians Arising In Data Clustering

Authors: Franca Hoffmann, Bamdad Hosseini, Assad A. Oberai, Andrew M. Stuart

Abstract: Graph Laplacians computed from weighted adjacency matrices are widely used to identify geometric structure in data, and clusters in particular; their spectral properties play a central role in a number of unsupervised and semi-supervised learning algorithms. When suitably scaled, graph Laplacians approach limiting continuum operators in the large data limit. Studying these limiting operators, therefore, sheds light on learning algorithms. This paper is devoted to the study of a parameterized family of divergence form elliptic operators that arise as the large data limit of graph Laplacians. The link between a three-parameter family of graph Laplacians and a three-parameter family of differential operators is explained. The spectral properties of these differential operators are analyzed in the situation where the data comprises two nearly separated clusters, in a sense which is made precise. In particular, we investigate how the spectral gap depends on the three parameters entering the graph Laplacian, and on a parameter measuring the size of the perturbation from the perfectly clustered case. Numerical results are presented which exemplify and extend the analysis: the computations study situations in which there are two nearly separated clusters, but which violate the assumptions used in our theory; situations in which more than two clusters are present, also going beyond our theory; and situations which demonstrate the relevance of our studies of differential operators for the understanding of finite data problems via the graph Laplacian. The findings provide insight into parameter choices made in learning algorithms which are based on weighted adjacency matrices; they also provide the basis for analysis of the consistency of various unsupervised and semi-supervised learning algorithms, in the large data limit.

Submitted 13 July, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

MSC Class: 47A75; 62H30; 68T10; 35B20; 05C50

arXiv:1906.07658

Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methods

Authors: Franca Hoffmann, Bamdad Hosseini, Zhi Ren, Andrew M. Stuart

Abstract: Graph-based semi-supervised learning is the problem of propagating labels from a small number of labelled data points to a larger set of unlabelled data. This paper is concerned with the consistency of optimization-based techniques for such problems, in the limit where the labels have small noise and the underlying unlabelled data is well clustered. We study graph-based probit for binary classification, and a natural generalization of this method to multi-class classification using one-hot encoding. The resulting objective function to be optimized comprises the sum of a quadratic form defined through a rational function of the graph Laplacian, involving only the unlabelled data, and a fidelity term involving only the labelled data. The consistency analysis sheds light on the choice of the rational function defining the optimization.

Submitted 9 March, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

MSC Class: 62H30; 68T10; 68Q87; 91C20

Showing 1–50 of 131 results for author: Stuart, A M