-
Real-time Bayesian inference at extreme scale: A digital twin for tsunami early warning applied to the Cascadia subduction zone
Authors:
Stefan Henneking,
Sreeram Venkat,
Veselin Dobrev,
John Camier,
Tzanio Kolev,
Milinda Fernando,
Alice-Agnes Gabriel,
Omar Ghattas
Abstract:
We present a Bayesian inversion-based digital twin that employs acoustic pressure data from seafloor sensors, along with 3D coupled acoustic-gravity wave equations, to infer earthquake-induced spatiotemporal seafloor motion in real time and forecast tsunami propagation toward coastlines for early warning with quantified uncertainties. Our target is the Cascadia subduction zone, with one billion parameters. Computing the posterior mean alone would require 50 years on a 512 GPU machine. Instead, exploiting the shift invariance of the parameter-to-observable map and devising novel parallel algorithms, we induce a fast offline-online decomposition. The offline component requires just one adjoint wave propagation per sensor; using MFEM, we scale this part of the computation to the full El Capitan system (43,520 GPUs) with 92% weak parallel efficiency. Moreover, given real-time data, the online component exactly solves the Bayesian inverse and forecasting problems in 0.2 seconds on a modest GPU system, a ten-billion-fold speedup.
Submitted 22 April, 2025;
originally announced April 2025.
-
Dimension reduction for derivative-informed operator learning: An analysis of approximation errors
Authors:
Dingcheng Luo,
Thomas O'Leary-Roseberry,
Peng Chen,
Omar Ghattas
Abstract:
We study the derivative-informed learning of nonlinear operators between infinite-dimensional separable Hilbert spaces by neural networks. Such operators can arise from the solution of partial differential equations (PDEs) and are used in many simulation-based outer-loop tasks in science and engineering, such as PDE-constrained optimization, Bayesian inverse problems, and optimal experimental design. In these settings, the neural network approximations can be used as surrogate models to accelerate the solution of the outer-loop tasks. However, since outer-loop tasks in infinite dimensions often require knowledge of the underlying geometry, the approximation accuracy of the operator's derivatives can also significantly impact the performance of the surrogate model. Motivated by this, we analyze the approximation errors of neural operators in Sobolev norms over infinite-dimensional Gaussian input measures. We focus on the reduced basis neural operator (RBNO), which uses linear encoders and decoders defined on dominant input/output subspaces spanned by reduced sets of orthonormal bases. To this end, we study two methods for generating the bases: principal component analysis (PCA) and derivative-informed subspaces (DIS), which use the dominant eigenvectors of the covariance of the data or the derivatives as the reduced bases, respectively. We then derive bounds for errors arising from both the dimension reduction and the latent neural network approximation, including the sampling errors associated with the empirical estimation of the PCA/DIS. Our analysis is validated on numerical experiments with elliptic PDEs, where our results show that bases informed by the map (i.e., DIS or output PCA) yield accurate reconstructions and small generalization errors for both the operator and its derivatives, while input PCA may underperform unless ranks and training sample sizes are sufficiently large.
Submitted 11 April, 2025;
originally announced April 2025.
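The difference between the two input-basis constructions compared above can be illustrated in a few lines of NumPy. This is a minimal sketch under simplifying assumptions: Euclidean coordinates (no prior-covariance weighting, which the infinite-dimensional Gaussian setting would use), Jacobians supplied as a dense array, and hypothetical function names `input_pca_basis` and `dis_basis`. Input PCA keeps the dominant eigenvectors of the sample covariance of the inputs; DIS keeps the dominant eigenvectors of the sample average of $J^\top J$.

```python
import numpy as np


def input_pca_basis(M, r):
    """Top-r eigenvectors of the sample covariance of inputs M (shape: samples x dim)."""
    Mc = M - M.mean(axis=0)
    C = Mc.T @ Mc / (M.shape[0] - 1)
    _, V = np.linalg.eigh(C)                     # eigenvalues in ascending order
    return V[:, ::-1][:, :r]


def dis_basis(J, r):
    """Top-r eigenvectors of (1/N) sum_k J_k^T J_k, with J of shape (samples, out_dim, in_dim)."""
    H = np.einsum("kij,kil->jl", J, J) / J.shape[0]
    _, V = np.linalg.eigh(H)
    return V[:, ::-1][:, :r]


# Toy example: inputs vary most along coordinate 4, but the map depends mostly on coordinate 0.
rng = np.random.default_rng(0)
scales = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
M = rng.standard_normal((2000, 5)) * scales
A = np.array([[1.0, 0.0, 0.0, 0.0, 0.01]])       # linear map f(m) = A m, so every J_k = A
J = np.broadcast_to(A, (M.shape[0], 1, 5))

pca_dir = input_pca_basis(M, 1)[:, 0]
dis_dir = dis_basis(J, 1)[:, 0]
```

In this toy case the leading input-PCA direction aligns with coordinate 4 (largest input variance), while the leading DIS direction aligns with coordinate 0 (largest sensitivity of the map), which is the kind of situation in which map-informed bases can outperform input PCA.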
-
Goal-Oriented Real-Time Bayesian Inference for Linear Autonomous Dynamical Systems With Application to Digital Twins for Tsunami Early Warning
Authors:
Stefan Henneking,
Sreeram Venkat,
Omar Ghattas
Abstract:
We present a goal-oriented framework for constructing digital twins with the following properties: (1) they employ discretizations of high-fidelity PDE models governed by autonomous dynamical systems, leading to large-scale forward problems; (2) they solve a linear inverse problem to assimilate observational data to infer uncertain model components followed by a forward prediction of the evolving dynamics; and (3) the entire end-to-end, data-to-inference-to-prediction computation is carried out in real time through a Bayesian framework that rigorously accounts for uncertainties. Realizations of such a framework are faced with several challenges that stem from the extreme scale of forward models and, in some cases, slow eigenvalue decay of the parameter-to-observable map. In this paper, we introduce a methodology to overcome these challenges by exploiting the autonomous structure of the forward model. As a result, we can move the PDE solutions, which dominate the cost for solving the Bayesian inverse problem, to an offline computation and leverage the high-performance dense linear algebra capabilities of GPUs to accelerate the online prediction of quantities of interest. We seek to apply this framework to construct digital twins for the Cascadia subduction zone as a means of providing early warning for tsunamis generated by subduction zone megathrust earthquakes. To that end, we demonstrate how our methodology can be used to employ seafloor pressure observations, along with the coupled acoustic-gravity wave equations, to infer the earthquake-induced seafloor motion (discretized with $O(10^9)$ parameters) and forward predict the tsunami propagation. We present results of an end-to-end inference, prediction, and uncertainty quantification for a representative test problem for which this goal-oriented Bayesian inference is accomplished in real time, that is, in a matter of seconds.
Submitted 24 January, 2025;
originally announced January 2025.
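For a linear parameter-to-observable map with Gaussian prior and noise, the split between data-independent offline work and a cheap online step for a goal-oriented prediction can be sketched with small dense matrices. This is an illustration under stated assumptions, not the authors' implementation: a zero prior mean, explicit matrices `F` (parameter-to-observable) and `P` (parameter-to-QoI), and hypothetical function names. In the setting described above, forming `F` is where the PDE solves enter; once the offline quantities exist, the online posterior QoI mean is one dense matrix-vector product, and the posterior QoI covariance does not depend on the data at all.

```python
import numpy as np


def offline_phase(F, P, Gamma_pr, Gamma_noise):
    """Data-independent precomputation (linear-Gaussian model, zero prior mean).

    F: parameter-to-observable matrix (n_obs x n_param)
    P: parameter-to-QoI matrix (n_qoi x n_param)
    """
    K = F @ Gamma_pr @ F.T + Gamma_noise      # data-space covariance of the observations
    Q = P @ Gamma_pr @ F.T                    # prior cross-covariance of QoI and observations
    W = np.linalg.solve(K, Q.T).T             # W = Q K^{-1} (K is symmetric)
    Sigma_q = P @ Gamma_pr @ P.T - W @ Q.T    # posterior QoI covariance, independent of the data
    return W, Sigma_q


def online_phase(W, d):
    """Posterior mean of the QoI: a single dense matrix-vector product."""
    return W @ d


rng = np.random.default_rng(1)
n_param, n_obs, n_qoi = 30, 8, 3
B = rng.standard_normal((n_param, n_param))
Gamma_pr = B @ B.T / n_param + np.eye(n_param)
Gamma_noise = 0.1 * np.eye(n_obs)
F = rng.standard_normal((n_obs, n_param))
P = rng.standard_normal((n_qoi, n_param))

W, Sigma_q = offline_phase(F, P, Gamma_pr, Gamma_noise)
d = rng.standard_normal(n_obs)                # synthetic observed data
q_mean = online_phase(W, d)
```

The data-space form used here is equivalent, by the Sherman-Morrison-Woodbury identity, to first computing the parameter-space posterior and then applying `P`; it is preferred when there are far fewer observations than parameters.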
-
LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport
Authors:
Lianghao Cao,
Joshua Chen,
Michael Brennan,
Thomas O'Leary-Roseberry,
Youssef Marzouk,
Omar Ghattas
Abstract:
We present LazyDINO, a transport map variational inference method for fast, scalable, and efficiently amortized solutions of high-dimensional nonlinear Bayesian inverse problems with expensive parameter-to-observable (PtO) maps. Our method consists of an offline phase in which we construct a derivative-informed neural surrogate of the PtO map using joint samples of the PtO map and its Jacobian. During the online phase, when given observational data, we seek rapid posterior approximation using surrogate-driven training of a lazy map [Brennan et al., NeurIPS, (2020)], i.e., a structure-exploiting transport map with low-dimensional nonlinearity. The trained lazy map then produces approximate posterior samples or density evaluations. Our surrogate construction is optimized for amortized Bayesian inversion using lazy map variational inference. We show that (i) the derivative-based reduced basis architecture [O'Leary-Roseberry et al., Comput. Methods Appl. Mech. Eng., 388 (2022)] minimizes the upper bound on the expected error in surrogate posterior approximation, and (ii) the derivative-informed training formulation [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] minimizes the expected error due to surrogate-driven transport map optimization. Our numerical results demonstrate that LazyDINO is highly efficient in cost amortization for Bayesian inversion. We observe one to two orders of magnitude reduction of offline cost for accurate posterior approximation, compared to simulation-based amortized inference via conditional transport and conventional surrogate-driven transport. In particular, LazyDINO consistently outperforms the Laplace approximation using fewer than 1000 offline samples, while other amortized inference methods struggle and sometimes fail at 16,000 offline samples.
Submitted 19 November, 2024;
originally announced November 2024.
-
Real-time aerodynamic load estimation for hypersonics via strain-based inverse maps
Authors:
Julie Pham,
Omar Ghattas,
Noel Clemens,
Karen Willcox
Abstract:
This work develops an efficient real-time inverse formulation for inferring the aerodynamic surface pressures on a hypersonic vehicle from sparse measurements of the structural strain. The approach aims to provide real-time estimates of the aerodynamic loads acting on the vehicle for ground and flight testing, as well as guidance, navigation, and control applications. Specifically, the approach targets hypersonic flight conditions where direct measurement of the surface pressures is challenging due to the harsh aerothermal environment. For problems employing a linear elastic structural model, we show that the inference problem can be posed as a least-squares problem with a linear constraint arising from a finite element discretization of the governing elasticity partial differential equation. Due to the linearity of the problem, an explicit solution is given by the normal equations. Pre-computation of the resulting inverse map enables rapid evaluation of the surface pressure and corresponding integrated quantities, such as the force and moment coefficients. The inverse approach additionally allows for uncertainty quantification, providing insights for theoretical recoverability and robustness to sensor noise. Numerical studies demonstrate the estimator performance for reconstructing the surface pressure field, as well as the force and moment coefficients, for the Initial Concept 3.X (IC3X) conceptual hypersonic vehicle.
Submitted 24 August, 2024;
originally announced August 2024.
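Because the map from surface pressure to measured strain is linear, the inverse map can be formed once offline and then applied in real time. The sketch below illustrates this under stated assumptions: small random matrices stand in for the finite element stiffness `K`, the pressure-to-load operator `B`, and the displacement-to-strain sensor operator `C`; the optional Tikhonov weight `alpha` and the integration weights `w` for a force coefficient are hypothetical additions, not taken from the abstract.

```python
import numpy as np


def build_strain_operator(K, B, C):
    """G = C K^{-1} B: eliminates the displacement u from K u = B p and strain = C u."""
    return C @ np.linalg.solve(K, B)


def precompute_inverse_map(G, alpha=0.0):
    """M = (G^T G + alpha I)^{-1} G^T, the normal-equations solution operator."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + alpha * np.eye(n), G.T)


rng = np.random.default_rng(2)
n_dof, n_press, n_strain = 60, 10, 40
L = rng.standard_normal((n_dof, n_dof))
K = L @ L.T + n_dof * np.eye(n_dof)          # stand-in symmetric positive definite "stiffness"
B = rng.standard_normal((n_dof, n_press))     # pressure-to-load operator
C = rng.standard_normal((n_strain, n_dof))    # displacement-to-strain (sensor) operator

G = build_strain_operator(K, B, C)
M = precompute_inverse_map(G)                 # offline
w = np.ones(n_press) / n_press                # hypothetical weights for an integrated force coefficient
c_force = w @ M                               # offline: force coefficient directly from strains

p_true = rng.standard_normal(n_press)
strain = G @ p_true                           # noise-free synthetic measurements
p_est = M @ strain                            # online: one matrix-vector product
force_est = c_force @ strain                  # online: one dot product
```

With more strain sensors than pressure unknowns and noise-free data, the estimate reproduces the true pressures; with noisy data or fewer sensors, a positive `alpha` (or a prior) would be needed.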
-
Inference of Heterogeneous Material Properties via Infinite-Dimensional Integrated DIC
Authors:
Joseph Kirchhoff,
Dingcheng Luo,
Thomas O'Leary-Roseberry,
Omar Ghattas
Abstract:
We present a scalable and efficient framework for the inference of spatially varying parameters of continuum materials from image observations of their deformations. Our goal is the nondestructive identification of arbitrary damage, defects, anomalies, and inclusions without knowledge of their morphology or strength. Since these effects cannot be directly observed, we pose their identification as an inverse problem. Our approach builds on integrated digital image correlation (IDIC; Besnard, Hild, and Roux, 2006), which poses the image registration and material inference as a monolithic inverse problem, thereby enforcing physical consistency of the image registration using the governing PDE. Existing work on IDIC has focused on low-dimensional parameterizations of materials. In order to accommodate the inference of heterogeneous material properties that are formally infinite dimensional, we present $\infty$-IDIC, a general formulation of the PDE-constrained coupled image registration and inversion posed directly in the function space setting. This leads to several mathematical and algorithmic challenges arising from the ill-posedness and high dimensionality of the inverse problem. To address ill-posedness, we consider two regularization schemes, $H^1$ and total variation, for the inference of smooth and sharp features, respectively. To address the computational costs associated with the discretized problem, we use an efficient inexact Newton-CG framework for solving the regularized inverse problem. In numerical experiments, we demonstrate the ability of $\infty$-IDIC to characterize complex, spatially varying Lamé parameter fields of linear elastic and hyperelastic materials. Our method exhibits (i) the ability to recover fine-scale and sharp material features, (ii) mesh-independent convergence performance and hyperparameter selection, and (iii) robustness to observational noise.
Submitted 22 July, 2024;
originally announced August 2024.
-
Gaussian mixture Taylor approximations of risk measures constrained by PDEs with Gaussian random field inputs
Authors:
Dingcheng Luo,
Joshua Chen,
Peng Chen,
Omar Ghattas
Abstract:
This work considers the computation of risk measures for quantities of interest governed by PDEs with Gaussian random field parameters using Taylor approximations. While efficient, Taylor approximations are local to the point of expansion, and hence may degrade in accuracy when the variances of the input parameters are large. To address this challenge, we approximate the underlying Gaussian measure by a mixture of Gaussians with reduced variance in a dominant direction of parameter space. Taylor approximations are constructed at the means of each Gaussian mixture component, which are then combined to approximate the risk measures. The formulation is presented in the setting of infinite-dimensional Gaussian random parameters for risk measures including the mean, variance, and conditional value-at-risk (CVaR). We also provide a detailed analysis of the approximation errors arising from two sources: the Gaussian mixture approximation and the Taylor approximations. Numerical experiments are conducted for a semilinear advection-diffusion-reaction equation with a random diffusion coefficient field and for the Helmholtz equation with a random wave speed field. For these examples, the proposed approximation strategy can achieve less than $1\%$ relative error in estimating CVaR with only $\mathcal{O}(10)$ state PDE solves, which is comparable to a standard Monte Carlo estimate with $\mathcal{O}(10^4)$ samples, thus achieving significant reduction in computational cost. The proposed method can therefore serve as a way to rapidly and accurately estimate risk measures under limited computational budgets.
Submitted 13 August, 2024;
originally announced August 2024.
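The idea of trading one wide Gaussian for several narrow ones, each with its own local Taylor expansion, can be seen in a one-dimensional toy problem. This sketch assumes a simple construction that is not taken from the paper: equally spaced component means along the (here, only) direction, weights proportional to the standard normal density, and a common component variance chosen so that the mixture has the same variance as the original Gaussian. Only the mean of the quantity of interest is estimated, not the variance or CVaR, and `gaussian_mixture_taylor_mean` is a hypothetical name.

```python
import numpy as np


def gaussian_mixture_taylor_mean(q, d2q, K=3, h=0.8):
    """Estimate E[q(xi)], xi ~ N(0, 1), using a mixture of narrow Gaussians and a
    second-order Taylor expansion at each component mean (1D toy version)."""
    mu = h * np.arange(-K, K + 1)                 # component means along the direction
    w = np.exp(-0.5 * mu**2)
    w /= w.sum()                                  # mixture weights
    tau2 = 1.0 - np.sum(w * mu**2)                # component variance so the mixture has variance 1
    if tau2 <= 0:
        raise ValueError("grid too wide: no valid component variance")
    # For a component N(mu_k, tau2): E[q] ~= q(mu_k) + 0.5 * q''(mu_k) * tau2
    return np.sum(w * (q(mu) + 0.5 * d2q(mu) * tau2))


# q(xi) = exp(xi) has exact mean exp(1/2) under N(0, 1); its second derivative is also exp.
exact = np.exp(0.5)
single_taylor = np.exp(0.0) + 0.5 * np.exp(0.0) * 1.0   # one second-order expansion at the mean
mixture = gaussian_mixture_taylor_mean(np.exp, np.exp)
```

For this example the single expansion gives 1.5, while the mixture estimate lands within a few hundredths of the exact value $e^{1/2} \approx 1.6487$, because each narrow component sees a nearly quadratic piece of $q$.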
-
Fast and Scalable FFT-Based GPU-Accelerated Algorithms for Hessian Actions Arising in Linear Inverse Problems Governed by Autonomous Dynamical Systems
Authors:
Sreeram Venkat,
Milinda Fernando,
Stefan Henneking,
Omar Ghattas
Abstract:
We present an efficient and scalable algorithm for performing matrix-vector multiplications ("matvecs") for block Toeplitz matrices. Such matrices, which are shift-invariant with respect to their blocks, arise in the context of solving inverse problems governed by autonomous systems, and time-invariant systems in particular. In this article, we consider inverse problems that are solved for inferring unknown parameters from observational data of a linear time-invariant dynamical system given in the form of partial differential equations (PDEs). Matrix-free Newton-conjugate-gradient methods are often the gold standard for solving these inverse problems, but they require numerous actions of the Hessian on a vector. Matrix-free adjoint-based Hessian matvecs require a pair of linearized forward/adjoint PDE solves per Hessian action, which may be prohibitive for large-scale inverse problems, especially when efficient low-rank approximations of the Hessian are not readily available, such as for hyperbolic PDE operators. Time invariance of the forward PDE problem leads to a block Toeplitz structure of the discretized parameter-to-observable (p2o) map defining the mapping from inputs (parameters) to outputs (observables) of the PDEs. This block Toeplitz structure enables us to exploit two key properties: (1) compact storage of the p2o map and its adjoint; and (2) efficient fast Fourier transform (FFT)-based Hessian matvecs. The proposed algorithm is mapped onto large multi-GPU clusters and achieves more than 80 percent of peak bandwidth on an NVIDIA A100 GPU. Excellent weak scaling is shown for up to 48 A100 GPUs. For the targeted problems, the implementation executes Hessian matvecs within fractions of a second, orders of magnitude faster than can be achieved by the conventional matrix-free Hessian matvecs via forward/adjoint PDE solves.
Submitted 17 July, 2024;
originally announced July 2024.
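The two properties of block Toeplitz p2o maps described above (compact storage and FFT-based matvecs) can be checked on a small example. This is a serial NumPy sketch, not the multi-GPU implementation from the paper; it assumes a causal (block lower-triangular) Toeplitz p2o map stored as one block `F[k]` per time lag, with hypothetical shapes `(Nt, n_obs, n_param)`, and it shows only the forward p2o matvec (the adjoint and Hessian products follow the same pattern). Zero-padding to length `2 * Nt` turns the circular convolution computed by the FFT into the required linear convolution.

```python
import numpy as np


def block_toeplitz_matvec_dense(F, m):
    """Reference product d_i = sum_{j<=i} F[i-j] @ m[j] (block lower-triangular Toeplitz)."""
    Nt, n_obs, _ = F.shape
    d = np.zeros((Nt, n_obs))
    for i in range(Nt):
        for j in range(i + 1):
            d[i] += F[i - j] @ m[j]
    return d


def block_toeplitz_matvec_fft(F, m):
    """Same product via FFTs along the time axis, zero-padded to avoid wrap-around."""
    Nt = F.shape[0]
    L = 2 * Nt
    F_hat = np.fft.rfft(F, n=L, axis=0)                # (L//2 + 1, n_obs, n_param)
    m_hat = np.fft.rfft(m, n=L, axis=0)                # (L//2 + 1, n_param)
    d_hat = np.einsum("kij,kj->ki", F_hat, m_hat)      # one small block matvec per frequency
    return np.fft.irfft(d_hat, n=L, axis=0)[:Nt]       # keep the causal part


rng = np.random.default_rng(0)
Nt, n_obs, n_param = 16, 3, 5
F = rng.standard_normal((Nt, n_obs, n_param))   # one block per time lag (compact storage)
m = rng.standard_normal((Nt, n_param))
d_fft = block_toeplitz_matvec_fft(F, m)
d_dense = block_toeplitz_matvec_dense(F, m)
```

Only `Nt` blocks are stored instead of `Nt**2`, and the FFT path replaces the O(Nt^2) block products of the dense loop with FFTs along time plus about `Nt` small per-frequency block products.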
-
Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems
Authors:
Lianghao Cao,
Thomas O'Leary-Roseberry,
Omar Ghattas
Abstract:
We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10--25 effective posterior samples.
Submitted 20 May, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
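The delayed-acceptance idea (screen each proposal with a cheap surrogate before evaluating the expensive posterior) can be sketched with a scalar random-walk proposal. This is an illustration under assumptions: a symmetric random-walk proposal stands in for the geometric (gradient- and Hessian-informed) proposals used in the paper, and `log_post` and `log_surr` are hypothetical toy log-densities rather than PDE-based ones. The second-stage acceptance ratio corrects for the surrogate, so the chain still targets the true posterior; only proposals that pass the first stage trigger an expensive evaluation.

```python
import numpy as np


def delayed_acceptance_rwm(log_post, log_surr, x0, n_steps, step, rng):
    """Two-stage Metropolis with a symmetric random-walk proposal (scalar state)."""
    x = float(x0)
    lp, ls = log_post(x), log_surr(x)
    chain = np.empty(n_steps)
    n_full = 0                                   # number of expensive posterior evaluations
    for t in range(n_steps):
        y = x + step * rng.standard_normal()
        ls_y = log_surr(y)
        # Stage 1: screen the proposal with the cheap surrogate.
        if np.log(rng.uniform()) < ls_y - ls:
            lp_y = log_post(y)
            n_full += 1
            # Stage 2: correct for the surrogate so the chain targets the true posterior.
            if np.log(rng.uniform()) < (lp_y - lp) - (ls_y - ls):
                x, lp, ls = y, lp_y, ls_y
        chain[t] = x
    return chain, n_full


def log_post(x):
    return -0.5 * x**2                           # toy "expensive" target: N(0, 1)


def log_surr(x):
    return -0.55 * (x - 0.1) ** 2                # deliberately inexact surrogate


rng = np.random.default_rng(3)
chain, n_full = delayed_acceptance_rwm(log_post, log_surr, 0.0, 20000, 1.5, rng)
samples = chain[1000:]                           # discard burn-in
```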
-
Ensemble Kalman Filters with Resampling
Authors:
Omar Al Ghattas,
Jiajun Bao,
Daniel Sanz-Alonso
Abstract:
Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state of the system is high dimensional, ensemble Kalman filters are often the method of choice. These algorithms rely on an ensemble of interacting particles to sequentially estimate the state as new observations become available. Despite the practical success of ensemble Kalman filters, theoretical understanding is hindered by the intricate dependence structure of the interacting particles. This paper investigates ensemble Kalman filters that incorporate an additional resampling step to break the dependency between particles. The new algorithm is amenable to a theoretical analysis that extends and improves upon those available for filters without resampling, while also performing well in numerical examples.
Submitted 27 July, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
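A single analysis step of a stochastic (perturbed-observation) ensemble Kalman filter, followed by a resampling step, can be sketched as below. The resampling shown here, which draws a fresh, independent ensemble from a Gaussian fitted to the analysis ensemble, is an assumption made for illustration; the exact scheme analyzed in the paper may differ. Function names and the toy linear-Gaussian setup are hypothetical.

```python
import numpy as np


def enkf_analysis(X, y, H, R, rng):
    """Perturbed-observation EnKF update. X: (N, n) forecast ensemble, y: (m,) data."""
    N = X.shape[0]
    C = np.cov(X, rowvar=False)                              # sample forecast covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)             # ensemble Kalman gain
    Y = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)  # perturbed observations
    return X + (Y - X @ H.T) @ K.T


def resample_gaussian(X, rng):
    """Draw a fresh, independent ensemble from a Gaussian fitted to X (assumed scheme)."""
    return rng.multivariate_normal(X.mean(axis=0), np.cov(X, rowvar=False), size=X.shape[0])


rng = np.random.default_rng(4)
mu0 = np.array([1.0, -1.0])
P0 = np.array([[1.0, 0.5], [0.5, 2.0]])
H = np.array([[1.0, 0.0]])                        # observe the first component only
R = np.array([[0.5]])
y = np.array([2.0])

X = rng.multivariate_normal(mu0, P0, size=20000)  # forecast ensemble
Xa = enkf_analysis(X, y, H, R, rng)
Xr = resample_gaussian(Xa, rng)

# Exact Kalman filter mean for comparison (linear-Gaussian case)
K_exact = P0 @ H.T @ np.linalg.inv(H @ P0 @ H.T + R)
exact_mean = mu0 + K_exact @ (y - H @ mu0)
```

After resampling, the ensemble members are conditionally independent given the fitted mean and covariance, which is the kind of decoupling between particles that makes such variants easier to analyze.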
-
Point spread function approximation of high rank Hessians with locally supported non-negative integral kernels
Authors:
Nick Alger,
Tucker Hartland,
Noemi Petra,
Omar Ghattas
Abstract:
We present an efficient matrix-free point spread function (PSF) method for approximating operators that have locally supported non-negative integral kernels. The method computes impulse responses at scattered points, and interpolates these impulse responses to approximate integral kernel entries. Impulse responses are computed by applying the operator to Dirac comb batches of point sources, which are chosen via an ellipsoid packing procedure. Evaluation of kernel entries allows us to construct a hierarchical matrix approximation of the operator, which is used for further matrix computations. We illustrate the end-to-end method on a blur problem, then use the method to build preconditioners for the Hessian in two inverse problems governed by partial differential equations (PDEs): inversion for the basal friction coefficient in an ice sheet flow problem and for the initial condition in an advective-diffusive transport problem. While for many ill-posed inverse problems the Hessian of the data misfit term exhibits a low rank structure, and hence a low rank approximation is suitable, for many problems of practical interest the numerical rank of the Hessian is still large. But Hessian impulse responses typically become more local as the numerical rank increases, which benefits the PSF method. Numerical results reveal that the PSF preconditioner clusters the spectrum of the preconditioned Hessian near one, yielding roughly 5x-10x reductions in the required number of PDE solves, as compared to regularization preconditioning and no preconditioning. We also present a numerical study of the influence of various parameters (which control the shape of the impulse responses) on the effectiveness of the advection-diffusion Hessian approximation. The results show that the PSF-based preconditioners are able to form good approximations of high-rank Hessians using a small number of operator applications.
Submitted 22 February, 2024; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Bayesian model calibration for diblock copolymer thin film self-assembly using power spectrum of microscopy data and machine learning surrogate
Authors:
Lianghao Cao,
Keyi Wu,
J. Tinsley Oden,
Peng Chen,
Omar Ghattas
Abstract:
Identifying parameters of computational models from experimental data, or model calibration, is fundamental for assessing and improving the predictability and reliability of computer simulations. In this work, we propose a method for Bayesian calibration of models that predict morphological patterns of diblock copolymer (Di-BCP) thin film self-assembly while accounting for various sources of uncertainty in pattern formation and data acquisition. This method extracts the azimuthally averaged power spectrum (AAPS) of the top-down microscopy characterization of Di-BCP thin film patterns as summary statistics for Bayesian inference of model parameters via the pseudo-marginal method. We derive the analytical and approximate form of a conditional likelihood for the AAPS of image data. We demonstrate that AAPS-based image data reduction retains the mutual information, particularly on important length scales, between image data and model parameters while being relatively agnostic to the aleatoric uncertainties associated with the random long-range disorder of Di-BCP patterns. Additionally, we propose a phase-informed prior distribution for Bayesian model calibration. Furthermore, reducing image data to AAPS enables us to efficiently build surrogate models to accelerate the proposed Bayesian model calibration procedure. We present the formulation and training of two multi-layer perceptrons for approximating the parameter-to-spectrum map, which enables fast integrated likelihood evaluations. We validate the proposed Bayesian model calibration method through numerical examples, for which the neural network surrogate delivers a fivefold reduction of the number of model simulations performed for a single calibration task.
Submitted 3 August, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
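The azimuthally averaged power spectrum used as the summary statistic can be computed from a 2D FFT by averaging the power over rings of constant wavenumber magnitude. This is a minimal sketch assuming a grayscale image array, unit pixel spacing, mean subtraction, and integer-rounded radial bins; the windowing, normalization, and bin layout used in the paper may differ, and `aaps` is a hypothetical function name.

```python
import numpy as np


def aaps(img):
    """Azimuthally averaged power spectrum of a 2D image, binned by rounded |k|."""
    n0, n1 = img.shape
    P = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    ky = np.fft.fftshift(np.fft.fftfreq(n0)) * n0      # integer wavenumbers along axis 0
    kx = np.fft.fftshift(np.fft.fftfreq(n1)) * n1      # integer wavenumbers along axis 1
    KX, KY = np.meshgrid(kx, ky)                       # both of shape (n0, n1)
    kr = np.rint(np.hypot(KX, KY)).astype(int)         # radial bin index
    sums = np.bincount(kr.ravel(), weights=P.ravel())
    counts = np.bincount(kr.ravel())
    return sums / np.maximum(counts, 1)


# Stripes with 8 periods across a 64 x 64 image: the spectrum should peak at |k| = 8.
x = np.arange(64)
img = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))
spectrum = aaps(img)
```

Averaging over rings discards the orientation of the pattern, which is why such a statistic is relatively insensitive to the random long-range disorder of the patterns while retaining their characteristic length scales.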
-
Efficient PDE-Constrained optimization under high-dimensional uncertainty using derivative-informed neural operators
Authors:
Dingcheng Luo,
Thomas O'Leary-Roseberry,
Peng Chen,
Omar Ghattas
Abstract:
We propose a novel machine learning framework for solving optimization problems governed by large-scale partial differential equations (PDEs) with high-dimensional random parameters. Such optimization under uncertainty (OUU) problems may be computationally prohibitive using classical methods, particularly when a large number of samples is needed to evaluate risk measures at every iteration of an optimization algorithm, where each sample requires the solution of an expensive-to-solve PDE. To address this challenge, we propose a new neural operator approximation of the PDE solution operator that has the combined merits of (1) accurate approximation of not only the map from the joint inputs of random parameters and optimization variables to the PDE state, but also its derivative with respect to the optimization variables, (2) efficient construction of the neural network using reduced basis architectures that are scalable to high-dimensional OUU problems, and (3) requiring only a limited amount of training data to achieve high accuracy for both the PDE solution and the OUU solution. We refer to such neural operators as multi-input reduced basis derivative-informed neural operators (MR-DINOs). We demonstrate the accuracy and efficiency of our approach through several numerical experiments, namely the risk-averse control of a semilinear elliptic PDE and the steady-state Navier--Stokes equations in two and three spatial dimensions, each involving random field inputs. Across the examples, MR-DINOs offer $10^{3}$--$10^{7} \times$ reductions in execution time, and are able to produce OUU solutions of comparable accuracies to those from standard PDE-based solutions while being over $10 \times$ more cost-efficient after factoring in the cost of construction.
Submitted 31 May, 2023;
originally announced May 2023.
-
Residual-based error correction for neural operator accelerated infinite-dimensional Bayesian inverse problems
Authors:
Lianghao Cao,
Thomas O'Leary-Roseberry,
Prashant K. Jha,
J. Tinsley Oden,
Omar Ghattas
Abstract:
We explore using neural operators, or neural network representations of nonlinear maps between function spaces, to accelerate infinite-dimensional Bayesian inverse problems (BIPs) with models governed by nonlinear parametric partial differential equations (PDEs). Neural operators have gained significant attention in recent years for their ability to approximate the parameter-to-solution maps defined by PDEs using as training data solutions of PDEs at a limited number of parameter samples. The computational cost of BIPs can be drastically reduced if the large number of PDE solves required for posterior characterization are replaced with evaluations of trained neural operators. However, reducing error in the resulting BIP solutions via reducing the approximation error of the neural operators in training can be challenging and unreliable. We provide an a priori error bound result that implies that certain BIPs can be ill-conditioned with respect to the approximation error of neural operators, thus leading to unattainable accuracy requirements in training. To reliably deploy neural operators in BIPs, we consider a strategy for enhancing the performance of neural operators, which is to correct the prediction of a trained neural operator by solving a linear variational problem based on the PDE residual. We show that a trained neural operator with error correction can achieve a quadratic reduction of its approximation error, all while retaining substantial computational speedups of posterior sampling when models are governed by highly nonlinear PDEs. The strategy is applied to two numerical examples of BIPs based on a nonlinear reaction--diffusion problem and deformation of hyperelastic materials. We demonstrate that posterior representations of the two BIPs produced using trained neural operators are greatly and consistently enhanced by error correction.
Submitted 18 October, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
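The error-correction step described above can be illustrated on a toy problem. The following is a minimal sketch, assuming a 1D finite-difference reaction-diffusion residual `F(u) = A u + u^3 - f` (a hypothetical stand-in for the paper's PDEs) and mimicking a neural-operator prediction by perturbing the true solution. The correction is written as one linearized (Newton-type) solve around the prediction, which is one way to realize a linear problem based on the PDE residual, not necessarily the authors' exact variational formulation. Shrinking the perturbation tenfold should shrink the corrected error roughly a hundredfold.

```python
import numpy as np

def residual(u, A, f):
    # Discrete nonlinear reaction-diffusion residual: F(u) = A u + u^3 - f
    return A @ u + u**3 - f

def jacobian(u, A):
    # Linearization of the residual: dF/du = A + diag(3 u^2)
    return A + np.diag(3.0 * u**2)

def setup(n=50):
    # Finite-difference -u'' on (0, 1) with homogeneous Dirichlet conditions
    h = 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    x = np.linspace(h, 1.0 - h, n)
    f = 100.0 * np.sin(np.pi * x)
    return A, f, x

def solve_reference(A, f, tol=1e-13, maxit=50):
    # Full Newton solve, used only to obtain a reference solution for comparison
    u = np.zeros_like(f)
    for _ in range(maxit):
        du = np.linalg.solve(jacobian(u, A), -residual(u, A, f))
        u = u + du
        if np.linalg.norm(du) <= tol * (1.0 + np.linalg.norm(u)):
            break
    return u

def correct(u_pred, A, f):
    # Error correction: one linear solve J(u_pred) du = -F(u_pred) around the prediction
    du = np.linalg.solve(jacobian(u_pred, A), -residual(u_pred, A, f))
    return u_pred + du

if __name__ == "__main__":
    A, f, x = setup()
    u_true = solve_reference(A, f)
    for eps in (1e-1, 1e-2):
        u_pred = u_true + eps * np.cos(3.0 * x)  # stand-in for a surrogate's prediction error
        e0 = np.linalg.norm(u_pred - u_true)
        e1 = np.linalg.norm(correct(u_pred, A, f) - u_true)
        print(f"perturbation {eps:g}: error before {e0:.2e}, after {e1:.2e}")
```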
-
Interior over-penalized enriched Galerkin methods for second order elliptic equations
Authors:
Jeonghun J. Lee,
Omar Ghattas
Abstract:
In this paper we propose a variant of enriched Galerkin methods for second order elliptic equations with over-penalization of interior jump terms. The bilinear form with interior over-penalization gives a non-standard norm which is different from the discrete energy norm in the classical discontinuous Galerkin methods. Nonetheless we prove that optimal a priori error estimates with the standard discrete energy norm can be obtained by combining a priori and a posteriori error analysis techniques. We also show that the interior over-penalization is advantageous for constructing preconditioners robust to mesh refinement by analyzing spectral equivalence of bilinear forms. Numerical results are included to illustrate the convergence and preconditioning results.
Submitted 21 October, 2023; v1 submitted 21 August, 2022;
originally announced August 2022.
-
Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization
Authors:
Omar Al Ghattas,
Daniel Sanz-Alonso
Abstract:
Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
Submitted 5 October, 2023; v1 submitted 5 August, 2022;
originally announced August 2022.
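For concreteness, the perturbed-observation ensemble Kalman update can be sketched as below. This minimal sketch assumes a linear observation operator `H`, a Gaussian noise covariance `R`, and the plain sample covariance of the ensemble; the square-root and localized variants that the paper also analyzes are not shown.

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic (perturbed-observation) ensemble Kalman update.

    X : (d, N) prior ensemble, one particle per column
    y : (m,) observed data
    H : (m, d) linear observation operator
    R : (m, m) observation-noise covariance
    """
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / (N - 1)                  # sample prior covariance
    S = H @ C @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ C).T          # Kalman gain C H^T S^{-1} (S and C symmetric)
    # One independent data perturbation per particle, columns of shape (m, N)
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (y[:, None] + eps - H @ X)
```

With a large ensemble the updated particles should approach the exact Kalman posterior mean and covariance; the small-ensemble regime is what the paper's effective-dimension analysis addresses.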
-
Optimal design of chemoepitaxial guideposts for directed self-assembly of block copolymer systems using an inexact-Newton algorithm
Authors:
Dingcheng Luo,
Lianghao Cao,
Peng Chen,
Omar Ghattas,
J. Tinsley Oden
Abstract:
Directed self-assembly (DSA) of block copolymers (BCPs) is one of the most promising developments in the cost-effective production of nanoscale devices. The process makes use of the natural tendency for BCP mixtures to form nanoscale structures upon phase separation. The phase separation can be directed through the use of chemically patterned substrates to promote the formation of morphologies that are essential to the production of semiconductor devices. Moreover, the design of the substrate pattern can be formulated as an optimization problem in which we seek substrate designs that effectively produce given target morphologies.
In this paper, we adopt a phase field model given by a nonlocal Cahn--Hilliard partial differential equation (PDE) based on the minimization of the Ohta--Kawasaki free energy, and present an efficient PDE-constrained optimization framework for the optimal design problem. The design variables are the locations of circular- or strip-shaped guiding posts that are used to model the substrate chemical pattern. To solve the ensuing optimization problem, we propose a variant of an inexact Newton conjugate gradient algorithm tailored to this problem. We demonstrate the effectiveness of our computational strategy on numerical examples that span a range of target morphologies. Owing to our second-order optimizer and fast state solver, the numerical results demonstrate five orders of magnitude reduction in computational cost over previous work. The efficiency of our framework and the fast convergence of our optimization algorithm enable us to rapidly solve the optimal design problem in not only two, but also three spatial dimensions.
Submitted 1 August, 2022;
originally announced August 2022.
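The generic inexact Newton-CG pattern solves each Newton system only approximately with conjugate gradients, using Hessian-vector products. It can be sketched as follows. This is not the authors' tailored variant or their Cahn-Hilliard state solver. The forcing term `eta = min(0.5, sqrt(||g|| / ||g0||))`, the negative-curvature fallback, and the Armijo constants are common textbook choices assumed here.

```python
import numpy as np

def truncated_cg(hessvec, g, eta, maxiter=200):
    """Approximately solve H p = -g by CG, stopping once ||H p + g|| <= eta * ||g||
    or when negative curvature is detected."""
    p = np.zeros_like(g)
    r = -g.copy()                 # residual of H p = -g at p = 0
    d = r.copy()
    rr = r @ r
    tol = eta * np.sqrt(rr)
    for k in range(maxiter):
        Hd = hessvec(d)
        curv = d @ Hd
        if curv <= 0.0:           # negative curvature: return current iterate (or steepest descent)
            return p if k > 0 else -g
        alpha = rr / curv
        p = p + alpha * d
        r = r - alpha * Hd
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    return p

def inexact_newton_cg(f, grad, hessvec, x0, gtol=1e-10, maxiter=100):
    x = np.array(x0, dtype=float)
    g0 = np.linalg.norm(grad(x))
    for _ in range(maxiter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= gtol:
            break
        # Eisenstat-Walker-type forcing: loose linear solves far away, tight near the minimizer
        eta = min(0.5, np.sqrt(gnorm / g0))
        p = truncated_cg(lambda v: hessvec(x, v), g, eta)
        # Armijo backtracking line search along the inexact Newton direction
        t, fx, slope = 1.0, f(x), g @ p
        while f(x + t * p) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x = x + t * p
    return x
```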
-
Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning
Authors:
Thomas O'Leary-Roseberry,
Peng Chen,
Umberto Villa,
Omar Ghattas
Abstract:
We propose derivative-informed neural operators (DINOs), a general family of neural networks to approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest. After discretization, both inputs and outputs are high-dimensional. We aim to approximate not only the operators with improved accuracy but also their derivatives (Jacobians) with respect to the input function-valued parameter to empower derivative-based algorithms in many applications, e.g., Bayesian inverse problems, optimization under parameter uncertainty, and optimal experimental design. The major difficulties include the computational cost of generating derivative training data and the high dimensionality of the problem leading to large training cost. To address these challenges, we exploit the intrinsic low-dimensionality of the derivatives and develop algorithms for compressing derivative information and efficiently imposing it in neural operator training yielding derivative-informed neural operators. We demonstrate that these advances can significantly reduce the costs of both data generation and training for large classes of problems (e.g., nonlinear steady state parametric PDE maps), making the costs marginal or comparable to the costs without using derivatives, and in particular independent of the discretization dimension of the input and output functions. Moreover, we show that the proposed DINO achieves significantly higher accuracy than neural operators trained without derivative information, for both function approximation and derivative approximation (e.g., Gauss-Newton Hessian), especially when the training data are limited.
Submitted 16 October, 2023; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Large-scale Bayesian optimal experimental design with derivative-informed projected neural network
Authors:
Keyi Wu,
Thomas O'Leary-Roseberry,
Peng Chen,
Omar Ghattas
Abstract:
We address the solution of large-scale Bayesian optimal experimental design (OED) problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields. The OED problem seeks to find sensor locations that maximize the expected information gain (EIG) in the solution of the underlying Bayesian inverse problem. Computation of the EIG is usually prohibitive for PDE-based OED problems. To make the evaluation of the EIG tractable, we approximate the (PDE-based) parameter-to-observable map with a derivative-informed projected neural network (DIPNet) surrogate, which exploits the geometry, smoothness, and intrinsic low-dimensionality of the map using a small and dimension-independent number of PDE solves. The surrogate is then deployed within a greedy algorithm-based solution of the OED problem such that no further PDE solves are required. We analyze the EIG approximation error in terms of the generalization error of the DIPNet and show they are of the same order. Finally, the efficiency and accuracy of the method are demonstrated via numerical experiments on OED problems governed by inverse scattering and inverse reactive transport with up to 16,641 uncertain parameters and 100 experimental design variables, where we observe up to three orders of magnitude speedup relative to a reference double loop Monte Carlo method.
Submitted 6 September, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
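To make the EIG and the greedy step concrete, the sketch below uses the closed-form EIG of a linear Gaussian model, `0.5 * logdet(I + F_S C F_S^T / noise_var)`, in place of the DIPNet surrogate. This is an assumption for illustration only: the paper targets nonlinear maps, where the EIG is estimated through the surrogate. `F`, `prior_cov`, and `noise_var` are hypothetical inputs, and the brute-force helper is meant only for checking tiny cases.

```python
import itertools
import numpy as np

def eig_linear_gaussian(F, prior_cov, noise_var, sensors):
    """EIG of a sensor subset for y_S = F_S m + noise, m ~ N(0, prior_cov),
    noise ~ N(0, noise_var I):  0.5 * logdet(I + F_S prior_cov F_S^T / noise_var)."""
    idx = np.asarray(sensors, dtype=int)
    Fs = F[idx]
    M = np.eye(len(idx)) + Fs @ prior_cov @ Fs.T / noise_var
    return 0.5 * np.linalg.slogdet(M)[1]

def greedy_design(F, prior_cov, noise_var, k):
    # Add, one at a time, the candidate sensor giving the largest EIG
    chosen = []
    for _ in range(k):
        rest = [j for j in range(F.shape[0]) if j not in chosen]
        best = max(rest, key=lambda j: eig_linear_gaussian(F, prior_cov, noise_var, chosen + [j]))
        chosen.append(best)
    return chosen

def best_design_bruteforce(F, prior_cov, noise_var, k):
    # Exhaustive search over all k-subsets; only feasible for tiny candidate sets
    return max(itertools.combinations(range(F.shape[0]), k),
               key=lambda S: eig_linear_gaussian(F, prior_cov, noise_var, S))
```

Because this EIG is monotone and submodular in the sensor set, the greedy choice is guaranteed to reach at least a `1 - 1/e` fraction of the optimum in this linear Gaussian setting.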
-
Learning High-Dimensional Parametric Maps via Reduced Basis Adaptive Residual Networks
Authors:
Thomas O'Leary-Roseberry,
Xiaosong Du,
Anirban Chaudhuri,
Joaquim R. R. A. Martins,
Karen Willcox,
Omar Ghattas
Abstract:
We propose a scalable framework for the learning of high-dimensional parametric maps via adaptively constructed residual network (ResNet) maps between reduced bases of the inputs and outputs. When only limited training data are available, it is beneficial to have a compact parametrization in order to ameliorate the ill-posedness of the neural network training problem. By linearly restricting high-dimensional maps to informed reduced bases of the inputs, one can compress high-dimensional maps in a constructive way that can be used to detect appropriate basis ranks, equipped with rigorous error estimates. The scalable learning framework thus reduces to learning the nonlinear mapping between these compressed reduced bases. Unlike the reduced basis construction, however, neural network constructions are not guaranteed to reduce errors by adding representation power, making it difficult to achieve good practical performance. Inspired by recent approximation theory that connects ResNets to sequential minimizing flows, we present an adaptive ResNet construction algorithm. This algorithm allows for depth-wise enrichment of the neural network approximation, in a manner that can achieve good practical performance by first training a shallow network and then adapting. We prove universal approximation of the associated neural network class for $L^2_\nu$ functions on compact sets. Our overall framework allows for constructive means to detect appropriate breadth and depth, and related compact parametrizations of neural networks, significantly reducing the need for architectural hyperparameter tuning. Numerical experiments for parametric PDE problems and a 3D CFD wing design optimization parametric map demonstrate that the proposed methodology can achieve remarkably high accuracy with limited training data, and outperforms the other neural network strategies we compared against.
Submitted 15 November, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
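The mechanics of depth-wise enrichment can be sketched as follows. The sketch assumes a tanh residual block acting on reduced-basis coefficients, with zero-initialized output weights for each newly added block. Adding depth then leaves the current approximation unchanged, and training resumes from it. The class name and initialization are hypothetical choices; the training loop and the reduced-basis construction are omitted.

```python
import numpy as np

class AdaptiveResNet:
    """Residual network x_{k+1} = x_k + W2 tanh(W1 x_k + b1) on reduced-basis coefficients."""

    def __init__(self, dim, width, seed=0):
        self.dim, self.width = dim, width
        self.rng = np.random.default_rng(seed)
        self.layers = []                       # list of (W1, b1, W2)

    def __call__(self, x):
        # x: (batch, dim) array of reduced input coefficients
        for W1, b1, W2 in self.layers:
            x = x + np.tanh(x @ W1.T + b1) @ W2.T
        return x

    def add_layer(self):
        # Depth-wise enrichment: random W1, zero W2, so the new block starts as the identity map
        W1 = self.rng.standard_normal((self.width, self.dim)) / np.sqrt(self.dim)
        b1 = np.zeros(self.width)
        W2 = np.zeros((self.dim, self.width))
        self.layers.append((W1, b1, W2))
```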
-
hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty
Authors:
Ki-Tae Kim,
Umberto Villa,
Matthew Parno,
Youssef Marzouk,
Omar Ghattas,
Noemi Petra
Abstract:
Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms designed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods on several inverse problems. These include problems with linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large ($\sim 50\times$) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.
Submitted 29 December, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
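The low-rank Hessian idea behind such methods can be illustrated on a linear Gaussian inverse problem. There, the posterior covariance follows from a rank-`r` eigendecomposition of the prior-preconditioned data-misfit Hessian via the Sherman-Morrison-Woodbury identity. This is a dense NumPy sketch under that linear Gaussian assumption. It does not use the hIPPYlib or MUQ APIs, which work matrix-free with PDE-based Hessian actions.

```python
import numpy as np

def lowrank_posterior_cov(F, noise_var, prior_cov, r):
    """Posterior covariance of y = F m + noise, m ~ N(0, prior_cov), noise ~ N(0, noise_var I),
    from a rank-r eigendecomposition of the prior-preconditioned data-misfit Hessian."""
    Lc = np.linalg.cholesky(prior_cov)             # prior_cov = Lc Lc^T
    Hhat = Lc.T @ (F.T @ F / noise_var) @ Lc        # prior-preconditioned misfit Hessian
    lam, V = np.linalg.eigh(Hhat)
    keep = np.argsort(lam)[::-1][:r]                # r dominant eigenpairs
    lam, V = lam[keep], V[:, keep]
    D = lam / (1.0 + lam)
    # (I + Hhat)^{-1} = I - V diag(D) V^T, exact when the discarded eigenvalues vanish
    return Lc @ (np.eye(len(Lc)) - (V * D) @ V.T) @ Lc.T
```

When the data are only weakly informative, few eigenvalues exceed one, so a small `r` captures the update from prior to posterior.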
-
An efficient method for goal-oriented linear Bayesian optimal experimental design: Application to optimal sensor placement
Authors:
Keyi Wu,
Peng Chen,
Omar Ghattas
Abstract:
Optimal experimental design (OED) plays an important role in the problem of identifying uncertainty with limited experimental data. In many applications, we seek to minimize the uncertainty of a predicted quantity of interest (QoI) based on the solution of the inverse problem, rather than the inversion model parameter itself. In these scenarios, we develop an efficient method for goal-oriented optimal experimental design (GOOED) for large-scale Bayesian linear inverse problems that finds sensor locations to maximize the expected information gain (EIG) for a predicted QoI. By deriving a new formula to compute the EIG, exploiting low-rank structures of two appropriate operators, we are able to employ an online-offline decomposition scheme and a swapping greedy algorithm to maximize the EIG at a cost measured in model solutions that is independent of the problem dimensions. We provide detailed error analysis of the approximated EIG, and demonstrate the efficiency, accuracy, and both data- and parameter-dimension independence of the proposed algorithm for a contaminant transport inverse problem with infinite-dimensional parameter field.
Submitted 5 January, 2022; v1 submitted 12 February, 2021;
originally announced February 2021.
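For a linear Gaussian model and a linear QoI `q = P m`, the goal-oriented EIG reduces to a log-determinant ratio of the prior and posterior QoI covariances. The sketch below evaluates it by brute force with dense matrices, an assumption suitable only for small problems. It does not reproduce the paper's new EIG formula, its low-rank structure, or its offline-online decomposition; `P` is a hypothetical QoI operator.

```python
import numpy as np

def goal_oriented_eig(F, prior_cov, noise_var, P, sensors):
    """EIG about q = P m for y_S = F_S m + noise, m ~ N(0, prior_cov), noise ~ N(0, noise_var I):
    0.5 * [logdet(P prior_cov P^T) - logdet(P post_cov P^T)]."""
    Fs = F[np.asarray(sensors, dtype=int)]
    post_cov = np.linalg.inv(Fs.T @ Fs / noise_var + np.linalg.inv(prior_cov))
    ld_prior = np.linalg.slogdet(P @ prior_cov @ P.T)[1]
    ld_post = np.linalg.slogdet(P @ post_cov @ P.T)[1]
    return 0.5 * (ld_prior - ld_post)
```

With `P` equal to the identity this recovers the usual parameter EIG. For any other `P` the value cannot exceed it, since the QoI is a function of the parameter.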
-
Nonuniform 3D finite difference elastic wave simulation on staggered grids
Authors:
Longfei Gao,
Omar Ghattas,
David Keyes
Abstract:
We present an approach to simulate 3D isotropic elastic wave propagation using nonuniform finite difference discretization on staggered grids. Specifically, we consider simulation domains composed of layers of uniform grids with different grid spacings, separated by nonconforming interfaces. We demonstrate that this layer-wise finite difference discretization has the potential to significantly reduce the simulation cost, compared to its fully uniform counterpart. Stability of such a discretization is achieved by using specially designed difference operators, which are variants of the standard difference operators with adaptations near boundaries or interfaces, and penalty terms, which are appended to the discretized wave system to weakly impose boundary or interface conditions. Combined with specially designed interpolation operators, the discretized wave system is shown to preserve the energy conserving property of the continuous elastic wave equation, and a fortiori to ensure the stability of the simulation. Numerical examples are presented to demonstrate the efficacy of the proposed simulation approach.
Submitted 9 September, 2022; v1 submitted 26 December, 2020;
originally announced December 2020.
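The energy argument behind the stability claim can be checked in a much simpler setting. The sketch below assumes a 1D velocity-stress system on a single uniform, periodic staggered grid with leapfrog time stepping. The paper's nonuniform layers, nonconforming interfaces, boundary-adapted difference operators, penalty terms, and interpolation operators are not reproduced. On this grid the discrete energy `E^n = ||sigma^n||^2 / (2 mu) + (rho / 2) <v^{n-1/2}, v^{n+1/2}>` is conserved to round-off, because the discrete gradient and divergence are negative adjoints of each other.

```python
import numpy as np

def staggered_leapfrog(n=200, rho=1.0, mu=1.0, cfl=0.5, steps=1000):
    """1D system rho v_t = sigma_x, sigma_t = mu v_x on a uniform periodic staggered grid.
    v lives at nodes x_i and half time levels; sigma at x_{i+1/2} and integer time levels.
    Returns the discrete energy E^n at every step."""
    h = 1.0 / n
    dt = cfl * h / np.sqrt(mu / rho)
    x_half = (np.arange(n) + 0.5) * h
    s = np.exp(-((x_half - 0.5) / 0.05) ** 2)    # sigma^0: Gaussian pulse
    v = np.zeros(n)                              # v^{-1/2}
    energies = []
    for _ in range(steps):
        # v^{n+1/2} = v^{n-1/2} + dt/rho * (sigma_{i+1/2} - sigma_{i-1/2}) / h
        v_new = v + dt / rho * (s - np.roll(s, 1)) / h
        # E^n = ||sigma^n||^2 / (2 mu) + rho/2 <v^{n-1/2}, v^{n+1/2}>, weighted by h
        energies.append(h * (0.5 / mu * s @ s + 0.5 * rho * v @ v_new))
        # sigma^{n+1} = sigma^n + dt * mu * (v_{i+1} - v_i) / h
        s = s + dt * mu * (np.roll(v_new, -1) - v_new) / h
        v = v_new
    return np.array(energies)
```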
-
Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs
Authors:
Thomas O'Leary-Roseberry,
Umberto Villa,
Peng Chen,
Omar Ghattas
Abstract:
Many-query problems, arising from uncertainty quantification, Bayesian inversion, Bayesian optimal experimental design, and optimization under uncertainty, require numerous evaluations of a parameter-to-output map. These evaluations become prohibitive if this parametric map is high-dimensional and involves expensive solution of partial differential equations (PDEs). To tackle this challenge, we propose to construct surrogates for high-dimensional PDE-governed parametric maps in the form of projected neural networks that parsimoniously capture the geometry and intrinsic low-dimensionality of these maps. Specifically, we compute Jacobians of these PDE-based maps, and project the high-dimensional parameters onto a low-dimensional derivative-informed active subspace; we also project the possibly high-dimensional outputs onto their principal subspace. This exploits the fact that many high-dimensional PDE-governed parametric maps can be well-approximated in low-dimensional parameter and output subspaces. We use the projection basis vectors in the active subspace as well as the principal output subspace to construct the weights for the first and last layers of the neural network, respectively. This frees us to train the weights in only the low-dimensional layers of the neural network. The architecture of the resulting neural network captures, to first order, the low-dimensional structure and geometry of the parametric map. We demonstrate that the proposed projected neural network achieves greater generalization accuracy than a full neural network, especially in the limited training data regime afforded by expensive PDE-based parametric maps. Moreover, we show that the number of degrees of freedom of the inner layers of the projected network is independent of the parameter and output dimensions, and high accuracy can be achieved with weight dimension independent of the discretization dimension.
Submitted 16 March, 2021; v1 submitted 30 November, 2020;
originally announced November 2020.
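The two projections can be sketched in NumPy. The input basis comes from the dominant eigenvectors of the sample average of `J^T J`, and the output basis from a PCA of output samples. Function names and the `shift` argument (the output mean) are assumptions for illustration. `inner` stands for the trainable low-dimensional layers, whose training is omitted.

```python
import numpy as np

def derivative_informed_basis(jacobians, r):
    """Dominant eigenvectors of the sample average of J^T J (derivative-informed active subspace).
    jacobians: iterable of (n_out, d) Jacobians evaluated at parameter samples."""
    jacobians = list(jacobians)
    C = sum(J.T @ J for J in jacobians) / len(jacobians)
    lam, V = np.linalg.eigh(C)
    return V[:, np.argsort(lam)[::-1][:r]]           # (d, r)

def output_pca_basis(outputs, r):
    """Leading principal directions of output samples; outputs: (N, n_out)."""
    Q = outputs - outputs.mean(axis=0)
    _, _, Vt = np.linalg.svd(Q, full_matrices=False)
    return Vt[:r].T                                   # (n_out, r)

def projected_network(V, U, inner, shift):
    """First layer fixed to V^T, last layer fixed to U; only `inner` (R^r_in -> R^r_out) is trained."""
    return lambda m: shift + U @ inner(V.T @ m)
```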
-
Taylor approximation for chance constrained optimization problems governed by partial differential equations with high-dimensional random parameters
Authors:
Peng Chen,
Omar Ghattas
Abstract:
We propose a fast and scalable optimization method to solve chance or probabilistic constrained optimization problems governed by partial differential equations (PDEs) with high-dimensional random parameters. To address the critical computational challenges of expensive PDE solution and high-dimensional uncertainty, we construct surrogates of the constraint function by Taylor approximation, which relies on efficient computation of the derivatives, low rank approximation of the Hessian, and a randomized algorithm for eigenvalue decomposition. To tackle the difficulty of the non-differentiability of the inequality chance constraint, we use a smooth approximation of the discontinuous indicator function involved in the chance constraint, and apply a penalty method to transform the inequality constrained optimization problem to an unconstrained one. Moreover, we design a gradient-based optimization scheme that gradually increases smoothing and penalty parameters to achieve convergence, for which we present an efficient computation of the gradient of the approximate cost functional by the Taylor approximation. Based on numerical experiments for a problem in optimal groundwater management, we demonstrate the accuracy of the Taylor approximation, its ability to greatly accelerate constraint evaluations, the convergence of the continuation optimization scheme, and the scalability of the proposed method in terms of the number of PDE solves with increasing random parameter dimension from one thousand to hundreds of thousands.
Submitted 19 November, 2020;
originally announced November 2020.
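The smoothing-plus-penalty reformulation can be sketched on samples of the constraint function. The sketch makes several assumptions. The indicator is smoothed with a logistic function of sharpness `beta`. The chance constraint `P[f > 0] <= alpha` is replaced by a quadratic penalty with weight `gamma`. The samples of `f` are supplied directly rather than coming from the paper's Taylor surrogate. A continuation strategy would increase `beta` and `gamma` as the optimization proceeds.

```python
import numpy as np

def smooth_indicator(x, beta):
    # Logistic approximation of 1{x > 0}: 1 / (1 + exp(-beta x)), written via tanh to avoid overflow
    return 0.5 * (1.0 + np.tanh(0.5 * beta * x))

def smoothed_chance(f_samples, beta):
    # Sample-average estimate of P[f > 0] using the smoothed indicator
    return np.mean(smooth_indicator(f_samples, beta))

def penalized_objective(cost, f_samples, alpha, beta, gamma):
    # Quadratic penalty replacing the chance constraint P[f > 0] <= alpha
    violation = max(smoothed_chance(f_samples, beta) - alpha, 0.0)
    return cost + 0.5 * gamma * violation ** 2
```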
-
Bayesian inference of heterogeneous epidemic models: Application to COVID-19 spread accounting for long-term care facilities
Authors:
Peng Chen,
Keyi Wu,
Omar Ghattas
Abstract:
We propose a high-dimensional Bayesian inference framework for learning heterogeneous dynamics of a COVID-19 model, with a specific application to the dynamics and severity of COVID-19 inside and outside long-term care (LTC) facilities. We develop a heterogeneous compartmental model that accounts for the heterogeneity of the time-varying spread and severity of COVID-19 inside and outside LTC facilities, which is characterized by time-dependent stochastic processes and time-independent parameters in $\sim$1500 dimensions after discretization. To infer these parameters, we use reported data on the number of confirmed, hospitalized, and deceased cases with suitable post-processing in both a deterministic inversion approach with appropriate regularization as a first step, followed by Bayesian inversion with proper prior distributions. To address the curse of dimensionality and the ill-posedness of the high-dimensional inference problem, we propose the use of a dimension-independent projected Stein variational gradient descent method, and demonstrate the intrinsic low-dimensionality of the inverse problem. We present inference results with quantified uncertainties for both New Jersey and Texas, which experienced different epidemic phases and patterns. Moreover, we also present forecasting and validation results based on the empirical posterior samples of our inference for the future trajectory of COVID-19.
Submitted 2 November, 2020;
originally announced November 2020.
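For orientation, the basic SVGD update is sketched below. This is plain SVGD with an RBF kernel and a median-heuristic bandwidth, both assumptions. It is not the projected, dimension-independent variant the abstract refers to, which applies the update in a low-dimensional parameter subspace.

```python
import numpy as np

def svgd(grad_log_p, X0, step=0.1, iters=1000):
    """Plain Stein variational gradient descent with an RBF kernel.
    X0: (n, d) initial particles; grad_log_p maps (n, d) -> (n, d)."""
    X = np.array(X0, dtype=float)
    n = X.shape[0]
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]           # diff[i, j] = x_i - x_j
        sq = np.sum(diff ** 2, axis=-1)
        h = np.median(sq) / np.log(n + 1.0) + 1e-12    # median-heuristic bandwidth
        K = np.exp(-sq / h)
        # phi(x_i) = 1/n sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]
        drive = K @ grad_log_p(X)                      # kernel-weighted score: pulls toward high density
        repulse = (2.0 / h) * np.sum(diff * K[:, :, None], axis=1)  # kernel gradient: spreads particles
        X = X + step * (drive + repulse) / n
    return X
```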
-
A globally convergent modified Newton method for the direct minimization of the Ohta-Kawasaki energy with application to the directed self-assembly of diblock copolymers
Authors:
Lianghao Cao,
Omar Ghattas,
J. Tinsley Oden
Abstract:
We propose a fast and robust scheme for the direct minimization of the Ohta-Kawasaki energy that characterizes the microphase separation of diblock copolymer melts. The scheme employs a globally convergent modified Newton method with line search which is shown to be mass-conservative, energy-descending, asymptotically quadratically convergent, and three orders of magnitude more efficient than the commonly-used gradient flow approach. The regularity and the first-order condition of minimizers are analyzed. A numerical study of the chemical substrate guided directed self-assembly of diblock copolymer melts, based on a novel polymer-substrate interaction model and the proposed scheme, is provided.
Submitted 28 October, 2020;
originally announced October 2020.
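A mass-conserving, energy-descending modified Newton iteration can be sketched on a 1D periodic finite-difference discretization of an Ohta-Kawasaki-type energy, `E(u) = ∫ eps^2/2 |u'|^2 + (u^2 - 1)^2 / 4 + sigma/2 |(-Laplacian)^{-1/2}(u - m)|^2`. Several pieces are assumptions for illustration and not necessarily the paper's scheme: the specific modification (clipping the negative double-well curvature), the Lagrange-multiplier treatment of the mass constraint, and all parameter values. The clipped Hessian is positive definite on mean-zero perturbations, so each step is a descent direction, and the constrained solve keeps the mean of `u` fixed.

```python
import numpy as np

def ok_operators(n=64, eps=0.05, sigma=10.0):
    # Periodic 1D finite differences on [0, 1): L approximates -d^2/dx^2
    h = 1.0 / n
    I = np.eye(n)
    L = (2.0 * I - np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)) / h**2
    Lp = np.linalg.pinv(L)   # (-Laplacian)^{-1} on mean-zero vectors, for the nonlocal term
    return h, eps, sigma, L, Lp

def energy(u, m, h, eps, sigma, L, Lp):
    w = u - m
    return h * (0.5 * eps**2 * u @ L @ u + 0.25 * np.sum((u**2 - 1.0) ** 2) + 0.5 * sigma * w @ Lp @ w)

def gradient(u, m, h, eps, sigma, L, Lp):
    return h * (eps**2 * (L @ u) + u**3 - u + sigma * (Lp @ (u - m)))

def modified_hessian(u, h, eps, sigma, L, Lp):
    # Clip the negative part of the double-well curvature 3u^2 - 1 (positive definite on mean-zero vectors)
    return h * (eps**2 * L + np.diag(np.maximum(3.0 * u**2 - 1.0, 0.0)) + sigma * Lp)

def minimize_ok(u0, ops, maxit=100, gtol=1e-10):
    u = np.array(u0, dtype=float)
    n, m = u.size, u.mean()
    ones = np.ones((n, 1))
    history = [energy(u, m, *ops)]
    for _ in range(maxit):
        g = gradient(u, m, *ops)
        if np.linalg.norm(g - g.mean()) < gtol:       # stationary under the mass constraint
            break
        # KKT system [H 1; 1^T 0][du; lam] = [-g; 0] enforces sum(du) = 0 (mass conservation)
        K = np.block([[modified_hessian(u, *ops), ones], [ones.T, np.zeros((1, 1))]])
        du = np.linalg.solve(K, np.append(-g, 0.0))[:n]
        t, slope = 1.0, g @ du
        while True:                                    # Armijo backtracking line search
            E_new = energy(u + t * du, m, *ops)
            if E_new <= history[-1] + 1e-4 * t * slope or t < 1e-10:
                break
            t *= 0.5
        if E_new > history[-1]:
            break                                      # no further decrease at machine precision
        u = u + t * du
        history.append(E_new)
    return u, history
```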
-
A fast and scalable computational framework for large-scale and high-dimensional Bayesian optimal experimental design
Authors:
Keyi Wu,
Peng Chen,
Omar Ghattas
Abstract:
We develop a fast and scalable computational framework to solve large-scale and high-dimensional Bayesian optimal experimental design problems. In particular, we consider the problem of optimal observation sensor placement for Bayesian inference of high-dimensional parameters governed by partial differential equations (PDEs), which is formulated as an optimization problem that seeks to maximize an expected information gain (EIG). Such optimization problems are particularly challenging due to the curse of dimensionality for high-dimensional parameters and the expensive solution of large-scale PDEs. To address these challenges, we exploit two essential properties of such problems: the low-rank structure of the Jacobian of the parameter-to-observable map to extract the intrinsically low-dimensional data-informed subspace, and the high correlation of the approximate EIGs by a series of approximations to reduce the number of PDE solves. We propose an efficient offline-online decomposition for the optimization problem: an offline stage of computing all the quantities that require a limited number of PDE solves independent of parameter and data dimensions, and an online stage of optimizing sensor placement that does not require any PDE solve. For the online optimization, we propose a swapping greedy algorithm that first constructs an initial set of sensors using leverage scores and then swaps the chosen sensors with other candidates until certain convergence criteria are met. We demonstrate the efficiency and scalability of the proposed computational framework by a linear inverse problem of inferring the initial condition for an advection-diffusion equation, and a nonlinear inverse problem of inferring the diffusion coefficient of a log-normal diffusion equation, with both the parameter and data dimensions ranging from a few tens to a few thousands.
Submitted 28 October, 2020;
originally announced October 2020.
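The swapping greedy idea can be sketched for a linear Gaussian model whose EIG has the closed form used below. Initial sensors come from leverage scores of the rows of `F C^{1/2}`, computed with a Cholesky factor. Swaps are then accepted whenever they increase the EIG. This particular leverage-score choice, the tolerance, and the sweep limit are assumptions, and the paper's offline-online low-rank machinery is not reproduced.

```python
import numpy as np

def eig_linear_gaussian(F, prior_cov, noise_var, sensors):
    # EIG = 0.5 * logdet(I + F_S prior_cov F_S^T / noise_var) for a linear Gaussian model
    Fs = F[np.asarray(sensors, dtype=int)]
    return 0.5 * np.linalg.slogdet(np.eye(len(Fs)) + Fs @ prior_cov @ Fs.T / noise_var)[1]

def swapping_greedy(F, prior_cov, noise_var, k, max_sweeps=100):
    n_cand = F.shape[0]
    # Initial set: the k largest leverage scores of the rows of F prior_cov^{1/2}
    U, _, _ = np.linalg.svd(F @ np.linalg.cholesky(prior_cov), full_matrices=False)
    scores = np.sum(U[:, :k] ** 2, axis=1)
    chosen = [int(j) for j in np.argsort(scores)[::-1][:k]]
    best = eig_linear_gaussian(F, prior_cov, noise_var, chosen)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(k):                  # try replacing each chosen sensor ...
            for j in range(n_cand):           # ... by each currently unchosen candidate
                if j in chosen:
                    continue
                trial = chosen.copy()
                trial[pos] = j
                val = eig_linear_gaussian(F, prior_cov, noise_var, trial)
                if val > best + 1e-12:
                    chosen, best, improved = trial, val, True
        if not improved:                      # no single swap helps: local optimum reached
            break
    return sorted(chosen), best
```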
-
Optimal design of acoustic metamaterial cloaks under uncertainty
Authors:
Peng Chen,
Michael R. Haberman,
Omar Ghattas
Abstract:
In this work, we consider the problem of optimal design of an acoustic cloak under uncertainty and develop scalable approximation and optimization methods to solve this problem. The design variable is taken as an infinite-dimensional spatially-varying field that represents the material property, while an additive infinite-dimensional random field represents the variability of the material property or the manufacturing error. Discretization of this optimal design problem results in high-dimensional design variables and uncertain parameters. To solve this problem, we develop a computational approach based on a Taylor approximation and an approximate Newton method for optimization, which is based on a Hessian derived at the mean of the random field. We show our approach is scalable with respect to the dimension of both the design variables and uncertain parameters, in the sense that the necessary number of acoustic wave propagations is essentially independent of these dimensions, for numerical experiments with up to one million design variables and half a million uncertain parameters. We demonstrate that, using our computational approach, an optimal design of the acoustic cloak that is robust to material uncertainty is achieved in a tractable manner. The optimal design under uncertainty problem is posed and solved for the classical circular obstacle surrounded by a ring-shaped cloaking region, subjected to both a single-direction single-frequency incident wave and multiple-direction multiple-frequency incident waves. Finally, we apply the method to a deterministic large-scale optimal cloaking problem with complex geometry, to demonstrate that the approximate Newton method's Hessian computation is viable for large, complex problems.
Submitted 26 July, 2020;
originally announced July 2020.
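The Taylor-approximation idea can be illustrated with a quadratic expansion around the mean of a Gaussian random field. For `x ~ N(0, C)`, the quadratic surrogate `f0 + g^T x + 0.5 x^T H x` has mean `f0 + 0.5 tr(HC)` and variance `g^T C g + 0.5 tr((HC)^2)`, both exact for a quadratic. The dense traces below are an assumption suited to small dimensions. At scale, such traces would be approximated, e.g., from a low-rank eigendecomposition of the covariance-preconditioned Hessian. The approximate Newton optimization is not shown.

```python
import numpy as np

def quadratic_taylor_moments(f0, g, H, C):
    """Mean and variance of f0 + g^T x + 0.5 x^T H x for x ~ N(0, C), H symmetric:
    E = f0 + 0.5 tr(HC),  Var = g^T C g + 0.5 tr((HC)^2)."""
    HC = H @ C
    mean = f0 + 0.5 * np.trace(HC)
    var = g @ C @ g + 0.5 * np.trace(HC @ HC)
    return mean, var
```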
-
Hierarchical Matrix Approximations of Hessians Arising in Inverse Problems Governed by PDEs
Authors:
Ilona Ambartsumyan,
Wajih Boukaram,
Tan Bui-Thanh,
Omar Ghattas,
David Keyes,
Georg Stadler,
George Turkiyyah,
Stefano Zampini
Abstract:
Hessian operators arising in inverse problems governed by partial differential equations (PDEs) play a critical role in delivering efficient, dimension-independent convergence for both Newton solution of deterministic inverse problems, as well as Markov chain Monte Carlo sampling of posteriors in the Bayesian setting. These methods require the ability to repeatedly perform such operations on the Hessian as multiplication with arbitrary vectors, solving linear systems, inversion, and (inverse) square root. Unfortunately, the Hessian is a (formally) dense, implicitly-defined operator that is intractable to form explicitly for practical inverse problems, requiring as many PDE solves as inversion parameters. Low rank approximations are effective when the data contain limited information about the parameters, but become prohibitive as the data become more informative. However, the Hessians for many inverse problems arising in practical applications can be well approximated by matrices that have hierarchically low rank structure. Hierarchical matrix representations promise to overcome the high complexity of dense representations and provide effective data structures and matrix operations that have only log-linear complexity. In this work, we describe algorithms for constructing and updating hierarchical matrix approximations of Hessians, and illustrate them on a number of representative inverse problems involving time-dependent diffusion, advection-dominated transport, frequency domain acoustic wave propagation, and low frequency Maxwell equations, demonstrating up to an order of magnitude speedup compared to globally low rank approximations.
Submitted 23 March, 2020;
originally announced March 2020.
-
Stein variational reduced basis Bayesian inversion
Authors:
Peng Chen,
Omar Ghattas
Abstract:
We propose and analyze a Stein variational reduced basis method (SVRB) to solve large-scale PDE-constrained Bayesian inverse problems. To address the computational challenge of drawing numerous samples requiring expensive PDE solves from the posterior distribution, we integrate an adaptive and goal-oriented model reduction technique with an optimization-based Stein variational gradient descent method (SVGD). The samples are drawn from the prior distribution and iteratively pushed to the posterior by a sequence of transport maps, which are constructed by SVGD, requiring the evaluation of the potential---the negative log of the likelihood function---and its gradient with respect to the random parameters, which depend on the solution of the PDE. To reduce the computational cost, we develop an adaptive and goal-oriented model reduction technique based on reduced basis approximations for the evaluation of the potential and its gradient. We present a detailed analysis for the reduced basis approximation errors of the potential and its gradient, the induced errors of the posterior distribution measured by Kullback--Leibler divergence, as well as the errors of the samples. To demonstrate the computational accuracy and efficiency of SVRB, we report results of numerical experiments on a Bayesian inverse problem governed by a diffusion PDE with random parameters with both uniform and Gaussian prior distributions. Over 100X speedups can be achieved while the accuracy of the approximation of the potential and its gradient is preserved.
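The SVGD transport step described above can be sketched in one dimension. This is a minimal illustration that assumes a standard normal target (so the gradient of the log density is -x), an RBF kernel with a fixed bandwidth, and a fixed step size. It omits the reduced basis surrogate that is the paper's actual contribution, and the names `svgd_step`, `bandwidth`, and `step` are hypothetical.

```python
import math


def svgd_step(particles, grad_log_p, bandwidth=1.0, step=0.05):
    """One Stein variational gradient descent update in 1D.

    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * grad_log_p(x_j) + d/dx_j k(x_j, x_i)]
    with the RBF kernel k(x, y) = exp(-(x - y)^2 / bandwidth).
    """
    n = len(particles)
    grads = [grad_log_p(x) for x in particles]
    updated = []
    for xi in particles:
        phi = 0.0
        for xj, gj in zip(particles, grads):
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            # Driving term: moves particles toward high target density.
            phi += k * gj
            # Repulsive term: kernel derivative w.r.t. x_j keeps particles apart.
            phi += -2.0 * (xj - xi) / bandwidth * k
        updated.append(xi + step * phi / n)
    return updated


def grad_log_standard_normal(x):
    # Assumed target N(0, 1): the gradient of its log density is -x.
    return -x


# Start all particles far from the target mean and transport them.
particles = [2.0 + 0.2 * i for i in range(11)]
for _ in range(2000):
    particles = svgd_step(particles, grad_log_standard_normal)

mean = sum(particles) / len(particles)
print(f"mean={mean:.3f}, spread={max(particles) - min(particles):.3f}")
```

In the setting of the paper, each call to the gradient of the log target would require PDE solves, which is the cost the reduced basis approximation is meant to cut.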
Submitted 25 February, 2020;
originally announced February 2020.
-
Tensor train construction from tensor actions, with application to compression of large high order derivative tensors
Authors:
Nick Alger,
Peng Chen,
Omar Ghattas
Abstract:
We present a method for converting tensors into tensor train format based on actions of the tensor as a vector-valued multilinear function. Existing methods for constructing tensor trains require access to "array entries" of the tensor and are therefore inefficient or computationally prohibitive if the tensor is accessible only through its action, especially for high order tensors. Our method permits efficient tensor train compression of large high order derivative tensors for nonlinear mappings that are implicitly defined through the solution of a system of equations. Array entries of these derivative tensors are not directly accessible, but actions of these tensors can be computed efficiently via a procedure that we discuss. Such tensors are often amenable to tensor train compression in theory, but until now no efficient algorithm existed to convert them into tensor train format. We demonstrate our method by compressing a Hilbert tensor of size $41 \times 42 \times 43 \times 44 \times 45$, and by forming high order (up to $5^\text{th}$ order derivatives/$6^\text{th}$ order tensors) Taylor series surrogates of the noise-whitened parameter-to-output map for a stochastic partial differential equation with boundary output.
Submitted 4 August, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training
Authors:
Thomas O'Leary-Roseberry,
Omar Ghattas
Abstract:
In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.
Submitted 7 February, 2020;
originally announced February 2020.
-
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization
Authors:
Thomas O'Leary-Roseberry,
Nick Alger,
Omar Ghattas
Abstract:
In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as generalization properties. Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are typically not used for these problems. In this work we motivate the extension of Newton methods to the SA regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation. Additionally, LRSFN can facilitate fast escape from indefinite regions leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and stability of the method is key. We introduce a continuous time stability analysis framework, and use it to demonstrate that stochastic errors for Newton methods can be greatly amplified by ill-conditioned Hessians. The LRSFN method mitigates this stability issue via Levenberg-Marquardt damping. However, the analysis generally shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that other methods struggle with, and that even under restrictive step length conditions, LRSFN can outperform popular first order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.
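A minimal sketch of a low rank saddle free Newton direction, assuming the r dominant eigenpairs (lambda_i, v_i) of the Hessian are already available with orthonormal v_i, and that gamma > 0 is the Levenberg-Marquardt damping. The step solves (V|Lambda|V^T + gamma I) p = -g; because V has orthonormal columns, the inverse splits into a rank-r correction plus a scaled identity. The function names are hypothetical and this is not the authors' code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lrsfn_direction(grad, eigvals, eigvecs, gamma):
    """Return p = -(V |Lambda| V^T + gamma I)^{-1} grad.

    eigvecs must be orthonormal. Using |lambda| flips negative curvature
    directions, so the step descends instead of heading toward a saddle.
    """
    if gamma <= 0:
        raise ValueError("damping gamma must be positive")
    # Outside span(V) the operator is gamma * I, so scale by 1/gamma.
    p = [-g / gamma for g in grad]
    for lam, v in zip(eigvals, eigvecs):
        coeff = dot(v, grad)
        # Inside span(V): replace 1/gamma by 1/(|lambda| + gamma).
        scale = 1.0 / (abs(lam) + gamma) - 1.0 / gamma
        p = [pi - scale * coeff * vi for pi, vi in zip(p, v)]
    return p


# Negative curvature (-2) along e1: the step still descends along e1.
print(lrsfn_direction([1.0, 1.0], [-2.0], [[1.0, 0.0]], gamma=1.0))
```

Along the negative-curvature direction the step is -1/(2 + 1) = -1/3 rather than an ascent step, while the unresolved direction falls back to a damped gradient step of size 1/gamma.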
Submitted 24 August, 2021; v1 submitted 7 February, 2020;
originally announced February 2020.
-
hIPPYlib: An Extensible Software Framework for Large-Scale Inverse Problems Governed by PDEs; Part I: Deterministic Inversion and Linearized Bayesian Inference
Authors:
Umberto Villa,
Noemi Petra,
Omar Ghattas
Abstract:
We present an extensible software framework, hIPPYlib, for solution of large-scale deterministic and Bayesian inverse problems governed by partial differential equations (PDEs) with infinite-dimensional parameter fields (which are high-dimensional after discretization). hIPPYlib overcomes the prohibitive nature of Bayesian inversion for this class of problems by implementing state-of-the-art scalable algorithms for PDE-based inverse problems that exploit the structure of the underlying operators, notably the Hessian of the log-posterior. The key property of the algorithms implemented in hIPPYlib is that the solution of the deterministic and linearized Bayesian inverse problem is computed at a cost, measured in linearized forward PDE solves, that is independent of the parameter dimension. The mean of the posterior is approximated by the MAP point, which is found by minimizing the negative log-posterior. This deterministic nonlinear least-squares optimization problem is solved with an inexact matrix-free Newton-CG method. The posterior covariance is approximated by the inverse of the Hessian of the negative log posterior evaluated at the MAP point. This Gaussian approximation is exact when the parameter-to-observable map is linear; otherwise, its logarithm agrees to two derivatives with the log-posterior at the MAP point, and thus it can serve as a proposal for Hessian-based MCMC methods. The construction of the posterior covariance is made tractable by invoking a low-rank approximation of the Hessian of the log-likelihood. Scalable tools for sample generation are also implemented. hIPPYlib makes all of these advanced algorithms easily accessible to domain scientists and provides an environment that expedites the development of new algorithms. hIPPYlib is also a teaching tool to educate researchers and practitioners who are new to inverse problems and the Bayesian inference framework.
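The low-rank covariance construction described above can be illustrated in its simplest setting, assuming an identity prior covariance and a data-misfit Hessian that is exactly H = V Lambda V^T with orthonormal V. By the Sherman-Morrison-Woodbury identity, (I + V Lambda V^T)^{-1} = I - V diag(lambda_i / (1 + lambda_i)) V^T, so the posterior covariance can be applied using only the r stored eigenpairs. This sketch does not use hIPPYlib's classes or the general prior-preconditioned eigenproblem; the function name is hypothetical.

```python
def apply_posterior_covariance(x, eigvals, eigvecs):
    """Apply (I + V diag(eigvals) V^T)^{-1} to x via the Woodbury identity.

    Assumes an identity prior covariance and orthonormal eigvecs; only the
    r dominant eigenpairs of the misfit Hessian are needed.
    """
    y = list(x)
    for lam, v in zip(eigvals, eigvecs):
        coeff = sum(vi * xi for vi, xi in zip(v, x))
        # Shrinkage factor: near 1 for data-informed directions, near 0 otherwise.
        d = lam / (1.0 + lam)
        y = [yi - d * coeff * vi for yi, vi in zip(y, v)]
    return y


# H = diag(3, 0): variance shrinks along e1 only.
print(apply_posterior_covariance([1.0, 1.0], [3.0], [[1.0, 0.0]]))
```

Directions the data do not inform (small lambda_i) keep their prior variance, which is why truncating to the dominant eigenpairs loses little accuracy.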
Submitted 28 August, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Inexact Newton Methods for Stochastic Nonconvex Optimization with Applications to Neural Network Training
Authors:
Thomas O'Leary-Roseberry,
Nick Alger,
Omar Ghattas
Abstract:
We study stochastic inexact Newton methods and consider their application in nonconvex settings. Building on the work of [R. Bollapragada, R. H. Byrd, and J. Nocedal, IMA Journal of Numerical Analysis, 39 (2018), pp. 545--578], we derive bounds for convergence rates in expected value for stochastic low rank Newton methods, and stochastic inexact Newton Krylov methods. These bounds quantify the errors incurred in subsampling the Hessian and gradient, as well as in approximating the Newton linear solve, and in choosing regularization and step length parameters. We deploy these methods in training convolutional autoencoders for the MNIST and CIFAR10 data sets. Numerical results demonstrate that, relative to first order methods, these stochastic inexact Newton methods often converge faster, are more cost-effective, and generalize better.
Submitted 31 July, 2019; v1 submitted 16 May, 2019;
originally announced May 2019.
-
Sparse polynomial approximation for optimal control problems constrained by elliptic PDEs with lognormal random coefficients
Authors:
Peng Chen,
Omar Ghattas
Abstract:
In this work, we consider optimal control problems constrained by elliptic partial differential equations (PDEs) with lognormal random coefficients, which are represented by a countably infinite-dimensional random parameter with i.i.d. normal distribution. We approximate the optimal solution by a suitable truncation of its Hermite polynomial chaos expansion, which is known as a sparse polynomial approximation. Based on the convergence analysis in \cite{BachmayrCohenDeVoreEtAl2017} for elliptic PDEs with lognormal random coefficients, we establish the dimension-independent convergence rate of the sparse polynomial approximation of the optimal solution. Moreover, we present a polynomial-based sparse quadrature for the approximation of the expectation of the optimal solution and prove its dimension-independent convergence rate based on the analysis in \cite{Chen2018}. Numerical experiments demonstrate that the convergence of the sparse quadrature error is independent of the active parameter dimensions and can be much faster than that of a Monte Carlo method.
Submitted 13 March, 2019;
originally announced March 2019.
-
Projected Stein Variational Newton: A Fast and Scalable Bayesian Inference Method in High Dimensions
Authors:
Peng Chen,
Keyi Wu,
Joshua Chen,
Thomas O'Leary-Roseberry,
Omar Ghattas
Abstract:
We propose a fast and scalable variational method for Bayesian inference in high-dimensional parameter space, which we call projected Stein variational Newton (pSVN) method. We exploit the intrinsic low-dimensional geometric structure of the posterior distribution in the high-dimensional parameter space via its Hessian (of the log posterior) operator and perform a parallel update of the parameter samples projected into a low-dimensional subspace by an SVN method. The subspace is adaptively constructed using the eigenvectors of the averaged Hessian at the current samples. We demonstrate fast convergence of the proposed method and its scalability with respect to the number of parameters, samples, and processor cores.
Submitted 9 February, 2020; v1 submitted 24 January, 2019;
originally announced January 2019.
-
Hessian-based sampling for high-dimensional model reduction
Authors:
Peng Chen,
Omar Ghattas
Abstract:
In this work we develop a Hessian-based sampling method for the construction of goal-oriented reduced order models with high-dimensional parameter inputs. Model reduction is known to be very challenging for high-dimensional parametric problems whose solutions also live in high-dimensional manifolds. However, the manifold of some quantity of interest (QoI) depending on the parametric solutions may be low-dimensional. We use the Hessian of the QoI with respect to the parameter to detect this low-dimensionality, and draw training samples by projecting the high-dimensional parameter to a low-dimensional subspace spanned by the eigenvectors of the Hessian corresponding to its dominant eigenvalues. Instead of forming the full Hessian, which is computationally intractable for a high-dimensional parameter, we employ a randomized algorithm to efficiently compute the dominant eigenpairs of the Hessian, whose cost does not depend on the nominal dimension of the parameter but only on the intrinsic dimension of the QoI. We demonstrate that the Hessian-based sampling leads to much smaller errors of the reduced basis approximation for the QoI compared to random sampling for a diffusion equation with random input obeying either uniform or Gaussian distributions.
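The key point above is that dominant Hessian eigenpairs can be obtained from Hessian-vector products alone. The paper uses a randomized algorithm; the sketch below swaps in a simpler matrix-free method (power iteration from a random start, with deflation) that shares the same access pattern. It assumes a symmetric positive semidefinite operator, and `hessian_action`, the iteration count, and the test operator are hypothetical.

```python
import math
import random


def dominant_eigenpairs(hessian_action, dim, k, iters=500, seed=0):
    """Approximate the k largest eigenpairs of a symmetric PSD operator.

    Only hessian_action(v) -> H v is used; H is never formed.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            w = hessian_action(v)
            # Deflation: project out the eigenvectors already found.
            for _lam, u in pairs:
                c = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - c * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        hv = hessian_action(v)
        lam = sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient
        pairs.append((lam, v))
    return pairs


def diag_action(v):
    # Hypothetical test operator: H = diag(5, 3, 1), applied matrix-free.
    return [5.0 * v[0], 3.0 * v[1], 1.0 * v[2]]


top = dominant_eigenpairs(diag_action, dim=3, k=2)
print([round(lam, 6) for lam, _ in top])
```

The number of Hessian actions scales with k and the iteration count, not with dim, which mirrors the dimension-independence argument in the abstract.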
Submitted 26 September, 2018;
originally announced September 2018.
-
Sparse polynomial approximations for affine parametric saddle point problems
Authors:
Peng Chen,
Omar Ghattas
Abstract:
In this work we study convergence properties of sparse polynomial approximations for a class of affine parametric saddle point problems. Such problems can be found in many computational science and engineering fields, including the Stokes equations for viscous incompressible flow, mixed formulation of diffusion equations for heat conduction or groundwater flow, time-harmonic Maxwell equations for electromagnetics, etc. Due to the lack of knowledge or intrinsic randomness, the coefficients of such problems are uncertain and can often be represented or approximated by high- or countably infinite-dimensional random parameters equipped with suitable probability distributions, and the coefficients affinely depend on a series of either globally or locally supported basis functions, e.g., Karhunen--Loève expansion, piecewise polynomials, or adaptive wavelet approximations. Consequently, we are faced with solving affine parametric saddle point problems. Here we study sparse polynomial approximations of the parametric solutions, in particular sparse Taylor approximations, and their convergence properties for these parametric problems. With suitable sparsity assumptions on the parametrization, we obtain the algebraic convergence rates $O(N^{-r})$ for the sparse polynomial approximations of the parametric solutions, in cases of both globally and locally supported basis functions. We prove that $r$ depends only on a sparsity parameter in the parametrization of the random input, and in particular does not depend on the number of active parameter dimensions or the number of polynomial terms $N$. These results imply that sparse polynomial approximations can effectively break the curse of dimensionality, thereby establishing a theoretical foundation for the development and application of such practical algorithms as adaptive, least-squares, and compressive sensing constructions.
Submitted 26 September, 2018;
originally announced September 2018.
-
A comparative study of structural similarity and regularization for joint inverse problems governed by PDEs
Authors:
Benjamin Crestel,
Georg Stadler,
Omar Ghattas
Abstract:
Joint inversion refers to the simultaneous inference of multiple parameter fields from observations of systems governed by single or multiple forward models. In many cases these parameter fields reflect different attributes of a single medium and are thus spatially correlated or structurally similar. By imposing prior information on their spatial correlations via a joint regularization term, we seek to improve the reconstruction of the parameter fields relative to inversion for each field independently. One of the main challenges is to devise a joint regularization functional that conveys the spatial correlations or structural similarity between the fields while at the same time permitting scalable and efficient solvers for the joint inverse problem. We describe several joint regularizations that are motivated by these goals: a cross-gradient and a normalized cross-gradient structural similarity term, the vectorial total variation, and a joint regularization based on the nuclear norm of the gradients. Based on numerical results from three classes of inverse problems with piecewise-homogeneous parameter fields, we conclude that the vectorial total variation functional is preferable to the other methods considered. Besides resulting in good reconstructions in all experiments, it allows for scalable, efficient solvers for joint inverse problems governed by PDE forward models.
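Two of the joint regularizations compared above can be evaluated on a small 2D grid. This sketch assumes forward differences with unit spacing and the common textbook forms: cross-gradient = sum of (grad m1 x grad m2)^2, and vectorial TV = sum of sqrt(|grad m1|^2 + |grad m2|^2 + eps) with a small smoothing parameter eps. The paper's exact scalings and discretization may differ, and all function names are hypothetical.

```python
import math


def forward_gradients(m):
    """Forward-difference gradients (unit spacing) on interior cells of a 2D grid.

    m is a list of rows; returns flattened lists (gx, gy).
    """
    gx, gy = [], []
    for i in range(len(m) - 1):
        for j in range(len(m[0]) - 1):
            gx.append(m[i][j + 1] - m[i][j])
            gy.append(m[i + 1][j] - m[i][j])
    return gx, gy


def cross_gradient(m1, m2):
    """Sum over cells of (grad m1 x grad m2)^2; zero where gradients are parallel."""
    gx1, gy1 = forward_gradients(m1)
    gx2, gy2 = forward_gradients(m2)
    return sum((a * d - b * c) ** 2 for a, b, c, d in zip(gx1, gy1, gx2, gy2))


def vectorial_tv(m1, m2, eps=1e-8):
    """Sum over cells of sqrt(|grad m1|^2 + |grad m2|^2 + eps)."""
    gx1, gy1 = forward_gradients(m1)
    gx2, gy2 = forward_gradients(m2)
    return sum(math.sqrt(a * a + b * b + c * c + d * d + eps)
               for a, b, c, d in zip(gx1, gy1, gx2, gy2))


vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
scaled_edge = [[0.0, 0.0, 2.0, 2.0] for _ in range(3)]
horizontal_edge = [[0.0] * 4, [1.0] * 4, [1.0] * 4]

print(cross_gradient(vertical_edge, scaled_edge))      # parallel gradients
print(cross_gradient(vertical_edge, horizontal_edge))  # orthogonal edges
```

Because vectorial TV couples both gradients inside one square root, a jump shared by both fields costs less than two separate jumps, which is how it promotes common edges.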
Submitted 16 August, 2018;
originally announced August 2018.
-
Scalable matrix-free adaptive product-convolution approximation for locally translation-invariant operators
Authors:
Nick Alger,
Vishwas Rao,
Aaron Myers,
Tan Bui-Thanh,
Omar Ghattas
Abstract:
We present an adaptive grid matrix-free operator approximation scheme based on a "product-convolution" interpolation of convolution operators. This scheme is appropriate for operators that are locally translation-invariant, even if these operators are high-rank or full-rank. Such operators arise in Schur complement methods for solving partial differential equations (PDEs), as Hessians in PDE-constrained optimization and inverse problems, as integral operators, as covariance operators, and as Dirichlet-to-Neumann maps. Constructing the approximation requires computing the impulse responses of the operator to point sources centered on nodes in an adaptively refined grid of sample points. A randomized a posteriori error estimator drives the adaptivity. Once constructed, the approximation can be efficiently applied to vectors using the fast Fourier transform. The approximation can be efficiently converted to hierarchical matrix ($H$-matrix) format, then inverted or factorized using scalable $H$-matrix arithmetic. The quality of the approximation degrades gracefully as fewer sample points are used, allowing cheap lower quality approximations to be used as preconditioners. This yields an automated method to construct preconditioners for locally translation-invariant Schur complements. We directly address issues related to boundaries and prove that our scheme eliminates boundary artifacts. We test the scheme on a spatially varying blurring kernel, on the non-local component of an interface Schur complement for the Poisson operator, and on the data misfit Hessian for an advection-dominated advection-diffusion inverse problem. Numerical results show that the scheme outperforms existing methods.
Submitted 5 February, 2019; v1 submitted 15 May, 2018;
originally announced May 2018.
-
Taylor approximation and variance reduction for PDE-constrained optimal control under uncertainty
Authors:
Peng Chen,
Umberto Villa,
Omar Ghattas
Abstract:
In this work we develop a scalable computational framework for the solution of PDE-constrained optimal control under high-dimensional uncertainty. Specifically, we consider a mean-variance formulation of the control objective and employ a Taylor expansion with respect to the uncertain parameter either to directly approximate the control objective or as a control variate for variance reduction. The expressions for the mean and variance of the Taylor approximation are known analytically, although their evaluation requires efficient computation of the trace of the (preconditioned) Hessian of the control objective. We propose to estimate this trace by solving a generalized eigenvalue problem using a randomized algorithm that only requires the action of the Hessian on a small number of random directions. As a result, the computational work does not depend on the nominal dimension of the uncertain parameter, but only on its effective dimension, thus ensuring scalability to high-dimensional problems. Moreover, to improve the accuracy with which the mean and variance of the control objective are estimated, we use the Taylor approximation as a control variate for variance reduction, which results in considerable computational savings (several orders of magnitude) compared to a plain Monte Carlo method. We demonstrate the accuracy, efficiency, and scalability of the proposed computational method for two examples with high-dimensional uncertain parameters: subsurface flow in a porous medium modeled as an elliptic PDE, and turbulent jet flow modeled by the Reynolds-averaged Navier--Stokes equations coupled with a nonlinear advection-diffusion equation characterizing model uncertainty. In particular, for the latter more challenging example we show scalability of our algorithm up to one million parameters resulting from discretization of the uncertain parameter field.
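The paper estimates the Hessian trace through a randomized generalized eigenvalue solver. As a simpler stand-in that shares the key property (only Hessian actions on a few random directions are needed), the sketch below uses Hutchinson's estimator with Rademacher vectors, tr(H) ~ (1/s) * sum over samples of z^T H z. This is a swapped-in technique, not the authors' method; the operator and sample count are assumptions.

```python
import random


def hutchinson_trace(hessian_action, dim, num_samples=50, seed=0):
    """Estimate tr(H) using only products H z with random +/-1 vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hessian_action(z)
        total += sum(zi * hzi for zi, hzi in zip(z, hz))
    return total / num_samples


def diag_action(v):
    # Hypothetical diagonal operator diag(1, 2, 3, 4), trace 10.
    return [d * x for d, x in zip([1.0, 2.0, 3.0, 4.0], v)]


print(hutchinson_trace(diag_action, 4))
```

For a diagonal operator every sample is exact, since z_i^2 = 1; off-diagonal entries only add zero-mean noise, so the sample count (not the parameter dimension) controls the cost and accuracy.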
Submitted 26 August, 2018; v1 submitted 11 April, 2018;
originally announced April 2018.
-
Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems
Authors:
Peng Chen,
Umberto Villa,
Omar Ghattas
Abstract:
In this work we propose and analyze a Hessian-based adaptive sparse quadrature to compute infinite-dimensional integrals with respect to the posterior distribution in the context of Bayesian inverse problems with Gaussian prior. Due to the concentration of the posterior distribution in the domain of the prior distribution, a prior-based parametrization and sparse quadrature may fail to capture the posterior distribution and lead to erroneous evaluation results. By using a parametrization based on the Hessian of the negative log-posterior, the adaptive sparse quadrature can effectively allocate the quadrature points according to the posterior distribution. A dimension-independent convergence rate of the proposed method is established under certain assumptions on the Gaussian prior and the integrands. Dimension-independent and faster convergence than $O(N^{-1/2})$ is demonstrated for a linear as well as a nonlinear inverse problem whose posterior distribution can be effectively approximated by a Gaussian distribution at the MAP point.
Submitted 20 June, 2017;
originally announced June 2017.
-
A-optimal encoding weights for nonlinear inverse problems, with applications to the Helmholtz inverse problem
Authors:
Benjamin Crestel,
Alen Alexanderian,
Georg Stadler,
Omar Ghattas
Abstract:
The computational cost of solving an inverse problem governed by PDEs, using multiple experiments, increases linearly with the number of experiments. A recently proposed method to decrease this cost uses only a small number of random linear combinations of all experiments for solving the inverse problem. This approach applies to inverse problems where the PDE solution depends linearly on the right-hand side function that models the experiment. As this method is stochastic in essence, the quality of the obtained reconstructions can vary, in particular when only a small number of combinations are used. We develop a Bayesian formulation for the definition and computation of encoding weights that lead to a parameter reconstruction with the least uncertainty. We call these weights A-optimal encoding weights. Our framework applies to inverse problems where the governing PDE is nonlinear with respect to the inversion parameter field. We formulate the problem in infinite dimensions and follow the optimize-then-discretize approach, devoting special attention to the discretization and the choice of numerical methods in order to achieve a computational cost that is independent of the parameter discretization. We elaborate our method for a Helmholtz inverse problem, and derive the adjoint-based expressions for the gradient of the objective function of the optimization problem for finding the A-optimal encoding weights. The proposed method is potentially attractive for real-time monitoring applications, where one can invest the effort to compute optimal weights offline, to later solve an inverse problem repeatedly, over time, at a fraction of the initial cost.
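The random-combination method that A-optimal weights improve on rests on a simple identity: with independent +/-1 weights w_i, E[||sum_i w_i r_i||^2] = sum_i ||r_i||^2, because the cross terms have zero mean. One encoded experiment therefore gives an unbiased estimate of the total misfit over all experiments. The sketch below checks this exactly by averaging over every sign pattern; the residual vectors and function names are hypothetical, and the A-optimal weight computation itself is not shown.

```python
import itertools


def squared_norm(v):
    return sum(x * x for x in v)


def encoded_misfit(residuals, weights):
    """Misfit of a single encoded experiment: ||sum_i w_i r_i||^2."""
    combined = [sum(w * r[k] for w, r in zip(weights, residuals))
                for k in range(len(residuals[0]))]
    return squared_norm(combined)


def average_over_all_sign_patterns(residuals):
    # Exact expectation over independent Rademacher weights (2^n patterns).
    patterns = list(itertools.product((-1.0, 1.0), repeat=len(residuals)))
    return sum(encoded_misfit(residuals, w) for w in patterns) / len(patterns)


residuals = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # hypothetical per-experiment residuals
print(average_over_all_sign_patterns(residuals))  # equals the sum of squared norms
print(sum(squared_norm(r) for r in residuals))
```

Any single sign pattern can deviate noticeably from this average, which is the variability in reconstruction quality that motivates choosing the weights optimally instead.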
Submitted 27 February, 2017; v1 submitted 7 December, 2016;
originally announced December 2016.
-
Research and Education in Computational Science and Engineering
Authors:
Ulrich Rüde,
Karen Willcox,
Lois Curfman McInnes,
Hans De Sterck,
George Biros,
Hans Bungartz,
James Corones,
Evin Cramer,
James Crowley,
Omar Ghattas,
Max Gunzburger,
Michael Hanke,
Robert Harrison,
Michael Heroux,
Jan Hesthaven,
Peter Jimack,
Chris Johnson,
Kirk E. Jordan,
David E. Keyes,
Rolf Krause,
Vipin Kumar,
Stefan Mayer,
Juan Meza,
Knut Martin Mørken,
J. Tinsley Oden,
et al. (8 additional authors not shown)
Abstract:
Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments---including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers---is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade.
Submitted 31 December, 2017; v1 submitted 8 October, 2016;
originally announced October 2016.
-
Weighted BFBT Preconditioner for Stokes Flow Problems with Highly Heterogeneous Viscosity
Authors:
Johann Rudi,
Georg Stadler,
Omar Ghattas
Abstract:
We present a weighted BFBT approximation (w-BFBT) to the inverse Schur complement of a Stokes system with highly heterogeneous viscosity. When used as part of a Schur complement-based Stokes preconditioner, we observe robust fast convergence for Stokes problems with smooth but highly varying (up to 10 orders of magnitude) viscosities, optimal algorithmic scalability with respect to mesh refinement, and only a mild dependence on the polynomial order of high-order finite element discretizations ($Q_k \times P_{k-1}^{disc}$, order $k \ge 2$). For certain difficult problems, we demonstrate numerically that w-BFBT significantly improves Stokes solver convergence over the widely used inverse viscosity-weighted pressure mass matrix approximation of the Schur complement. In addition, we derive theoretical eigenvalue bounds to prove spectral equivalence of w-BFBT. Using detailed numerical experiments, we discuss modifications to w-BFBT at Dirichlet boundaries that decrease the number of iterations. The overall algorithmic performance of the Stokes solver is governed by the efficacy of w-BFBT as a Schur complement approximation and, in addition, by our parallel hybrid spectral-geometric-algebraic multigrid (HMG) method, which we use to approximate the inverses of the viscous block and variable-coefficient pressure Poisson operators within w-BFBT. Building on the scalability of HMG, our Stokes solver achieves a parallel efficiency of 90% while weak scaling over a more than 600-fold increase from 48 to all 30,000 cores of TACC's Lonestar 5 supercomputer.
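BFBT-type methods approximate the inverse Schur complement $S^{-1} = (B A^{-1} B^T)^{-1}$ without ever inverting the viscous block $A$. The sketch below implements a generic diagonally scaled BFBT formula, $S^{-1} \approx (B C^{-1} B^T)^{-1} (B C^{-1} A C^{-1} B^T) (B C^{-1} B^T)^{-1}$, on small dense matrices. Using diag(A) as the scaling $C$ is an illustrative choice and not the viscosity-based weighting that defines w-BFBT in the paper; the matrices are random stand-ins for a finite element discretization, and dense inverses appear only because the example is tiny.

```python
import numpy as np


def scaled_bfbt_inverse(A, B, c):
    """Dense scaled-BFBT approximation of S^{-1}, where S = B A^{-1} B^T.

    c is the diagonal of the scaling matrix C. Returns
    (B C^-1 B^T)^-1 (B C^-1 A C^-1 B^T) (B C^-1 B^T)^-1.
    """
    BCinv = B / c                    # B C^{-1}: divide column j of B by c[j]
    L = BCinv @ B.T                  # B C^{-1} B^T, a Poisson-like operator on pressures
    middle = BCinv @ A @ BCinv.T     # B C^{-1} A C^{-1} B^T (C is diagonal, so C^{-1} B^T = BCinv.T)
    L_inv = np.linalg.inv(L)
    return L_inv @ middle @ L_inv


rng = np.random.default_rng(1)
n, m = 40, 15                                    # velocity / pressure unknowns (toy sizes)
G = np.eye(n) + 0.05 * rng.standard_normal((n, n))
mu = 10.0 ** rng.uniform(-2.0, 2.0, n)           # "viscosity" spanning four orders of magnitude
A = G.T @ np.diag(mu) @ G                        # SPD stand-in for the viscous block
B = rng.standard_normal((m, n))                  # stand-in for the discrete divergence
S = B @ np.linalg.solve(A, B.T)                  # exact Schur complement (affordable only when tiny)

P = scaled_bfbt_inverse(A, B, np.diag(A))
Lc = np.linalg.cholesky(P)                       # P is SPD, so P S is similar to Lc^T S Lc
ev = np.linalg.eigvalsh(Lc.T @ S @ Lc)
print(f"eigenvalues of P S lie in [{ev.min():.3g}, {ev.max():.3g}]")
```

A quick sanity property of this formula: when $B$ is square and invertible it reproduces $S^{-1}$ exactly, since the outer factors then cancel the scaling $C$.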
Submitted 29 January, 2017; v1 submitted 13 July, 2016;
originally announced July 2016.
-
A data scalable augmented Lagrangian KKT preconditioner for large scale inverse problems
Authors:
Nick Alger,
Umberto Villa,
Tan Bui-Thanh,
Omar Ghattas
Abstract:
Current state-of-the-art preconditioners for the reduced Hessian and the Karush-Kuhn-Tucker (KKT) operator for large-scale inverse problems are typically based on approximating the reduced Hessian with the regularization operator. However, the quality of this approximation degrades with increasingly informative observations or data. Thus the best-case scenario from a scientific standpoint (fully informative data) is the worst-case scenario from a computational perspective. In this paper we present an augmented Lagrangian-type preconditioner based on a block diagonal approximation of the augmented upper left block of the KKT operator. The preconditioner requires solvers for two linear subproblems that arise in the augmented KKT operator, which we expect to be much easier to precondition than the reduced Hessian. Analysis of the spectrum of the preconditioned KKT operator indicates that the preconditioner is effective when the regularization is chosen appropriately. In particular, it is effective when the regularization does not over-penalize highly informed parameter modes and does not under-penalize uninformed modes. Finally, we present a numerical study for a large data/low noise Poisson source inversion problem, demonstrating the effectiveness of the preconditioner. In this example, three MINRES iterations on the KKT system with our preconditioner result in a reconstruction with better accuracy than 50 iterations of CG on the reduced Hessian system with regularization preconditioning.
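The motivating observation, that regularization preconditioning degrades as the data become more informative, can be checked on a linear toy problem. In the sketch below (all operators are hypothetical stand-ins), the reduced Hessian is H = J^T J / sigma^2 + R, and preconditioning by R yields R^{-1/2} H R^{-1/2} = I + R^{-1/2} J^T J R^{-1/2} / sigma^2, whose condition number grows like 1/sigma^2 as the noise level sigma falls. This illustrates the motivation only; the augmented Lagrangian KKT preconditioner itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_param, n_obs = 60, 20
J = rng.standard_normal((n_obs, n_param))  # hypothetical linearized parameter-to-observable map
# Hypothetical regularization: 1D Laplacian-like stencil plus a small shift, so R is SPD.
R = 2.1 * np.eye(n_param) - np.eye(n_param, k=1) - np.eye(n_param, k=-1)

# R^{-1/2} via the eigendecomposition of the symmetric matrix R.
evals, V = np.linalg.eigh(R)
R_inv_half = V @ np.diag(evals ** -0.5) @ V.T

conds = []
for sigma in (1.0, 1e-1, 1e-2):            # smaller noise means more informative data
    H = J.T @ J / sigma**2 + R             # reduced Hessian of a linear least-squares inverse problem
    conds.append(np.linalg.cond(R_inv_half @ H @ R_inv_half))

for sigma, c in zip((1.0, 1e-1, 1e-2), conds):
    print(f"sigma = {sigma:g}: cond of regularization-preconditioned Hessian = {c:.3g}")
```

Because there are fewer observations than parameters, the smallest eigenvalue of the preconditioned operator stays at 1 while the largest scales with 1/sigma^2, so CG iteration counts grow as the data improve.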
Submitted 2 August, 2017; v1 submitted 12 July, 2016;
originally announced July 2016.
-
Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations
Authors:
Alen Alexanderian,
Noemi Petra,
Georg Stadler,
Omar Ghattas
Abstract:
We present a method for optimal control of systems governed by partial differential equations (PDEs) with uncertain parameter fields. We consider an objective function that involves the mean and variance of the control objective, leading to a risk-averse optimal control problem. To make the problem tractable, we invoke a quadratic Taylor series approximation of the control objective with respect to the uncertain parameter. This enables deriving explicit expressions for the mean and variance of the control objective in terms of its gradients and Hessians with respect to the uncertain parameter. The risk-averse optimal control problem is then formulated as a PDE-constrained optimization problem with constraints given by the forward and adjoint PDEs defining these gradients and Hessians. The expressions for the mean and variance of the control objective under the quadratic approximation involve the trace of the (preconditioned) Hessian and are thus prohibitive to evaluate. To address this, we employ trace estimators that only require a modest number of Hessian-vector products. We illustrate our approach with two problems: the control of a semilinear elliptic PDE with an uncertain boundary source term, and the control of a linear elliptic PDE with an uncertain coefficient field. For the latter problem, we derive adjoint-based expressions for efficient computation of the gradient of the risk-averse objective with respect to the controls. Our method ensures that the cost of computing the risk-averse objective and its gradient with respect to the control, measured in the number of PDE solves, is independent of the (discretized) parameter and control dimensions, and depends only on the number of random vectors employed in the trace estimation. Finally, we present a comprehensive numerical study of an optimal control problem for fluid flow in a porous medium with uncertain permeability field.
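Under a quadratic model f0 + g^T x + 0.5 x^T H x with Gaussian perturbation x ~ N(0, C), the standard Gaussian moment identities give mean f0 + 0.5 tr(HC) and variance g^T C g + 0.5 tr((HC)^2), which is why traces of the (preconditioned) Hessian appear. The sketch below checks these formulas by Monte Carlo on a small dense example with hypothetical H, C, and g, and estimates tr(HC) with a Rademacher (Hutchinson-type) trace estimator that accesses HC only through matrix-vector products, the role trace estimation plays when H is available only through PDE solves. Controls and the adjoint-based gradient from the paper are not included.

```python
import numpy as np


def quadratic_mean_var(f0, g, H, C):
    """Mean and variance of f0 + g^T x + 0.5 x^T H x for x ~ N(0, C), H symmetric."""
    HC = H @ C
    mean = f0 + 0.5 * np.trace(HC)
    var = g @ C @ g + 0.5 * np.trace(HC @ HC)  # cross term vanishes (odd Gaussian moments)
    return mean, var


def hutchinson_trace(matvec, dim, n_probes, rng):
    """Estimate tr(M) from matvecs z -> M z using Rademacher probe vectors z."""
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ matvec(z)
    return total / n_probes


rng = np.random.default_rng(4)
d, f0 = 30, 1.0
S = rng.standard_normal((d, d)) / np.sqrt(d)
H = 0.5 * (S + S.T)                        # hypothetical symmetric Hessian
C = np.diag(np.linspace(0.1, 1.0, d))      # hypothetical (diagonal) parameter covariance
g = rng.standard_normal(d)                 # hypothetical gradient

mean, var = quadratic_mean_var(f0, g, H, C)
tr_est = hutchinson_trace(lambda z: H @ (C @ z), d, 2000, rng)

# Monte Carlo check: sample x ~ N(0, C) and evaluate the quadratic model directly.
N = 100_000
X = rng.standard_normal((N, d)) * np.sqrt(np.diag(C))  # valid because C is diagonal here
q = f0 + X @ g + 0.5 * np.sum((X @ H) * X, axis=1)
print(f"mean: formula {mean:.3f}, MC {q.mean():.3f}; var: formula {var:.3f}, MC {q.var():.3f}")
print(f"tr(HC): exact {np.trace(H @ C):.3f}, Hutchinson {tr_est:.3f}")
```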
Submitted 22 November, 2017; v1 submitted 24 February, 2016;
originally announced February 2016.
-
A Fast and Scalable Method for A-Optimal Design of Experiments for Infinite-dimensional Bayesian Nonlinear Inverse Problems
Authors:
Alen Alexanderian,
Noemi Petra,
Georg Stadler,
Omar Ghattas
Abstract:
We address the problem of optimal experimental design (OED) for Bayesian nonlinear inverse problems governed by PDEs. The goal is to find a placement of sensors, at which experimental data are collected, so as to minimize the uncertainty in the inferred parameter field. We formulate the OED objective function by generalizing the classical A-optimal experimental design criterion using the expected value of the trace of the posterior covariance. We seek a method that solves the OED problem at a cost (measured in the number of forward PDE solves) that is independent of both the parameter and sensor dimensions. To facilitate this, we construct a Gaussian approximation to the posterior at the maximum a posteriori probability (MAP) point, and use the resulting covariance operator to define the OED objective function. We use randomized trace estimation to compute the trace of this (implicitly defined) covariance operator. The resulting OED problem includes as constraints the PDEs characterizing the MAP point, and the PDEs describing the action of the covariance operator on vectors. The sparsity of the sensor configurations is controlled using sparsifying penalty functions. We elaborate our OED method for the problem of determining the sensor placement to best infer the coefficient of an elliptic PDE. Adjoint methods are used to compute the gradient of the PDE-constrained OED objective function. We provide numerical results for inference of the permeability field in a porous medium flow problem, and demonstrate that the number of PDE solves required for the evaluation of the OED objective function and its gradient is essentially independent of both the parameter and sensor dimensions. The number of quasi-Newton iterations for computing an OED also exhibits the same dimension invariance properties.
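To see how an A-optimal objective depends on sensor weights, consider a linear forward model, for which the Gaussian posterior is exact. With candidate-sensor rows f_j of a forward matrix F, weights w_j, noise level sigma, and prior precision P, the posterior covariance is Gamma(w) = (F^T diag(w) F / sigma^2 + P)^{-1}. Since d(Gamma) = -Gamma d(Gamma^{-1}) Gamma and d(Gamma^{-1})/dw_j = f_j f_j^T / sigma^2, the gradient is d tr(Gamma)/dw_j = -||Gamma f_j||^2 / sigma^2. The sketch computes this with dense linear algebra on hypothetical matrices; it omits the MAP-point constraints, the randomized trace estimation, and the sparsifying penalties described in the abstract.

```python
import numpy as np


def posterior_cov(w, F, prior_prec, sigma):
    """Posterior covariance for a linear Gaussian model with sensor weights w."""
    H = F.T @ (w[:, None] * F) / sigma**2 + prior_prec  # weighted misfit Hessian + prior precision
    return np.linalg.inv(H)


def a_optimal_objective(w, F, prior_prec, sigma):
    """A-optimal criterion: trace of the posterior covariance."""
    return float(np.trace(posterior_cov(w, F, prior_prec, sigma)))


def a_optimal_gradient(w, F, prior_prec, sigma):
    """d tr(Gamma) / d w_j = -||Gamma f_j||^2 / sigma^2, where f_j is row j of F."""
    Gamma = posterior_cov(w, F, prior_prec, sigma)
    GF = F @ Gamma  # row j equals (Gamma f_j)^T because Gamma is symmetric
    return -np.sum(GF**2, axis=1) / sigma**2


rng = np.random.default_rng(5)
n_sensors, d, sigma = 12, 25, 0.1
F = rng.standard_normal((n_sensors, d))                            # hypothetical sensor rows
prior_prec = np.eye(d) + 0.5 * np.diag(np.arange(d, dtype=float))  # hypothetical prior precision
w = rng.uniform(0.2, 0.8, n_sensors)                               # relaxed sensor weights

grad = a_optimal_gradient(w, F, prior_prec, sigma)
print("objective:", a_optimal_objective(w, F, prior_prec, sigma))
print("gradient:", np.round(grad, 5))
```

Every gradient entry is negative, reflecting that increasing any sensor's weight can only reduce posterior variance; this monotonicity is why a sparsifying penalty is needed to keep the design from simply switching on every sensor.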
Submitted 3 November, 2015; v1 submitted 21 October, 2014;
originally announced October 2014.