-
Gaussian Variational Schemes on Bounded and Unbounded Domains
Authors:
Jonas A. Actor,
Anthony Gruber,
Eric C. Cyr,
Nathaniel Trask
Abstract:
A machine-learnable variational scheme using Gaussian radial basis functions (GRBFs) is presented and used to approximate linear problems on bounded and unbounded domains. In contrast to standard mesh-free methods, which use GRBFs to discretize strong-form differential equations, this work exploits the relationship between integrals of GRBFs, their derivatives, and polynomial moments to produce exact quadrature formulae which enable weak-form expressions. Combined with trainable GRBF means and covariances, this leads to a flexible, generalized Galerkin variational framework which is applied in the infinite-domain setting where the scheme is conforming, as well as the bounded-domain setting where it is not. Error rates for the proposed GRBF scheme are derived in each case, and examples are presented demonstrating utility of this approach as a surrogate modeling technique.
Submitted 8 October, 2024;
originally announced October 2024.
-
Multigrid-in-time preconditioners for KKT systems
Authors:
Radoslav Vuchkov,
Eric C. Cyr,
Denis Ridzal
Abstract:
We develop multigrid-in-time preconditioners for Karush-Kuhn-Tucker (KKT) systems that arise in the solution of time-dependent optimization problems. We focus on a specific instance of KKT systems, known as augmented systems, which underpin the composite-step sequential quadratic programming framework [1]. To enable time-domain decomposition, our approach introduces virtual state variables and continuity constraints at each discrete time interval. The virtual state variables not only facilitate a decoupling in time but also give rise to fixed-point iterations that aid the solution of KKT systems. These fixed-point schemes can be used either as preconditioners for Krylov subspace methods or as smoothers for multigrid-in-time schemes. For the latter, we develop a block-Jacobi scheme that parallelizes trivially in the time domain. To complete the multigrid construction, we use simple prolongation and restriction operators based on geometric multigrid ideas, and a coarse-grid solver based on a GMRES iteration preconditioned with the symmetric block Gauss-Seidel scheme. We present two optimal control examples, involving the viscous Burgers' and van der Pol oscillator equations, respectively, and demonstrate algorithmic scalability.
Submitted 8 May, 2024;
originally announced May 2024.
-
Graph Neural Networks and Applied Linear Algebra
Authors:
Nicholas S. Moore,
Eric C. Cyr,
Peter Ohm,
Christopher M. Siefert,
Raymond S. Tuminaro
Abstract:
Sparse matrix computations are ubiquitous in scientific computing. With the recent interest in scientific machine learning, it is natural to ask how sparse matrix computations can leverage neural networks (NN). Unfortunately, multi-layer perceptron (MLP) neural networks are typically not natural for either graph or sparse matrix computations. The issue lies with the fact that MLPs require fixed-sized inputs while scientific applications generally generate sparse matrices with arbitrary dimensions and a wide range of nonzero patterns (or matrix graph vertex interconnections). While convolutional NNs could possibly address matrix graphs where all vertices have the same number of nearest neighbors, a more general approach is needed for arbitrary sparse matrices, e.g. arising from discretized partial differential equations on unstructured meshes. Graph neural networks (GNNs) are one approach suitable to sparse matrices. GNNs define aggregation functions (e.g., summations) that operate on variable size input data to produce data of a fixed output size so that MLPs can be applied. The goal of this paper is to provide an introduction to GNNs for a numerical linear algebra audience. Concrete examples are provided to illustrate how many common linear algebra tasks can be accomplished using GNNs. We focus on iterative methods that employ computational kernels such as matrix-vector products, interpolation, relaxation methods, and strength-of-connection measures. Our GNN examples include cases where parameters are determined a-priori as well as cases where parameters must be learned. The intent with this article is to help computational scientists understand how GNNs can be used to adapt machine learning concepts to computational tasks associated with sparse matrices. It is hoped that this understanding will stimulate data-driven extensions of classical sparse linear algebra tasks.
Submitted 21 October, 2023;
originally announced October 2023.
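As a rough illustration of the aggregation idea described in this abstract, a sparse matrix-vector product can be read as one round of message passing with sum aggregation: each nonzero a_ij acts as a graph edge that sends the message a_ij * x_j to vertex i. The sketch below is a minimal NumPy illustration under that reading, not code from the paper; the function name `spmv_message_passing` and the COO edge-list layout are assumptions made for this example.

```python
import numpy as np

def spmv_message_passing(rows, cols, vals, x, n_rows):
    """Compute y = A @ x for A stored as COO triplets, phrased as message passing.

    Hypothetical helper for illustration only.
    """
    # One message per edge (i, j): the edge weight a_ij times the sender's value x_j.
    messages = vals * x[cols]
    # Sum-aggregate all incoming messages at the receiving vertex i.
    y = np.zeros(n_rows)
    np.add.at(y, rows, messages)
    return y

# 1D Laplacian stencil [-1, 2, -1] on 4 vertices, stored as an edge list.
rows = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3])
cols = np.array([0, 1, 0, 1, 2, 1, 2, 3, 2, 3])
vals = np.array([2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0, 4.0])

y = spmv_message_passing(rows, cols, vals, x, n_rows=4)
```

Because the aggregation is a sum over however many edges reach a vertex, the same function works for any sparsity pattern and matrix size, which is the property the abstract contrasts with fixed-size MLP inputs.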
-
A 2-Level Domain Decomposition Preconditioner for KKT Systems with Heat-Equation Constraints
Authors:
Eric C. Cyr
Abstract:
Solving optimization problems with transient PDE-constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT matrices. These matrices scale with the size of the spatial discretization times the number of time steps. We propose a new two level domain decomposition preconditioner to solve these linear systems when constrained by the heat equation. Our approach leverages the observation that the Schur-complement is elliptic in time, and thus amenable to classical domain decomposition methods. Further, the application of the preconditioner uses existing time integration routines to facilitate implementation and maximize software reuse. The performance of the preconditioner is examined in an empirical study demonstrating the approach is scalable with respect to the number of time steps and subdomains.
Submitted 7 May, 2023;
originally announced May 2023.
-
Reduced Basis Approximations of Parameterized Dynamical Partial Differential Equations via Neural Networks
Authors:
Peter Sentz,
Kristian Beckwith,
Eric C. Cyr,
Luke N. Olson,
Ravi Patel
Abstract:
Projection-based reduced order models are effective at approximating parameter-dependent differential equations that are parametrically separable. When parametric separability is not satisfied, which occurs in both linear and nonlinear problems, projection-based methods fail to adequately reduce the computational complexity. Devising alternative reduced order models is crucial for obtaining efficient and accurate approximations to expensive high-fidelity models. In this work, we develop a time-stepping procedure for dynamical parameter-dependent problems, in which a neural-network is trained to propagate the coefficients of a reduced basis expansion. This results in an online stage with a computational cost independent of the size of the underlying problem. We demonstrate our method on several parabolic partial differential equations, including a problem that is not parametrically separable.
Submitted 20 October, 2021;
originally announced October 2021.
-
A Monolithic Algebraic Multigrid Framework for Multiphysics Applications with Examples from Resistive MHD
Authors:
Peter Ohm,
Tobias Wiesner,
Eric C. Cyr,
Jonathan J. Hu,
John N. Shadid,
Raymond S. Tuminaro
Abstract:
A multigrid framework is described for multiphysics applications. The framework allows one to construct, adapt, and tailor a monolithic multigrid methodology to different linear systems coming from discretized partial differential equations. The main idea centers on developing multigrid components in a blocked fashion where each block corresponds to separate sets of physical unknowns and equations within the larger discretization matrix. Once defined, these components are ultimately assembled into a monolithic multigrid solver for the entire system. We demonstrate the potential of the framework by applying it to representative linear solution sub-problems arising from resistive MHD.
Submitted 22 March, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Partition of unity networks: deep hp-approximation
Authors:
Kookjin Lee,
Nathaniel A. Trask,
Ravi G. Patel,
Mamikon A. Gulian,
Eric C. Cyr
Abstract:
Approximation theorists have established best-in-class optimal approximation rates of deep neural networks by utilizing their ability to simultaneously emulate partitions of unity and monomials. Motivated by this, we propose partition of unity networks (POUnets) which incorporate these elements directly into the architecture. Classification architectures of the type used to learn probability measures are used to build a meshfree partition of space, while polynomial spaces with learnable coefficients are associated to each partition. The resulting hp-element-like approximation allows use of a fast least-squares optimizer, and the resulting architecture size need not scale exponentially with spatial dimension, breaking the curse of dimensionality. An abstract approximation result establishes desirable properties to guide network design. Numerical results for two choices of architecture demonstrate that POUnets yield hp-convergence for smooth functions and consistently outperform MLPs for piecewise polynomial functions with large numbers of discontinuities.
Submitted 27 January, 2021;
originally announced January 2021.
-
Thermodynamically consistent physics-informed neural networks for hyperbolic systems
Authors:
Ravi G. Patel,
Indu Manickam,
Nathaniel A. Trask,
Mitchell A. Wood,
Myoungkyu Lee,
Ignacio Tomas,
Eric C. Cyr
Abstract:
Physics-informed neural network architectures have emerged as a powerful tool for developing flexible PDE solvers which easily assimilate data, but face challenges related to the PDE discretization underpinning them. By instead adapting a least squares space-time control volume scheme, we circumvent issues particularly related to imposition of boundary conditions and conservation while reducing solution regularity requirements. Additionally, connections to classical finite volume methods allow application of biases toward entropy solutions and total variation diminishing properties. For inverse problems, we may impose further thermodynamic biases, allowing us to fit shock hydrodynamics models to molecular simulation of rarefied gases and metals. The resulting data-driven equations of state may be incorporated into traditional shock hydrodynamics codes.
Submitted 9 December, 2020;
originally announced December 2020.
-
A physics-informed operator regression framework for extracting data-driven continuum models
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mitchell A. Wood,
Eric C. Cyr
Abstract:
The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.
Submitted 24 September, 2020;
originally announced September 2020.
-
Monolithic Multigrid for Magnetohydrodynamics
Authors:
J. H. Adler,
T. Benson,
E. C. Cyr,
P. E. Farrell,
S. MacLachlan,
R. Tuminaro
Abstract:
The magnetohydrodynamics (MHD) equations model a wide range of plasma physics applications and are characterized by a nonlinear system of partial differential equations that strongly couples a charged fluid with the evolution of electromagnetic fields. After discretization and linearization, the resulting system of equations is generally difficult to solve due to the coupling between variables, and the heterogeneous coefficients induced by the linearization process. In this paper, we investigate multigrid preconditioners for this system based on specialized relaxation schemes that properly address the system structure and coupling. Three extensions of Vanka relaxation are proposed and applied to problems with up to 170 million degrees of freedom and fluid and magnetic Reynolds numbers up to 400 for stationary problems and up to 20,000 for time-dependent problems.
Submitted 28 June, 2020;
originally announced June 2020.
-
Multilevel Initialization for Layer-Parallel Deep Neural Network Training
Authors:
Eric C. Cyr,
Stefanie Günther,
Jacob B. Schroder
Abstract:
This paper investigates multilevel initialization strategies for training very deep neural networks with a layer-parallel multigrid solver. The scheme is based on the continuous interpretation of the training problem as a problem of optimal control, in which neural networks are represented as discretizations of time-dependent ordinary differential equations. A key goal is to develop a method able to intelligently initialize the network parameters for the very deep networks enabled by scalable layer-parallel training. To do this, we apply a refinement strategy across the time domain, that is equivalent to refining in the layer dimension. The resulting refinements create deep networks, with good initializations for the network parameters coming from the coarser trained networks. We investigate the effectiveness of such multilevel "nested iteration" strategies for network training, showing supporting numerical evidence of reduced run time for equivalent accuracy. In addition, we study whether the initialization strategies provide a regularizing effect on the overall training process and reduce sensitivity to hyperparameters and randomness in initial network parameters.
Submitted 18 December, 2019;
originally announced December 2019.
-
Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Authors:
Eric C. Cyr,
Mamikon A. Gulian,
Ravi G. Patel,
Mauro Perego,
Nathaniel A. Trask
Abstract:
Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.
Submitted 10 December, 2019;
originally announced December 2019.
-
Layer-Parallel Training of Deep Residual Neural Networks
Authors:
S. Günther,
L. Ruthotto,
J. B. Schroder,
E. C. Cyr,
N. R. Gauger
Abstract:
Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be interpreted as forward Euler discretizations of a nonlinear initial value problem whose time-dependent control variables represent the weights of the neural network. Hence, training a ResNet can be cast as an optimal control problem of the associated dynamical system. For similar time-dependent optimal control problems arising in engineering applications, parallel-in-time methods have shown notable improvements in scalability. This paper demonstrates the use of those techniques for efficient and effective training of ResNets. The proposed algorithms replace the classical (sequential) forward and backward propagation through the network layers by a parallel nonlinear multigrid iteration applied to the layer domain. This adds a new dimension of parallelism across layers that is attractive when training very deep networks. From this basic idea, we derive multiple layer-parallel methods. The most efficient version employs a simultaneous optimization approach where updates to the network parameters are based on inexact gradient information in order to speed up the training process. Using numerical examples from supervised classification, we demonstrate that the new approach achieves similar training performance to traditional methods, but enables layer-parallelism and thus provides speedup over layer-serial methods through greater concurrency.
Submitted 25 July, 2019; v1 submitted 11 December, 2018;
originally announced December 2018.
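The forward Euler interpretation mentioned in this abstract can be sketched in a few lines: a residual layer x_{k+1} = x_k + h * f(x_k, theta_k) is one explicit Euler step of dx/dt = f(x, theta(t)) with step size h. The snippet below is a minimal NumPy illustration of that correspondence only; the tanh layer function, the name `resnet_forward`, and the random parameters are assumptions for this example, and it shows the sequential forward propagation that the paper's layer-parallel multigrid iteration is meant to replace, not the parallel method itself.

```python
import numpy as np

def resnet_forward(x0, weights, biases, h):
    """Sequential forward propagation through residual layers.

    Each layer applies x_{k+1} = x_k + h * tanh(W_k @ x_k + b_k), i.e. one
    forward Euler step of dx/dt = tanh(W(t) x + b(t)). The tanh layer is an
    assumed choice for illustration.
    """
    x = x0
    for W, b in zip(weights, biases):
        x = x + h * np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(0)
n_layers, width = 8, 3
weights = [rng.standard_normal((width, width)) for _ in range(n_layers)]
biases = [rng.standard_normal(width) for _ in range(n_layers)]

# Step size h = T / n_layers with final time T = 1 (assumed).
x_final = resnet_forward(np.ones(width), weights, biases, h=1.0 / n_layers)
```

In this picture the layer index plays the role of time, so adding layers while shrinking h refines the time grid rather than changing the underlying dynamical system.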
-
Goal-Oriented Adaptivity and Multilevel Preconditioning for the Poisson-Boltzmann Equation
Authors:
Burak Aksoylu,
Stephen Bond,
Eric Cyr,
Michael Holst
Abstract:
In this article, we develop goal-oriented error indicators to drive adaptive refinement algorithms for the Poisson-Boltzmann equation. Empirical results for the solvation free energy linear functional demonstrate that goal-oriented indicators are not sufficient on their own to lead to a superior refinement algorithm. To remedy this, we propose a problem-specific marking strategy using the solvation free energy computed from the solution of the linear regularized Poisson-Boltzmann equation. The convergence of the solvation free energy using this marking strategy, combined with goal-oriented refinement, compares favorably to adaptive methods using an energy-based error indicator. Due to the use of adaptive mesh refinement, it is critical to use multilevel preconditioning in order to maintain optimal computational complexity. We use variants of the classical multigrid method, which can be viewed as generalizations of the hierarchical basis multigrid and Bramble-Pasciak-Xu (BPX) preconditioners.
Submitted 19 September, 2011;
originally announced September 2011.