-
Quantum Circuit Encodings of Polynomial Chaos Expansions
Authors:
Junaid Aftab,
Christoph Schwab,
Haizhao Yang,
Jakob Zech
Abstract:
This work investigates the expressive power of quantum circuits in approximating high-dimensional, real-valued functions. We focus on countably-parametric holomorphic maps $u:U\to \mathbb{R}$, where the parameter domain is $U=[-1,1]^{\mathbb{N}}$. We establish dimension-independent quantum circuit approximation rates via the best $n$-term truncations of generalized polynomial chaos (gPC) expansions of these parametric maps, demonstrating that these rates depend solely on the summability exponent of the gPC expansion coefficients. Our findings rest on the fact that so-called ``$(\boldsymbol{b},ε)$-holomorphic'' functions, where $\boldsymbol{b}\in (0,1]^\mathbb N \cap \ell^p(\mathbb N)$ for some $p\in(0,1)$, permit structured and sparse gPC expansions. Then, $n$-term truncated gPC expansions are known to admit approximation rates of order $n^{-1/p + 1/2}$ in the $L^2$ norm and of order $n^{-1/p + 1}$ in the $L^\infty$ norm. We show the existence of parameterized quantum circuit (PQC) encodings of these $n$-term truncated gPC expansions, and bound PQC depth and width via (i) tensorization of univariate PQCs that encode Chebyšev polynomials on $[-1,1]$ and (ii) linear combination of unitaries (LCU) to build PQC emulations of $n$-term truncated gPC expansions. The results provide a rigorous mathematical foundation for the use of quantum algorithms in high-dimensional function approximation. As countably-parametric holomorphic maps naturally arise in parametric PDE models and uncertainty quantification (UQ), our results have implications for quantum-enhanced algorithms for a wide range of maps in applications.
Submitted 3 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
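The $L^2$ rate $n^{-1/p + 1/2}$ quoted in the abstract above has a simple sequence-level counterpart (Stechkin's lemma), which the following minimal sketch illustrates numerically. It assumes an orthonormal basis, so that the $L^2$ error of an $n$-term truncation equals the $\ell^2$ norm of the discarded coefficients. The coefficient sequence and the function names are hypothetical, chosen only for illustration, and the sketch does not reproduce the paper's quantum circuit construction.

```python
import math

def best_n_term_error(coeffs, n):
    """l^2 norm of the coefficients left after keeping the n largest in magnitude."""
    tail = sorted((abs(c) for c in coeffs), reverse=True)[n:]
    return math.sqrt(sum(c * c for c in tail))

def stechkin_bound(coeffs, n, p):
    """Stechkin-type bound ||c||_{l^p} * n^{-(1/p - 1/2)}, for 0 < p < 2 and n >= 1."""
    lp_norm = sum(abs(c) ** p for c in coeffs) ** (1.0 / p)
    return lp_norm * n ** (-(1.0 / p - 0.5))

# Hypothetical coefficient sequence with algebraic decay (an assumption, not data from the paper).
coeffs = [j ** -2.0 for j in range(1, 2001)]
p = 0.6

for n in (1, 10, 100, 1000):
    err = best_n_term_error(coeffs, n)
    bound = stechkin_bound(coeffs, n, p)
    print(f"n={n:5d}  error={err:.3e}  bound={bound:.3e}")
```

The printed errors stay below the bound for every $n$, consistent with the rate depending only on the summability exponent $p$.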
-
Sparsity for Infinite-Parametric Holomorphic Functions on Gaussian Spaces
Authors:
Carlo Marcati,
Christoph Schwab,
Jakob Zech
Abstract:
We investigate the sparsity of Wiener polynomial chaos expansions of holomorphic maps $\mathcal{G}$ on Gaussian Hilbert spaces, as arise in the coefficient-to-solution maps of linear, second order, divergence-form elliptic PDEs with log-Gaussian diffusion coefficient. Representing the Gaussian random field input as an affine-parametric expansion, the nonlinear map becomes a countably-parametric, deterministic holomorphic map of the coordinate sequence $\boldsymbol{y} = (y_j)_{j\in\mathbb{N}} \in \mathbb{R}^\infty$. We establish weighted summability results for the Wiener-Hermite coefficient sequences of images of affine-parametric expansions of the log-Gaussian input under $\mathcal{G}$. These results give rise to $N$-term approximation rate bounds for the full range of input summability exponents $p\in (0,2)$. We show that these approximation rate bounds apply to parameter-to-solution maps for elliptic diffusion PDEs with lognormal coefficients.
Submitted 30 April, 2025;
originally announced April 2025.
-
Optimal Scheduling of Dynamic Transport
Authors:
Panos Tsimpos,
Zhi Ren,
Jakob Zech,
Youssef Marzouk
Abstract:
Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a transport map that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable design freedom, and a central question is how to exploit this freedom. Though many popular methods seek straight line (i.e., zero acceleration) trajectories, we show here that a specific class of ``curved'' trajectories can significantly improve approximation and learning. In particular, we consider the unit-time interpolation of any given transport map $T$ and seek the schedule $τ: [0,1] \to [0,1]$ that minimizes the spatial Lipschitz constant of the corresponding velocity field over all times $t \in [0,1]$. This quantity is crucial as it allows for control of the approximation error when the velocity field is learned from data. We show that, for a broad class of source/target measures and transport maps $T$, the \emph{optimal schedule} can be computed in closed form, and that the resulting optimal Lipschitz constant is \emph{exponentially smaller} than that induced by an identity schedule (corresponding to, for instance, the Wasserstein geodesic). Our proof technique relies on the calculus of variations and $Γ$-convergence, allowing us to approximate the aforementioned degenerate objective by a family of smooth, tractable problems.
Submitted 17 June, 2025; v1 submitted 19 April, 2025;
originally announced April 2025.
-
Low Stein Discrepancy via Message-Passing Monte Carlo
Authors:
Nathan Kirk,
T. Konstantin Rusch,
Jakob Zech,
Daniela Rus
Abstract:
Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning. While originally designed for generating uniform point sets, we extend this framework to sample from general multivariate probability distributions with known probability density function. Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality. Finally, we show that Stein-MPMC outperforms competing methods, such as Stein Variational Gradient Descent and (greedy) Stein Points, by achieving a lower Stein discrepancy.
Submitted 26 March, 2025;
originally announced March 2025.
-
Distribution learning via neural differential equations: minimal energy regularization and approximation theory
Authors:
Youssef Marzouk,
Zhi Ren,
Jakob Zech
Abstract:
Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions, e.g., for generative modeling, density estimation, and Bayesian inference. We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map. Moreover, we show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization. We then derive explicit upper bounds for the $C^k$ norm of the velocity field that are polynomial in the $C^k$ norm of the corresponding transport map $T$; in the case of triangular (Knothe--Rosenblatt) maps, we also show that these bounds are polynomial in the $C^k$ norms of the associated source and target densities. Combining these results with stability arguments for distribution approximation via ODEs, we show that Wasserstein or Kullback--Leibler approximation of the target distribution to any desired accuracy $ε> 0$ can be achieved by a deep neural network representation of the velocity field whose size is bounded explicitly in terms of $ε$, the dimension, and the smoothness of the source and target densities. The same neural network ansatz yields guarantees on the value of the regularized training objective.
Submitted 6 February, 2025;
originally announced February 2025.
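The straight-line interpolation $(1-t)x + tT(x)$ mentioned in the abstract above can be made concrete in one dimension. The sketch below assumes a hypothetical affine, increasing map $T(x) = ax + b$, for which the velocity field is available in closed form, and checks that integrating the ODE from $t=0$ to $t=1$ reproduces $T$. All names and parameter values are illustrative assumptions, not the authors' construction.

```python
def T(x, a=2.0, b=1.0):
    """Hypothetical affine, increasing transport map T(x) = a*x + b."""
    return a * x + b

def velocity(t, y, a=2.0, b=1.0):
    """Velocity field v with v(t, (1-t)*x + t*T(x)) = T(x) - x."""
    # Invert the interpolation y = ((1-t) + t*a)*x + t*b to recover the start point x.
    x = (y - t * b) / ((1.0 - t) + t * a)
    return T(x, a, b) - x

def flow(x0, steps=100):
    """Integrate dy/dt = v(t, y) from t=0 to t=1 with the classical RK4 scheme."""
    h = 1.0 / steps
    y, t = x0, 0.0
    for _ in range(steps):
        k1 = velocity(t, y)
        k2 = velocity(t + h / 2, y + h / 2 * k1)
        k3 = velocity(t + h / 2, y + h / 2 * k2)
        k4 = velocity(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

for x0 in (-1.5, 0.0, 0.7):
    print(x0, flow(x0), T(x0))
```

Each trajectory is a straight line in time, so the time-one flow map agrees with $T$ up to the integration error.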
-
Statistical Learning Theory for Neural Operators
Authors:
Niklas Reinhardt,
Sven Wang,
Jakob Zech
Abstract:
We present statistical convergence results for the learning of (possibly) non-linear mappings in infinite-dimensional spaces. Specifically, given a map $G_0:\mathcal X\to\mathcal Y$ between two separable Hilbert spaces, we analyze the problem of recovering $G_0$ from $n\in\mathbb N$ noisy input-output pairs $(x_i, y_i)_{i=1}^n$ with $y_i = G_0 (x_i)+\varepsilon_i$; here the $x_i\in\mathcal X$ represent randomly drawn 'design' points, and the $\varepsilon_i$ are assumed to be either i.i.d. white noise processes or subgaussian random variables in $\mathcal{Y}$. We provide general convergence results for least-squares-type empirical risk minimizers over compact regression classes $\mathbf G\subseteq L^\infty(X,Y)$, in terms of their approximation properties and metric entropy bounds, which are derived using empirical process techniques. This generalizes classical results from finite-dimensional nonparametric regression to an infinite-dimensional setting. As a concrete application, we study an encoder-decoder based neural operator architecture termed FrameNet. Assuming $G_0$ to be holomorphic, we prove algebraic (in the sample size $n$) convergence rates in this setting, thereby overcoming the curse of dimensionality. To illustrate the wide applicability, as a prototypical example we discuss the learning of the non-linear solution operator to a parametric elliptic partial differential equation.
Submitted 23 December, 2024;
originally announced December 2024.
-
On the mean field limit of consensus based methods
Authors:
Marvin Koß,
Simon Weissmann,
Jakob Zech
Abstract:
Consensus based optimization (CBO) employs a swarm of particles evolving as a system of stochastic differential equations (SDEs). Recently, it has been adapted to yield a derivative-free sampling method referred to as consensus based sampling (CBS). In this paper, we investigate the ``mean field limit'' of a class of consensus methods, including CBO and CBS. This limit allows one to characterize the system's behavior as the number of particles approaches infinity. Building upon prior work such as (Huang and Qiu, 2022), we establish the existence of a unique, strong solution for these finite-particle SDEs. We further provide uniform moment estimates, which allow us to derive a Fokker-Planck equation in the mean-field limit. Finally, we prove that the limiting McKean-Vlasov type SDE related to the Fokker-Planck equation admits a unique solution.
Submitted 5 September, 2024;
originally announced September 2024.
-
Mathematical theory of deep learning
Authors:
Philipp Petersen,
Jakob Zech
Abstract:
This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.
Submitted 7 April, 2025; v1 submitted 25 July, 2024;
originally announced July 2024.
-
On the mean-field limit for Stein variational gradient descent: stability and multilevel approximation
Authors:
Simon Weissmann,
Jakob Zech
Abstract:
In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD). SVGD is a recent particle-based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive as it requires evolving a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that involves running several interacting particle dynamics in parallel corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application we provide a numerical experiment for a PDE-driven inverse problem, which confirms the speed-up suggested by our theoretical results.
Submitted 2 February, 2024;
originally announced February 2024.
-
Metropolis-adjusted interacting particle sampling
Authors:
Björn Sprungk,
Simon Weissmann,
Jakob Zech
Abstract:
In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine Metropolization of either the whole ensemble or smaller subsets of the ensemble, and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Numerical examples demonstrate the benefits of this correction for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.
Submitted 21 December, 2023;
originally announced December 2023.
-
Measure transport via polynomial density surrogates
Authors:
Josephine Westermann,
Jakob Zech
Abstract:
We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $π$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $π$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.
Submitted 7 November, 2023;
originally announced November 2023.
-
Distribution learning via neural differential equations: a nonparametric statistical perspective
Authors:
Youssef Marzouk,
Zhi Ren,
Sven Wang,
Jakob Zech
Abstract:
Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs.
Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes. The results also provide theoretical insight on how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.
Submitted 2 September, 2023;
originally announced September 2023.
-
Deep Operator Network Approximation Rates for Lipschitz Operators
Authors:
Christoph Schwab,
Andreas Stein,
Jakob Zech
Abstract:
We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be holomorphic, the present expression rate results require mere Lipschitz (or Hölder) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotsky: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there) which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.
Submitted 19 July, 2023;
originally announced July 2023.
-
Multilevel Domain Uncertainty Quantification in Computational Electromagnetics
Authors:
Rubén Aylwin,
Carlos Jerez-Hanckes,
Christoph Schwab,
Jakob Zech
Abstract:
We continue our study [Domain Uncertainty Quantification in Computational Electromagnetics, JUQ (2020), 8:301--341] of the numerical approximation of time-harmonic electromagnetic fields for the Maxwell lossy cavity problem for uncertain geometries. We adopt the same affine-parametric shape parametrization framework, mapping the physical domains to a nominal polygonal domain with piecewise smooth maps. The regularity of the pullback solutions on the nominal domain is characterized in piecewise Sobolev spaces. We prove error convergence rates and optimize the algorithmic steering of parameters for edge-element discretizations in the nominal domain combined with: (a) multilevel Monte Carlo sampling, and (b) multilevel, sparse-grid quadrature for computing the expectation of the solutions with respect to uncertain domain ensembles. In addition, we analyze sparse-grid interpolation to compute surrogates of the domain-to-solution mappings. All calculations are performed on the polyhedral nominal domain, which enables the use of standard simplicial finite element meshes. We provide a rigorous fully discrete error analysis and show, in all cases, that dimension-independent algebraic convergence is achieved. For the multilevel sparse-grid quadrature methods, we prove higher order convergence rates which are free from the so-called curse of dimensionality, i.e. independent of the number of parameters used to parametrize the admissible shapes. Numerical experiments confirm our theoretical results and verify the superiority of the sparse-grid methods.
Submitted 14 December, 2022;
originally announced December 2022.
-
Neural and spectral operator surrogates: unified construction and expression rate bounds
Authors:
Lukas Herrmann,
Christoph Schwab,
Jakob Zech
Abstract:
Approximation rates are analyzed for deep surrogates of maps between infinite-dimensional function spaces, arising e.g. as data-to-solution maps of linear and nonlinear partial differential equations. Specifically, we study approximation rates for Deep Neural Operator and Generalized Polynomial Chaos (gpc) Operator surrogates for nonlinear, holomorphic maps between infinite-dimensional, separable Hilbert spaces. Operator in- and outputs from function spaces are assumed to be parametrized by stable, affine representation systems. Admissible representation systems comprise orthonormal bases, Riesz bases or suitable tight frames of the spaces under consideration. Algebraic expression rate bounds are established for both deep neural and spectral operator surrogates acting in scales of separable Hilbert spaces containing domain and range of the map to be expressed, with finite Sobolev or Besov regularity. We illustrate the abstract concepts by expression rate bounds for the coefficient-to-solution map for a linear elliptic PDE on the torus.
Submitted 8 February, 2024; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Multilevel Optimization for Inverse Problems
Authors:
Simon Weissmann,
Ashia Wilson,
Jakob Zech
Abstract:
Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying framework of multilevel optimization that can be applied to a wide range of optimization-based solvers. Our framework provably reduces the computational cost associated with evaluating the expensive forward maps stemming from various physical models. To demonstrate the versatility of our analysis, we discuss its implications for various methodologies including multilevel (accelerated, stochastic) gradient descent, a multilevel ensemble Kalman inversion and a multilevel Langevin sampler. We also provide numerical experiments to verify our theoretical findings.
Submitted 28 April, 2022;
originally announced April 2022.
-
De Rham compatible Deep Neural Network FEM
Authors:
Marcello Longo,
Joost A. A. Opschoor,
Nico Disch,
Christoph Schwab,
Jakob Zech
Abstract:
On general regular simplicial partitions $\mathcal{T}$ of bounded polytopal domains $Ω\subset \mathbb{R}^d$, $d\in\{2,3\}$, we construct \emph{exact neural network (NN) emulations} of all lowest order finite element spaces in the discrete de Rham complex. These include the spaces of piecewise constant functions, continuous piecewise linear (CPwL) functions, the classical ``Raviart-Thomas element'', and the ``Nédélec edge element''. For all but the CPwL case, our network architectures employ both ReLU (rectified linear unit) and BiSU (binary step unit) activations to capture discontinuities. In the important case of CPwL functions, we prove that it suffices to work with pure ReLU nets. Our construction and DNN architecture generalize previous results in that no geometric restrictions on the regular simplicial partitions $\mathcal{T}$ of $Ω$ are required for DNN emulation. In addition, for CPwL functions our DNN construction is valid in any dimension $d\geq 2$. Our ``FE-Nets'' are required in the variationally correct, structure-preserving approximation of boundary value problems of electromagnetism in nonconvex polyhedra $Ω\subset \mathbb{R}^3$. They are thus an essential ingredient in the application of e.g., the methodology of ``physics-informed NNs'' or ``deep Ritz methods'' to electromagnetic field simulation via deep learning techniques. We indicate generalizations of our constructions to higher-order compatible spaces and other, non-compatible classes of discretizations, in particular the ``Crouzeix-Raviart'' elements and Hybridized, Higher Order (HHO) methods.
Submitted 2 June, 2023; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs
Authors:
Dinh Dũng,
Van Kien Nguyen,
Christoph Schwab,
Jakob Zech
Abstract:
We establish sparsity and summability results for coefficient sequences of Wiener-Hermite polynomial chaos expansions of countably-parametric solutions of linear elliptic and parabolic divergence-form partial differential equations with Gaussian random field inputs.
The novel proof technique developed here is based on analytic continuation of parametric solutions into the complex domain. It differs from previous works that used bootstrap arguments and induction on the differentiation order of solution derivatives with respect to the parameters. The present holomorphy-based argument allows a unified, ``differentiation-free'' proof of sparsity (expressed in terms of $\ell^p$-summability or weighted $\ell^2$-summability) of sequences of Wiener-Hermite coefficients in polynomial chaos expansions in various scales of function spaces. The analysis also implies corresponding analyticity and sparsity results for posterior densities in Bayesian inverse problems subject to Gaussian priors on uncertain inputs from function spaces.
Our results furthermore yield dimension-independent convergence rates of various \emph{constructive} high-dimensional deterministic numerical approximation schemes such as single-level and multi-level versions of Hermite-Smolyak anisotropic sparse-grid interpolation and quadrature in both forward and inverse computational uncertainty quantification.
Submitted 16 June, 2023; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Deep Learning in High Dimension: Neural Network Approximation of Analytic Functions in $L^2(\mathbb{R}^d,γ_d)$
Authors:
Christoph Schwab,
Jakob Zech
Abstract:
For artificial deep neural networks, we prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,γ_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here $γ_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$. We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$. For $d\in\mathbb{N}$, we show exponential convergence rates in $L^2(\mathbb{R}^d,γ_d)$. In case $d=\infty$, under suitable smoothness and sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with $γ_\infty$ denoting an infinite (Gaussian) product measure on $\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},γ_\infty)$. The rates only depend on quantified holomorphy of (an analytic continuation of) the map $f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
Submitted 13 November, 2021;
originally announced November 2021.
-
Sparse approximation of triangular transports. Part II: the infinite dimensional case
Authors:
Jakob Zech,
Youssef Marzouk
Abstract:
For two probability measures $ρ$ and $π$ on $[-1,1]^{\mathbb{N}}$ we investigate the approximation of the triangular Knothe-Rosenblatt transport $T:[-1,1]^{\mathbb{N}}\to [-1,1]^{\mathbb{N}}$ that pushes forward $ρ$ to $π$. Under suitable assumptions, we show that $T$ can be approximated by rational functions without suffering from the curse of dimension. Our results are applicable to posterior measures arising in certain inference problems where the unknown belongs to an (infinite dimensional) Banach space. In particular, we show that it is possible to efficiently approximately sample from certain high-dimensional measures by transforming a lower-dimensional latent variable.
Submitted 28 July, 2021;
originally announced July 2021.
-
Sparse approximation of triangular transports. Part I: the finite dimensional case
Authors:
Jakob Zech,
Youssef Marzouk
Abstract:
For two probability measures $ρ$ and $π$ with analytic densities on the $d$-dimensional cube $[-1,1]^d$, we investigate the approximation of the unique triangular monotone Knothe-Rosenblatt transport $T:[-1,1]^d\to [-1,1]^d$, such that the pushforward $T_\sharpρ$ equals $π$. It is shown that for $d\in\mathbb{N}$ there exist approximations $\tilde T$ of $T$, based on either sparse polynomial expansions or deep ReLU neural networks, such that the distance between $\tilde T_\sharpρ$ and $π$ decreases exponentially. More precisely, we prove error bounds of the type $\exp(-βN^{1/d})$ (or $\exp(-βN^{1/(d+1)})$ for neural networks), where $N$ refers to the dimension of the ansatz space (or the size of the network) containing $\tilde T$; the notion of distance comprises the Hellinger distance, the total variation distance, the Wasserstein distance and the Kullback-Leibler divergence. Our construction guarantees $\tilde T$ to be a monotone triangular bijective transport on the hypercube $[-1,1]^d$. Analogous results hold for the inverse transport $S=T^{-1}$. The proofs are constructive, and we give an explicit a priori description of the ansatz space, which can be used for numerical implementations.
Submitted 28 July, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
A Posteriori Error Estimation of hp-dG Finite Element Methods for Highly Indefinite Helmholtz Problems (extended version)
Authors:
Stefan Sauter,
Jakob Zech
Abstract:
In this paper, we will consider an $hp$-finite element discretization of a highly indefinite Helmholtz problem by some dG formulation which is based on the ultra-weak variational formulation by Cessenat and Després. We will introduce an a posteriori error estimator and derive reliability and efficiency estimates which are explicit with respect to the wavenumber and the discretization parameters $h$ and $p$. In contrast to the conventional conforming finite element method for indefinite problems, the dG formulation is unconditionally stable and the adaptive discretization process may start from a very coarse initial mesh. Numerical experiments will illustrate the efficiency and robustness of the method.
Submitted 14 March, 2015; v1 submitted 5 July, 2014;
originally announced July 2014.