Quantum Circuit Encodings of Polynomial Chaos Expansions

Authors: Junaid Aftab, Christoph Schwab, Haizhao Yang, Jakob Zech

Abstract: This work investigates the expressive power of quantum circuits in approximating high-dimensional, real-valued functions. We focus on countably-parametric holomorphic maps $u:U\to \mathbb{R}$, where the parameter domain is $U=[-1,1]^{\mathbb{N}}$. We establish dimension-independent quantum circuit approximation rates via the best $n$-term truncations of generalized polynomial chaos (gPC) expansions of these parametric maps, demonstrating that these rates depend solely on the summability exponent of the gPC expansion coefficients. The key to our findings is based on the fact that so-called ``$(\boldsymbol{b},ε)$-holomorphic'' functions, where $\boldsymbol{b}\in (0,1]^\mathbb N \cap \ell^p(\mathbb N)$ for some $p\in(0,1)$, permit structured and sparse gPC expansions. Then, $n$-term truncated gPC expansions are known to admit approximation rates of order $n^{-1/p + 1/2}$ in the $L^2$ norm and of order $n^{-1/p + 1}$ in the $L^\infty$ norm. We show the existence of parameterized quantum circuit (PQC) encodings of these $n$-term truncated gPC expansions, and bound PQC depth and width via (i) tensorization of univariate PQCs that encode Chebyšev-polynomials in $[-1,1]$ and (ii) linear combination of unitaries (LCU) to build PQC emulations of $n$-term truncated gPC expansions. The results provide a rigorous mathematical foundation for the use of quantum algorithms in high-dimensional function approximation. As countably-parametric holomorphic maps naturally arise in parametric PDE models and uncertainty quantification (UQ), our results have implications for quantum-enhanced algorithms for a wide range of maps in applications.

Submitted 3 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

arXiv:2504.21639 [pdf, ps, other]

Sparsity for Infinite-Parametric Holomorphic Functions on Gaussian Spaces

Authors: Carlo Marcati, Christoph Schwab, Jakob Zech

Abstract: We investigate the sparsity of Wiener polynomial chaos expansions of holomorphic maps $\mathcal{G}$ on Gaussian Hilbert spaces, as arise in the coefficient-to-solution maps of linear, second order, divergence-form elliptic PDEs with log-Gaussian diffusion coefficient. Representing the Gaussian random field input as an affine-parametric expansion, the nonlinear map becomes a countably-parametric, deterministic holomorphic map of the coordinate sequence $\boldsymbol{y} = (y_j)_{j\in\mathbb{N}} \in \mathbb{R}^\infty$. We establish weighted summability results for the Wiener-Hermite coefficient sequences of images of affine-parametric expansions of the log-Gaussian input under $\mathcal{G}$. These results give rise to $N$-term approximation rate bounds for the full range of input summability exponents $p\in (0,2)$. We show that these approximation rate bounds apply to parameter-to-solution maps for elliptic diffusion PDEs with lognormal coefficients.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.14425 [pdf, ps, other]

Optimal Scheduling of Dynamic Transport

Authors: Panos Tsimpos, Zhi Ren, Jakob Zech, Youssef Marzouk

Abstract: Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a {transport map} that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable design freedom, and a central question is how to exploit this freedom. Though many popular methods seek straight line (i.e., zero acceleration) trajectories, we show here that a specific class of ``curved'' trajectories can significantly improve approximation and learning. In particular, we consider the unit-time interpolation of any given transport map $T$ and seek the schedule $τ: [0,1] \to [0,1]$ that minimizes the spatial Lipschitz constant of the corresponding velocity field over all times $t \in [0,1]$. This quantity is crucial as it allows for control of the approximation error when the velocity field is learned from data. We show that, for a broad class of source/target measures and transport maps $T$, the \emph{optimal schedule} can be computed in closed form, and that the resulting optimal Lipschitz constant is \emph{exponentially smaller} than that induced by an identity schedule (corresponding to, for instance, the Wasserstein geodesic). Our proof technique relies on the calculus of variations and $Γ$-convergence, allowing us to approximate the aforementioned degenerate objective by a family of smooth, tractable problems.

Submitted 17 June, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

arXiv:2503.21103 [pdf, other]

Low Stein Discrepancy via Message-Passing Monte Carlo

Authors: Nathan Kirk, T. Konstantin Rusch, Jakob Zech, Daniela Rus

Abstract: Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning. While originally designed for generating uniform point sets, we extend this framework to sample from general multivariate probability distributions with known probability density function. Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality. Finally, we show that Stein-MPMC outperforms competing methods, such as Stein Variational Gradient Descent and (greedy) Stein Points, by achieving a lower Stein discrepancy.

Submitted 26 March, 2025; originally announced March 2025.

Comments: 8 pages, 2 figures, Accepted at the ICLR 2025 Workshop on Frontiers in Probabilistic Inference

arXiv:2502.03795 [pdf, ps, other]

Distribution learning via neural differential equations: minimal energy regularization and approximation theory

Authors: Youssef Marzouk, Zhi Ren, Jakob Zech

Abstract: Neural ordinary differential equations (ODEs) provide expressive representations of invertible transport maps that can be used to approximate complex probability distributions, e.g., for generative modeling, density estimation, and Bayesian inference. We show that for a large class of transport maps $T$, there exists a time-dependent ODE velocity field realizing a straight-line interpolation $(1-t)x + tT(x)$, $t \in [0,1]$, of the displacement induced by the map. Moreover, we show that such velocity fields are minimizers of a training objective containing a specific minimum-energy regularization. We then derive explicit upper bounds for the $C^k$ norm of the velocity field that are polynomial in the $C^k$ norm of the corresponding transport map $T$; in the case of triangular (Knothe--Rosenblatt) maps, we also show that these bounds are polynomial in the $C^k$ norms of the associated source and target densities. Combining these results with stability arguments for distribution approximation via ODEs, we show that Wasserstein or Kullback--Leibler approximation of the target distribution to any desired accuracy $ε> 0$ can be achieved by a deep neural network representation of the velocity field whose size is bounded explicitly in terms of $ε$, the dimension, and the smoothness of the source and target densities. The same neural network ansatz yields guarantees on the value of the regularized training objective.

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2412.17582 [pdf, ps, other]

Statistical Learning Theory for Neural Operators

Authors: Niklas Reinhardt, Sven Wang, Jakob Zech

Abstract: We present statistical convergence results for the learning of (possibly) non-linear mappings in infinite-dimensional spaces. Specifically, given a map $G_0:\mathcal X\to\mathcal Y$ between two separable Hilbert spaces, we analyze the problem of recovering $G_0$ from $n\in\mathbb N$ noisy input-output pairs $(x_i, y_i)_{i=1}^n$ with $y_i = G_0 (x_i)+\varepsilon_i$; here the $x_i\in\mathcal X$ represent randomly drawn 'design' points, and the $\varepsilon_i$ are assumed to be either i.i.d. white noise processes or subgaussian random variables in $\mathcal{Y}$. We provide general convergence results for least-squares-type empirical risk minimizers over compact regression classes $\mathbf G\subseteq L^\infty(X,Y)$, in terms of their approximation properties and metric entropy bounds, which are derived using empirical process techniques. This generalizes classical results from finite-dimensional nonparametric regression to an infinite-dimensional setting. As a concrete application, we study an encoder-decoder based neural operator architecture termed FrameNet. Assuming $G_0$ to be holomorphic, we prove algebraic (in the sample size $n$) convergence rates in this setting, thereby overcoming the curse of dimensionality. To illustrate the wide applicability, as a prototypical example we discuss the learning of the non-linear solution operator to a parametric elliptic partial differential equation.

Submitted 23 December, 2024; originally announced December 2024.

MSC Class: 62G05

arXiv:2409.03518 [pdf, ps, other]

On the mean field limit of consensus based methods

Authors: Marvin Koß, Simon Weissmann, Jakob Zech

Abstract: Consensus based optimization (CBO) employs a swarm of particles evolving as a system of stochastic differential equations (SDEs). Recently, it has been adapted to yield a derivative free sampling method referred to as consensus based sampling (CBS). In this paper, we investigate the ``mean field limit'' of a class of consensus methods, including CBO and CBS. This limit allows one to characterize the system's behavior as the number of particles approaches infinity. Building upon prior work such as (Huang and Qiu, 2022), we establish the existence of a unique, strong solution for these finite-particle SDEs. We further provide uniform moment estimates, which allow us to establish a Fokker-Planck equation in the mean-field limit. Finally, we prove that the limiting McKean-Vlasov type SDE related to the Fokker-Planck equation admits a unique solution.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 34 pages

arXiv:2407.18384 [pdf, other]

Mathematical theory of deep learning

Authors: Philipp Petersen, Jakob Zech

Abstract: This book provides an introduction to the mathematical analysis of deep learning. It covers fundamental results in approximation theory, optimization theory, and statistical learning theory, which are the three main pillars of deep neural network theory. Serving as a guide for students and researchers in mathematics and related fields, the book aims to equip readers with foundational knowledge on the topic. It prioritizes simplicity over generality, and presents rigorous yet accessible results to help build an understanding of the essential mathematical concepts underpinning deep learning.

Submitted 7 April, 2025; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2402.01320 [pdf, ps, other]

On the mean-field limit for Stein variational gradient descent: stability and multilevel approximation

Authors: Simon Weissmann, Jakob Zech

Abstract: In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD). SVGD is a recent particle based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive as it requires evolving a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that involves running several interacting particle dynamics in parallel corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application we provide a numerical experiment for a PDE driven inverse problem, which confirms the speed up suggested by our theoretical results.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2312.13889 [pdf, ps, other]

Metropolis-adjusted interacting particle sampling

Authors: Björn Sprungk, Simon Weissmann, Jakob Zech

Abstract: In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine Metropolization of either the whole ensemble or smaller subsets of the ensemble, and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Our numerical results demonstrate the benefits of this correction in numerical examples for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.04172 [pdf, other]

Measure transport via polynomial density surrogates

Authors: Josephine Westermann, Jakob Zech

Abstract: We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $π$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $π$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 51 pages

MSC Class: 65C10; 62F15; 65C05; 65D40; 41A10; 41A25; 41A63

arXiv:2309.01043 [pdf, ps, other]

Distribution learning via neural differential equations: a nonparametric statistical perspective

Authors: Youssef Marzouk, Zhi Ren, Sven Wang, Jakob Zech

Abstract: Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs. Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes. The results also provide theoretical insight on how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.

Submitted 2 September, 2023; originally announced September 2023.

arXiv:2307.09835 [pdf, ps, other]

Deep Operator Network Approximation Rates for Lipschitz Operators

Authors: Christoph Schwab, Andreas Stein, Jakob Zech

Abstract: We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be holomorphic, the present expression rate results require mere Lipschitz (or Hölder) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotsky: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there) which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 31 pages

MSC Class: 41A65; 68T15; 68Q32

arXiv:2212.07240 [pdf, ps, other]

Multilevel Domain Uncertainty Quantification in Computational Electromagnetics

Authors: Rubén Aylwin, Carlos Jerez-Hanckes, Christoph Schwab, Jakob Zech

Abstract: We continue our study [Domain Uncertainty Quantification in Computational Electromagnetics, JUQ (2020), 8:301--341] of the numerical approximation of time-harmonic electromagnetic fields for the Maxwell lossy cavity problem for uncertain geometries. We adopt the same affine-parametric shape parametrization framework, mapping the physical domains to a nominal polygonal domain with piecewise smooth maps. The regularity of the pullback solutions on the nominal domain is characterized in piecewise Sobolev spaces. We prove error convergence rates and optimize the algorithmic steering of parameters for edge-element discretizations in the nominal domain combined with: (a) multilevel Monte Carlo sampling, and (b) multilevel, sparse-grid quadrature for computing the expectation of the solutions with respect to uncertain domain ensembles. In addition, we analyze sparse-grid interpolation to compute surrogates of the domain-to-solution mappings. All calculations are performed on the polyhedral nominal domain, which enables the use of standard simplicial finite element meshes. We provide a rigorous fully discrete error analysis and show, in all cases, that dimension-independent algebraic convergence is achieved. For the multilevel sparse-grid quadrature methods, we prove higher order convergence rates which are free from the so-called curse of dimensionality, i.e. independent of the number of parameters used to parametrize the admissible shapes. Numerical experiments confirm our theoretical results and verify the superiority of the sparse-grid methods.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2207.04950 [pdf, ps, other]

Neural and spectral operator surrogates: unified construction and expression rate bounds

Authors: Lukas Herrmann, Christoph Schwab, Jakob Zech

Abstract: Approximation rates are analyzed for deep surrogates of maps between infinite-dimensional function spaces, arising e.g. as data-to-solution maps of linear and nonlinear partial differential equations. Specifically, we study approximation rates for Deep Neural Operator and Generalized Polynomial Chaos (gpc) Operator surrogates for nonlinear, holomorphic maps between infinite-dimensional, separable Hilbert spaces. Operator in- and outputs from function spaces are assumed to be parametrized by stable, affine representation systems. Admissible representation systems comprise orthonormal bases, Riesz bases or suitable tight frames of the spaces under consideration. Algebraic expression rate bounds are established for both deep neural and spectral operator surrogates acting in scales of separable Hilbert spaces containing domain and range of the map to be expressed, with finite Sobolev or Besov regularity. We illustrate the abstract concepts by expression rate bounds for the coefficient-to-solution map for a linear elliptic PDE on the torus.

Submitted 8 February, 2024; v1 submitted 11 July, 2022; originally announced July 2022.

arXiv:2204.13732 [pdf, other]

Multilevel Optimization for Inverse Problems

Authors: Simon Weissmann, Ashia Wilson, Jakob Zech

Abstract: Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying framework of multilevel optimization that can be applied to a wide range of optimization-based solvers. Our framework provably reduces the computational cost associated with evaluating the expensive forward maps stemming from various physical models. To demonstrate the versatility of our analysis, we discuss its implications for various methodologies including multilevel (accelerated, stochastic) gradient descent, a multilevel ensemble Kalman inversion and a multilevel Langevin sampler. We also provide numerical experiments to verify our theoretical findings.

Submitted 28 April, 2022; originally announced April 2022.

MSC Class: 65N21; 65N75; 65K10

arXiv:2201.05395 [pdf, other]

doi 10.1016/j.neunet.2023.06.008

De Rham compatible Deep Neural Network FEM

Authors: Marcello Longo, Joost A. A. Opschoor, Nico Disch, Christoph Schwab, Jakob Zech

Abstract: On general regular simplicial partitions $\mathcal{T}$ of bounded polytopal domains $Ω\subset \mathbb{R}^d$, $d\in\{2,3\}$, we construct \emph{exact neural network (NN) emulations} of all lowest order finite element spaces in the discrete de Rham complex. These include the spaces of piecewise constant functions, continuous piecewise linear (CPwL) functions, the classical ``Raviart-Thomas element'', and the ``Nédélec edge element''. For all but the CPwL case, our network architectures employ both ReLU (rectified linear unit) and BiSU (binary step unit) activations to capture discontinuities. In the important case of CPwL functions, we prove that it suffices to work with pure ReLU nets. Our construction and DNN architecture generalize previous results in that no geometric restrictions on the regular simplicial partitions $\mathcal{T}$ of $Ω$ are required for DNN emulation. In addition, for CPwL functions our DNN construction is valid in any dimension $d\geq 2$. Our ``FE-Nets'' are required in the variationally correct, structure-preserving approximation of boundary value problems of electromagnetism in nonconvex polyhedra $Ω\subset \mathbb{R}^3$. They are thus an essential ingredient in the application of e.g., the methodology of ``physics-informed NNs'' or ``deep Ritz methods'' to electromagnetic field simulation via deep learning techniques. We indicate generalizations of our constructions to higher-order compatible spaces and other, non-compatible classes of discretizations, in particular the ``Crouzeix-Raviart'' elements and Hybridized, Higher Order (HHO) methods.

Submitted 2 June, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

MSC Class: 41A05; 68Q32; 26B40; 65N30

arXiv:2201.01912 [pdf, ps, other]

Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs

Authors: Dinh Dũng, Van Kien Nguyen, Christoph Schwab, Jakob Zech

Abstract: We establish sparsity and summability results for coefficient sequences of Wiener-Hermite polynomial chaos expansions of countably-parametric solutions of linear elliptic and parabolic divergence-form partial differential equations with Gaussian random field inputs. The novel proof technique developed here is based on analytic continuation of parametric solutions into the complex domain. It differs from previous works that used bootstrap arguments and induction on the differentiation order of solution derivatives with respect to the parameters. The present holomorphy-based argument allows a unified, ``differentiation-free'' proof of sparsity (expressed in terms of $\ell^p$-summability or weighted $\ell^2$-summability) of sequences of Wiener-Hermite coefficients in polynomial chaos expansions in various scales of function spaces. The analysis also implies corresponding analyticity and sparsity results for posterior densities in Bayesian inverse problems subject to Gaussian priors on uncertain inputs from function spaces. Our results furthermore yield dimension-independent convergence rates of various \emph{constructive} high-dimensional deterministic numerical approximation schemes such as single-level and multi-level versions of Hermite-Smolyak anisotropic sparse-grid interpolation and quadrature in both forward and inverse computational uncertainty quantification.

Submitted 16 June, 2023; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 165 pages

arXiv:2111.07080 [pdf, ps, other]

Deep Learning in High Dimension: Neural Network Approximation of Analytic Functions in $L^2(\mathbb{R}^d,γ_d)$

Authors: Christoph Schwab, Jakob Zech

Abstract: For artificial deep neural networks, we prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,γ_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here $γ_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$. We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$. For $d\in\mathbb{N}$, we show exponential convergence rates in $L^2(\mathbb{R}^d,γ_d)$. In case $d=\infty$, under suitable smoothness and sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with $γ_\infty$ denoting an infinite (Gaussian) product measure on $\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},γ_\infty)$. The rates only depend on quantified holomorphy of (an analytic continuation of) the map $f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2107.13422 [pdf, ps, other]

Sparse approximation of triangular transports. Part II: the infinite dimensional case

Authors: Jakob Zech, Youssef Marzouk

Abstract: For two probability measures $ρ$ and $π$ on $[-1,1]^{\mathbb{N}}$ we investigate the approximation of the triangular Knothe-Rosenblatt transport $T:[-1,1]^{\mathbb{N}}\to [-1,1]^{\mathbb{N}}$ that pushes forward $ρ$ to $π$. Under suitable assumptions, we show that $T$ can be approximated by rational functions without suffering from the curse of dimension. Our results are applicable to posterior measures arising in certain inference problems where the unknown belongs to an (infinite dimensional) Banach space. In particular, we show that it is possible to efficiently approximately sample from certain high-dimensional measures by transforming a lower-dimensional latent variable.

Submitted 28 July, 2021; originally announced July 2021.

Comments: The original manuscript arXiv:2006.06994v1 has been split into two parts; the present paper is the second part

arXiv:2006.06994 [pdf, ps, other]

Sparse approximation of triangular transports. Part I: the finite dimensional case

Authors: Jakob Zech, Youssef Marzouk

Abstract: For two probability measures $ρ$ and $π$ with analytic densities on the $d$-dimensional cube $[-1,1]^d$, we investigate the approximation of the unique triangular monotone Knothe-Rosenblatt transport $T:[-1,1]^d\to [-1,1]^d$, such that the pushforward $T_\sharpρ$ equals $π$. It is shown that for $d\in\mathbb{N}$ there exist approximations $\tilde T$ of $T$, based on either sparse polynomial expansions or deep ReLU neural networks, such that the distance between $\tilde T_\sharpρ$ and $π$ decreases exponentially. More precisely, we prove error bounds of the type $\exp(-βN^{1/d})$ (or $\exp(-βN^{1/(d+1)})$ for neural networks), where $N$ refers to the dimension of the ansatz space (or the size of the network) containing $\tilde T$; the notion of distance comprises the Hellinger distance, the total variation distance, the Wasserstein distance and the Kullback-Leibler divergence. Our construction guarantees $\tilde T$ to be a monotone triangular bijective transport on the hypercube $[-1,1]^d$. Analogous results hold for the inverse transport $S=T^{-1}$. The proofs are constructive, and we give an explicit a priori description of the ansatz space, which can be used for numerical implementations.

Submitted 28 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: The original manuscript arXiv:2006.06994v1 has been split into two parts; the present paper is the first part

MSC Class: 32D05; 41A10; 41A25; 41A46; 62D99; 65D15

arXiv:1407.1430 [pdf, other]

A Posteriori Error Estimation of hp-dG Finite Element Methods for Highly Indefinite Helmholtz Problems (extended version)

Authors: Stefan Sauter, Jakob Zech

Abstract: In this paper, we will consider an $hp$-finite elements discretization of a highly indefinite Helmholtz problem by some dG formulation which is based on the ultra-weak variational formulation by Cessenat and Deprés. We will introduce an a posteriori error estimator and derive reliability and efficiency estimates which are explicit with respect to the wavenumber and the discretization parameters $h$ and $p$. In contrast to the conventional conforming finite element method for indefinite problems, the dG formulation is unconditionally stable and the adaptive discretization process may start from a very coarse initial mesh. Numerical experiments will illustrate the efficiency and robustness of the method.

Submitted 14 March, 2015; v1 submitted 5 July, 2014; originally announced July 2014.

Comments: 39 pages, 10 figures

Showing 1–22 of 22 results for author: Zech, J