-
Deep Learning without Global Optimization by Random Fourier Neural Networks
Authors:
Owen Davis,
Gianluca Geraci,
Mohammad Motamed
Abstract:
We introduce a new training algorithm for deep neural networks that utilize random complex exponential activation functions. Our approach employs a Markov Chain Monte Carlo sampling procedure to iteratively train network layers, avoiding global and gradient-based optimization while maintaining error control. It consistently attains the theoretical approximation rate for residual networks with complex exponential activation functions, determined by network complexity. Additionally, it enables efficient learning of multiscale and high-frequency features, producing interpretable parameter distributions. Despite using sinusoidal basis functions, we do not observe Gibbs phenomena in approximating discontinuous target functions.
Submitted 4 March, 2025; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Residual Multi-Fidelity Neural Network Computing
Authors:
Owen Davis,
Mohammad Motamed,
Raul Tempone
Abstract:
In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Motivated by error-complexity estimates for ReLU neural networks, we formulate the correlation between an inexpensive low-fidelity model and an expensive high-fidelity model as a possibly non-linear residual function. This function defines a mapping between 1) the shared input space of the models along with the low-fidelity model output, and 2) the discrepancy between the outputs of the two models. The computational framework proceeds by training two neural networks to work in concert. The first network learns the residual function on a small set of high- and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of the second network. The trained second network then acts as our surrogate for the high-fidelity quantity of interest. We present four numerical examples to demonstrate the power of the proposed framework, showing that significant savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
Submitted 20 December, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Approximation Power of Deep Neural Networks: an explanatory mathematical survey
Authors:
Owen Davis,
Mohammad Motamed
Abstract:
This survey provides an in-depth and explanatory review of the approximation properties of deep neural networks, with a focus on feed-forward and residual architectures. The primary objective is to examine how effectively neural networks approximate target functions and to identify conditions under which they outperform traditional approximation methods. Key topics include the nonlinear, compositional structure of deep networks and the formalization of neural network tasks as optimization problems in regression and classification settings. The survey also addresses the training process, emphasizing the role of stochastic gradient descent and backpropagation in solving these optimization problems, and highlights practical considerations such as activation functions, overfitting, and regularization techniques. Additionally, the survey explores the density of neural networks in the space of continuous functions, comparing the approximation capabilities of deep ReLU networks with those of other approximation methods. It discusses recent theoretical advancements in understanding the expressiveness and limitations of these networks. A detailed error-complexity analysis is also presented, focusing on error rates and computational complexity for neural networks with ReLU and Fourier-type activation functions in the context of bounded target functions with minimal regularity assumptions. Alongside recent known results, the survey introduces new findings, offering a valuable resource for understanding the theoretical foundations of neural network approximation. Concluding remarks and further reading suggestions are provided.
Submitted 16 December, 2024; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Hierarchical Low-Rank Approximation of Regularized Wasserstein Distance
Authors:
Mohammad Motamed
Abstract:
Sinkhorn divergence is a measure of dissimilarity between two probability measures. It is obtained through adding an entropic regularization term to Kantorovich's optimal transport problem and can hence be viewed as an entropically regularized Wasserstein distance. Given two discrete probability vectors in the $n$-simplex and supported on two bounded spaces in ${\mathbb R}^d$, we present a fast method for computing Sinkhorn divergence when the cost matrix can be decomposed into a $d$-term sum of asymptotically smooth Kronecker product factors. The method combines Sinkhorn's matrix scaling iteration with a low-rank hierarchical representation of the scaling matrices to achieve a near-linear complexity ${\mathcal O}(n \log^3 n)$. This provides a fast and easy-to-implement algorithm for computing Sinkhorn divergence, enabling its applicability to large-scale optimization problems, where the computation of classical Wasserstein metric is not feasible. We present a numerical example related to signal processing to demonstrate the applicability of quadratic Sinkhorn divergence in comparison with quadratic Wasserstein distance and to verify the accuracy and efficiency of the proposed method.
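For orientation, the Sinkhorn matrix scaling iteration mentioned in the abstract can be sketched in its plain dense form. This is a minimal illustration under stated assumptions, not the hierarchical low-rank method of the paper: it stores the full kernel matrix and costs O(n^2) per iteration. The function name `sinkhorn_plan`, the regularization value `eps`, the iteration count, and the test signals are all hypothetical choices made for this example.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.5, n_iter=200):
    """Dense Sinkhorn scaling for entropically regularized optimal transport.

    a, b : probability vectors (nonnegative, summing to 1)
    cost : cost matrix of shape (len(a), len(b))
    eps  : entropic regularization strength (assumed value, not from the paper)
    """
    K = np.exp(-cost / eps)            # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                # rescale rows to match marginal a
        v = b / (K.T @ u)              # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]  # transport plan diag(u) K diag(v)

# Hypothetical example: two discrete densities on a 1-D grid, quadratic cost.
x = np.linspace(0.0, 1.0, 50)
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(50, 1.0 / 50)
b = np.exp(-((x - 0.7) ** 2) / 0.02)
b /= b.sum()

plan = sinkhorn_plan(a, b, cost)
transport_cost = float(np.sum(plan * cost))  # cost of the regularized plan
```

Each iteration here needs two dense matrix-vector products with the kernel; the abstract's near-linear complexity comes from replacing those dense products with a hierarchical low-rank representation, which this sketch does not attempt.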
Submitted 30 April, 2020; v1 submitted 26 April, 2020;
originally announced April 2020.
-
A multi-fidelity neural network surrogate sampling method for uncertainty quantification
Authors:
Mohammad Motamed
Abstract:
We propose a multi-fidelity neural network surrogate sampling method for the uncertainty quantification of physical/biological systems described by ordinary or partial differential equations. We first generate a set of low/high-fidelity data by low/high-fidelity computational models, e.g. using coarser/finer discretizations of the governing differential equations. We then construct a two-level neural network, where a large set of low-fidelity data are utilized in order to accelerate the construction of a high-fidelity surrogate model with a small set of high-fidelity data. We then embed the constructed high-fidelity surrogate model in the framework of Monte Carlo sampling. The proposed algorithm combines the approximation power of neural networks with the advantages of Monte Carlo sampling within a multi-fidelity framework. We present two numerical examples to demonstrate the accuracy and efficiency of the proposed method. We show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
Submitted 5 May, 2020; v1 submitted 28 August, 2019;
originally announced September 2019.
-
Wasserstein metric-driven Bayesian inversion with applications to signal processing
Authors:
Mohammad Motamed,
Daniel Appelo
Abstract:
We present a Bayesian framework based on a new exponential likelihood function driven by the quadratic Wasserstein metric. Compared to conventional Bayesian models based on Gaussian likelihood functions driven by the least-squares norm ($L_2$ norm), the new framework features several advantages. First, the new framework does not rely on the likelihood of the measurement noise and hence can treat complicated noise structures such as combined additive and multiplicative noise. Secondly, unlike the normal likelihood function, the Wasserstein-based exponential likelihood function does not usually generate multiple local extrema. As a result, the new framework features better convergence to correct posteriors when a Markov Chain Monte Carlo sampling algorithm is employed. Thirdly, in the particular case of signal processing problems, while a normal likelihood function measures only the amplitude differences between the observed and simulated signals, the new likelihood function can capture both the amplitude and the phase differences. We apply the new framework to a class of signal processing problems, that is, the inverse uncertainty quantification of waveforms, and demonstrate its advantages compared to Bayesian models with normal likelihood functions.
Submitted 26 December, 2018; v1 submitted 21 July, 2018;
originally announced July 2018.
-
Fuzzy-Stochastic Partial Differential Equations
Authors:
Mohammad Motamed
Abstract:
We introduce and study a new class of partial differential equations (PDEs) with hybrid fuzzy-stochastic parameters, coined fuzzy-stochastic PDEs. Compared to purely stochastic PDEs or purely fuzzy PDEs, fuzzy-stochastic PDEs offer powerful models for accurate representation and propagation of hybrid aleatoric-epistemic uncertainties inevitable in many real-world problems. We will use the level-set representation of fuzzy functions and define the solution to fuzzy-stochastic PDE problems through a corresponding parametric problem, and further present theoretical results on the well-posedness and regularity of such problems. We also propose a numerical strategy for computing output fuzzy-stochastic quantities, such as fuzzy failure probabilities and fuzzy probability distributions. We present two numerical examples to compute various fuzzy-stochastic quantities and to demonstrate the applicability of fuzzy-stochastic PDEs to complex engineering problems.
Submitted 7 June, 2019; v1 submitted 1 June, 2017;
originally announced June 2017.
-
A Fuzzy-Stochastic Multiscale Model for Fiber Composites: A one-dimensional study
Authors:
Ivo Babuska,
Mohammad Motamed
Abstract:
We study mathematical and computational models for computing the deformation of fiber-reinforced cross-plied laminates due to external forces. This requires an understanding of both micro-structural effects and different sources of uncertainty in the problem. We first show that the uncertainties in the problem are of both statistical (aleatoric) and systematic (epistemic) types and that current multiscale stochastic models, such as stationary random fields, which are based on precise probability theory, are not capable of correctly characterizing uncertainty in fiber composites. Next, we motivate the applicability of models based on imprecise uncertainty theory and present a novel fuzzy-stochastic model, which can more accurately describe uncertainties in fiber composites. The new model is constructed by combining stochastic fields and fuzzy variables through a simple calibration-validation approach. Finally, we construct a global-local multiscale algorithm for efficiently computing output quantities of interest. The method aims at approximating required quantities, such as displacements and stresses, in regions of relatively small size, e.g. hot spots or zones. The algorithm uses the concept of representative volume elements and computes a global solution to construct a local approximation that captures the microscale features of the solution. The results are based on and backed by real experimental data.
Submitted 13 October, 2015;
originally announced October 2015.
-
A Sparse Stochastic Collocation Technique for High-Frequency Wave Propagation with Uncertainty
Authors:
Gabriela Malenova,
Mohammad Motamed,
Olof Runborg,
Raul Tempone
Abstract:
We consider the wave equation with highly oscillatory initial data, where there is uncertainty in the wave speed, initial phase and/or initial amplitude. To estimate quantities of interest related to the solution and their statistics, we combine a high-frequency method based on Gaussian beams with sparse stochastic collocation. Although the wave solution, $u^\varepsilon$, is highly oscillatory in both physical and stochastic spaces, we provide theoretical arguments and numerical evidence that quantities of interest based on local averages of $|u^\varepsilon|^2$ are smooth, with derivatives in the stochastic space uniformly bounded in $\varepsilon$, where $\varepsilon$ denotes the short wavelength. This observable-related regularity makes the sparse stochastic collocation approach more efficient than Monte Carlo methods. We present numerical tests that demonstrate this advantage.
Submitted 10 September, 2015; v1 submitted 20 July, 2015;
originally announced July 2015.
-
Fast Bayesian Optimal Experimental Design for Seismic Source Inversion
Authors:
Quan Long,
Mohammad Motamed,
Raul Tempone
Abstract:
We develop a fast method for optimally designing experiments in the context of statistical seismic source inversion. In particular, we efficiently compute the optimal number and locations of the receivers or seismographs. The seismic source is modeled by a point moment tensor multiplied by a time-dependent function. The parameters include the source location, moment tensor components, and start time and frequency in the time function. The forward problem is modeled by elastodynamic wave equations. We show that the Hessian of the cost functional, which is usually defined as the square of the weighted $L_2$ norm of the difference between the experimental data and the simulated data, is proportional to the measurement time and the number of receivers. Consequently, the posterior distribution of the parameters, in a Bayesian setting, concentrates around the "true" parameters, and we can employ Laplace approximation and speed up the estimation of the expected Kullback-Leibler divergence (expected information gain), the optimality criterion in the experimental design procedure. Since the source parameters span several orders of magnitude, we use a scaling matrix for efficient control of the condition number of the original Hessian matrix. We use a second-order accurate finite difference method to compute the Hessian matrix and either sparse quadrature or Monte Carlo sampling to carry out numerical integration. We demonstrate the efficiency, accuracy, and applicability of our method on a two-dimensional seismic source inversion problem.
Submitted 27 February, 2015;
originally announced February 2015.
-
Taylor Expansion and Discretization Errors in Gaussian Beam Superposition
Authors:
Mohammad Motamed,
Olof Runborg
Abstract:
The Gaussian beam superposition method is an asymptotic method for computing high frequency wave fields in smoothly varying inhomogeneous media. In this paper we study the accuracy of the Gaussian beam superposition method and derive error estimates related to the discretization of the superposition integral and the Taylor expansion of the phase and amplitude off the center of the beam. We show that in the case of odd order beams, the error is smaller than a simple analysis would indicate because of error cancellation effects between the beams. Since the cancellation happens only when odd order beams are used, there is no remarkable gain in using even order beams. Moreover, applying the error estimate to the problem with constant speed of propagation, we show that in this case the local beam width is not a good indicator of accuracy, and there is no direct relation between the error and the beam width. We present numerical examples to verify the error estimates.
Submitted 4 February, 2010; v1 submitted 24 August, 2009;
originally announced August 2009.