Search | arXiv e-print repository

An overview of diffusion models for generative artificial intelligence

Authors: Davide Gallon, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind training and generation procedures. In this overview article we also review selected extensions and improvements of the basic framework from the literature such as improved DDPMs, denoising diffusion implicit models, classifier-free diffusion guidance models, and latent diffusion models.

Submitted 2 December, 2024; originally announced December 2024.

Comments: 56 pages, 5 figures

arXiv:2408.13222 [pdf, other]

An Overview on Machine Learning Methods for Partial Differential Equations: from Physics Informed Neural Networks to Deep Operator Learning

Authors: Lukas Gonon, Arnulf Jentzen, Benno Kuckuck, Siyu Liang, Adrian Riekert, Philippe von Wurstemberger

Abstract: The approximation of solutions of partial differential equations (PDEs) with numerical algorithms is a central topic in applied mathematics. For many decades, various types of methods for this purpose have been developed and extensively studied. One class of methods which has received a lot of attention in recent years are machine learning-based methods, which typically involve the training of artificial neural networks (ANNs) by means of stochastic gradient descent type optimization methods. While approximation methods for PDEs using ANNs have first been proposed in the 1990s they have only gained wide popularity in the last decade with the rise of deep learning. This article aims to provide an introduction to some of these methods and the mathematical theory on which they are based. We discuss methods such as physics-informed neural networks (PINNs) and deep BSDE methods and consider several operator learning approaches.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2310.20360 [pdf, other]

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

Authors: Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger

Abstract: This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-Łojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.

Submitted 25 February, 2025; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 712 pages, 36 figures, 45 source codes, 87 exercises. In v2, the material on optimization algorithms/methods has been significantly expanded

MSC Class: 68T07

arXiv:2302.03286 [pdf, other]

Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

Authors: Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

Abstract: In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptions of existing ANN architectures together with specialized initializations for these ANN architectures so that at initialization we have that the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing traditional approximation algorithms as well as existing deep operator learning methodologies from the literature.

Submitted 29 May, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 39 pages, 16 Figures

arXiv:2202.02717 [pdf, other]

doi 10.1111/mafi.12405

Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing

Authors: Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, Philippe von Wurstemberger

Abstract: In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation the MC sample size usually needs to be considerably large resulting in a long computing time to obtain a single approximation. In this paper we introduce a new approximation strategy for parametric approximation problems including the parametric financial pricing problems described above. A central aspect of the approximation strategy proposed in this article is to combine MC algorithms with machine learning techniques to, roughly speaking, learn the random variables (LRV) in MC simulations. In other words, we employ stochastic gradient descent (SGD) optimization methods not to train parameters of standard artificial neural networks (ANNs) but to learn random variables appearing in MC approximations. We numerically test the LRV strategy on various parametric problems with convincing results when compared with standard MC simulations, Quasi-Monte Carlo simulations, SGD-trained shallow ANNs, and SGD-trained deep ANNs. Our numerical simulations strongly indicate that the LRV strategy might be capable to overcome the curse of dimensionality in the $L^\infty$-norm in several cases where the standard deep learning approach has been proven not to be able to do so. This is not a contradiction to lower bounds established in the scientific literature because this new LRV strategy is outside of the class of algorithms for which lower bounds have been established in the scientific literature. The proposed LRV strategy is of general nature and not only restricted to the parametric financial pricing problems described above, but applicable to a large class of approximation problems.

Submitted 8 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

Comments: 71 pages, 4 Figures, 14 Tables; to appear in Math. Finance

MSC Class: 35K15; 65C05; 65M75; 68T99; 91G20

arXiv:2012.04326 [pdf, other]

High-dimensional approximation spaces of artificial neural networks and applications to partial differential equations

Authors: Pierfrancesco Beneventano, Patrick Cheridito, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: In this paper we develop a new machinery to study the capacity of artificial neural networks (ANNs) to approximate high-dimensional functions without suffering from the curse of dimensionality. Specifically, we introduce a concept which we refer to as approximation spaces of artificial neural networks and we present several tools to handle those spaces. Roughly speaking, approximation spaces consist of sequences of functions which can, in a suitable way, be approximated by ANNs without curse of dimensionality in the sense that the number of required ANN parameters to approximate a function of the sequence with an accuracy $\varepsilon > 0$ grows at most polynomially both in the reciprocal $1/\varepsilon$ of the required accuracy and in the dimension $d \in \mathbb{N} = \{1, 2, 3, \ldots \}$ of the function. We show that these approximation spaces are closed under various operations including linear combinations, formations of limits, and infinite compositions. To illustrate the utility of the machinery proposed in this paper, we employ the developed theory to prove that ANNs have the capacity to overcome the curse of dimensionality in the numerical approximation of certain first order transport partial differential equations (PDEs). We even prove that approximation spaces are closed under flows of first order transport PDEs.

Submitted 28 January, 2025; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: 31 pages

arXiv:2005.10206 [pdf, other]

doi 10.4208/cicp.OA-2020-0130

Numerical simulations for full history recursive multilevel Picard approximations for systems of high-dimensional partial differential equations

Authors: Sebastian Becker, Ramon Braunwarth, Martin Hutzenthaler, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: One of the most challenging issues in applied mathematics is to develop and analyze algorithms which are able to approximately compute solutions of high-dimensional nonlinear partial differential equations (PDEs). In particular, it is very hard to develop approximation algorithms which do not suffer under the curse of dimensionality in the sense that the number of computational operations needed by the algorithm to compute an approximation of accuracy $\varepsilon > 0$ grows at most polynomially in both the reciprocal $1/\varepsilon$ of the required accuracy and the dimension $d \in \mathbb{N}$ of the PDE. Recently, a new approximation method, the so-called full history recursive multilevel Picard (MLP) approximation method, has been introduced and, until today, this approximation scheme is the only approximation method in the scientific literature which has been proven to overcome the curse of dimensionality in the numerical approximation of semilinear PDEs with general time horizons. It is a key contribution of this article to extend the MLP approximation method to systems of semilinear PDEs and to numerically test it on several example PDEs. More specifically, we apply the proposed MLP approximation method in the case of Allen-Cahn PDEs, Sine-Gordon-type PDEs, systems of coupled semilinear heat PDEs, and semilinear Black-Scholes PDEs in up to 1000 dimensions. The presented numerical simulation results suggest in the case of each of these example PDEs that the proposed MLP approximation method produces very accurate results in short runtimes and, in particular, the presented numerical simulation results indicate that the proposed MLP approximation scheme significantly outperforms certain deep learning based approximation methods for high-dimensional semilinear PDEs.

Submitted 25 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: 21 pages

MSC Class: 65M75; ACM Class: G.1.8

Journal ref: Commun. Comput. Phys. 28 (2020), no. 5, 2109-2138

arXiv:1903.05985 [pdf, ps, other]

doi 10.1214/20-EJP423

Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risks

Authors: Martin Hutzenthaler, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: Parabolic partial differential equations (PDEs) are widely used in the mathematical modeling of natural phenomena and man made complex systems. In particular, parabolic PDEs are a fundamental tool to determine fair prices of financial derivatives in the financial industry. The PDEs appearing in financial engineering applications are often nonlinear and high dimensional since the dimension typically corresponds to the number of considered financial assets. A major issue is that most approximation methods for nonlinear PDEs in the literature suffer under the so-called curse of dimensionality in the sense that the computational effort to compute an approximation with a prescribed accuracy grows exponentially in the dimension of the PDE or in the reciprocal of the prescribed approximation accuracy and nearly all approximation methods have not been shown not to suffer under the curse of dimensionality. Recently, a new class of approximation schemes for semilinear parabolic PDEs, termed full history recursive multilevel Picard (MLP) algorithms, were introduced and it was proven that MLP algorithms do overcome the curse of dimensionality for semilinear heat equations. In this paper we extend those findings to a more general class of semilinear PDEs including as special cases semilinear Black-Scholes equations used for the pricing of financial derivatives with default risks. More specifically, we introduce an MLP algorithm for the approximation of solutions of semilinear Black-Scholes equations and prove that the computational effort of our method grows at most polynomially both in the dimension and the reciprocal of the prescribed approximation accuracy. This is, to the best of our knowledge, the first result showing that the approximation of solutions of semilinear Black-Scholes equations is a polynomially tractable approximation problem.

Submitted 14 March, 2019; originally announced March 2019.

Comments: 71 pages. arXiv admin note: text overlap with arXiv:1807.01212

Journal ref: Electron. J. Probab. 25 (2020), 101

arXiv:1809.02362 [pdf, ps, other]

doi 10.1090/memo/1410

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations

Authors: Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philippe von Wurstemberger

Abstract: Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of them prove convergence without convergence rates and some even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $\varepsilon > 0$ and the PDE dimension $d \in \mathbb{N}$. We thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.

Submitted 25 January, 2023; v1 submitted 7 September, 2018; originally announced September 2018.

Comments: To appear in Mem. Amer. Math. Soc.; 126 pages

Journal ref: Mem. Amer. Math. Soc. 284 (2023), no. 1410, v+93 pp

arXiv:1807.01212 [pdf, ps, other]

doi 10.1098/rspa.2019.0630

Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations

Authors: Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen, Philippe von Wurstemberger

Abstract: For a long time it is well-known that high-dimensional linear parabolic partial differential equations (PDEs) can be approximated by Monte Carlo methods with a computational effort which grows polynomially both in the dimension and in the reciprocal of the prescribed accuracy. In other words, linear PDEs do not suffer from the curse of dimensionality. For general semilinear PDEs with Lipschitz coefficients, however, it remained an open question whether these suffer from the curse of dimensionality. In this paper we partially solve this open problem. More precisely, we prove in the case of semilinear heat equations with gradient-independent and globally Lipschitz continuous nonlinearities that the computational effort of a variant of the recently introduced multilevel Picard approximations grows polynomially both in the dimension and in the reciprocal of the required accuracy.

Submitted 24 June, 2020; v1 submitted 3 July, 2018; originally announced July 2018.

MSC Class: 65M75

Journal ref: Proceedings of the Royal Society A 476, no. 2244 (2020): 20190630

arXiv:1803.08600 [pdf, ps, other]

doi 10.1016/j.jco.2019.101438

Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

Authors: Arnulf Jentzen, Philippe von Wurstemberger

Abstract: The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention has been paid to proving lower error bounds for the SGD method. It is the key contribution of this paper to make a step in this direction. More precisely, in this article we establish for every $\gamma, \nu \in (0,\infty)$ essentially matching lower and upper bounds for the mean square error of the SGD process with learning rates $(\frac{\gamma}{n^{\nu}})_{n \in \mathbb{N}}$ associated to a simple quadratic stochastic optimization problem. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the asymptotic behavior of the learning rates.

Submitted 22 March, 2018; originally announced March 2018.

Comments: 42 pages

Journal ref: J. Complexity 57 (2020), 101438

arXiv:1801.09324 [pdf, ps, other]

doi 10.1093/imanum/drz055

Strong error analysis for stochastic gradient descent optimization algorithms

Authors: Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger

Abstract: Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p\in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $\frac{1}{2}-\varepsilon$ to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures, and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large $ p \in (0,\infty) $ strong $ L^p $-convergence rates. This article also contains an extensive review of results on SGD optimization algorithms in the scientific literature.

Submitted 28 January, 2018; originally announced January 2018.

Journal ref: IMA J. Numer. Anal. (2020), drz055

Showing 1–12 of 12 results for author: von Wurstemberger, P