Showing 1–50 of 139 results for author: Jentzen, A

  1. arXiv:2505.22085  [pdf, other]

    math.OC cs.LG math.NA

    PADAM: Parallel averaged Adam reduces the error for stochastic optimization in scientific machine learning

    Authors: Arnulf Jentzen, Julian Kranz, Adrian Riekert

    Abstract: Averaging techniques such as Ruppert--Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate optimization procedures of stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and the parameters for the averaging need to be adjusted to ac…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: 38 pages, 13 figures

  2. arXiv:2505.17032  [pdf, ps, other]

    math.NA cs.CE cs.LG

    A brief review of the Deep BSDE method for solving high-dimensional partial differential equations

    Authors: Jiequn Han, Arnulf Jentzen, Weinan E

    Abstract: High-dimensional partial differential equations (PDEs) pose significant challenges for numerical computation due to the curse of dimensionality, which limits the applicability of traditional mesh-based methods. Since 2017, the Deep BSDE method has introduced deep learning techniques that enable the effective solution of nonlinear PDEs in very high dimensions. This innovation has sparked considerab…

    Submitted 7 May, 2025; originally announced May 2025.

    Journal ref: ICBS proceedings of Frontiers of Science Awards (2024)

  3. arXiv:2505.09572  [pdf, other]

    cs.LG math.LO math.OC stat.ML

    SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

    Authors: Julian Kranz, Davide Gallon, Steffen Dereich, Arnulf Jentzen

    Abstract: We study gradient flows for loss landscapes of fully connected feed forward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove t…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 27 pages, 4 figures

    MSC Class: Primary 68T05; Secondary 68T07; 26B40; 03C64; 03C98

  4. arXiv:2504.19426  [pdf, ps, other]

    math.OC cs.AI

    Sharp higher order convergence rates for the Adam optimizer

    Authors: Steffen Dereich, Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent based optimization methods are the methods of choice to train deep neural networks in machine learning. Beyond the standard gradient descent method, also suitable modified variants of standard gradient descent involving acceleration techniques such as the momentum method and/or adaptivity techniques such as the RMSprop method are frequently considered optimization methods. These d…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 27 pages

    MSC Class: 68T05; 65K05; 90C25 ACM Class: I.2.0

  5. arXiv:2503.01660  [pdf, ps, other]

    cs.LG math.NA

    Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks

    Authors: Thang Do, Arnulf Jentzen, Adrian Riekert

    Abstract: Despite the omnipresent use of stochastic gradient descent (SGD) optimization methods in the training of deep neural networks (DNNs), it remains, in basically all practically relevant scenarios, a fundamental open problem to provide a rigorous theoretical explanation for the success (and the limitations) of SGD optimization methods in deep learning. In particular, it remains an open question to pr…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 42 pages

    MSC Class: 68T07; 65K10; 60G60; 65D15 ACM Class: G.1.6; F.2.0; G.3

  6. arXiv:2502.14180  [pdf]

    cs.LG cs.CL

    On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Benno Kuckuck

    Abstract: We present a method of generating first-order logic statements whose complexity can be controlled along multiple dimensions. We use this method to automatically create several datasets consisting of questions asking for the truth or falsity of first-order logic statements in Zermelo-Fraenkel set theory. While the resolution of these questions does not require any knowledge beyond basic notation of…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 67 pages, 24 figures

    ACM Class: I.2.6

  7. arXiv:2501.15646  [pdf, ps, other]

    cs.LG math.NA

    Mathematical analysis of the gradients in deep learning

    Authors: Steffen Dereich, Thang Do, Arnulf Jentzen, Frederic Weber

    Abstract: Deep learning algorithms -- typically consisting of a class of deep artificial neural networks (ANNs) trained by a stochastic gradient descent (SGD) optimization method -- are nowadays an integral part in many areas of science, industry, and also our day to day life. Roughly speaking, in their most basic form, ANNs can be regarded as functions that consist of a series of compositions of affine-lin…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: 38 pages

    MSC Class: 68T07 ACM Class: I.2.6

  8. arXiv:2501.06081  [pdf, other]

    math.OC cs.LG math.NA

    Averaged Adam accelerates stochastic optimization in the training of deep neural network approximations for partial differential equation and optimal control problems

    Authors: Steffen Dereich, Arnulf Jentzen, Adrian Riekert

    Abstract: Deep learning methods - usually consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays omnipresent in data-driven learning problems as well as in scientific computing tasks such as optimal control (OC) and partial differential equation (PDE) problems. In practically relevant learning tasks, often not the plain-vanilla…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 25 pages, 10 figures

  9. arXiv:2412.01371  [pdf, other]

    cs.LG cs.AI

    An overview of diffusion models for generative artificial intelligence

    Authors: Davide Gallon, Arnulf Jentzen, Philippe von Wurstemberger

    Abstract: This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind training and generation procedures. In this overview article we also…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 56 pages, 5 figures

  10. arXiv:2410.10533  [pdf, ps, other]

    cs.LG math.NA math.OC math.PR stat.ML

    Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation

    Authors: Thang Do, Sonja Hannibal, Arnulf Jentzen

    Abstract: Deep learning methods - consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays key tools to solve data driven supervised learning problems. Despite the great success of SGD methods in the training of DNNs, it remains a fundamental open problem of research to explain the success and the limitations of such methods in ri…

    Submitted 14 February, 2025; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: 91 pages. arXiv admin note: text overlap with arXiv:2310.20360

    MSC Class: 68T07; 65K10; 60G60; 65D15 ACM Class: G.1.6; F.2.0; G.3

  11. arXiv:2408.13222  [pdf, other]

    math.NA stat.ML

    An Overview on Machine Learning Methods for Partial Differential Equations: from Physics Informed Neural Networks to Deep Operator Learning

    Authors: Lukas Gonon, Arnulf Jentzen, Benno Kuckuck, Siyu Liang, Adrian Riekert, Philippe von Wurstemberger

    Abstract: The approximation of solutions of partial differential equations (PDEs) with numerical algorithms is a central topic in applied mathematics. For many decades, various types of methods for this purpose have been developed and extensively studied. One class of methods which has received a lot of attention in recent years are machine learning-based methods, which typically involve the training of art…

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2407.21078  [pdf, ps, other]

    math.OC cs.LG math.PR stat.ML

    Convergence rates for the Adam optimizer

    Authors: Steffen Dereich, Arnulf Jentzen

    Abstract: Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, usually not the plain vanilla standard SGD method is the employed optimization scheme but instead suitably accelerated and adaptive SGD optimization methods are applied. As of today, m…

    Submitted 29 July, 2024; originally announced July 2024.

  13. arXiv:2407.08100  [pdf, ps, other]

    cs.LG math.OC math.PR

    Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates

    Authors: Steffen Dereich, Robin Graeber, Arnulf Jentzen

    Abstract: Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies. For example, SGD methods are used to train powerful large language models (LLMs) such as versi…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 54 pages

    MSC Class: 60J22 (Primary); 65K10; 60J20; 65C40 (Secondary) ACM Class: G.1.6; F.2.0; G.3

  14. arXiv:2406.14340  [pdf, other]

    math.OC cs.LG math.NA

    Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

    Authors: Steffen Dereich, Arnulf Jentzen, Adrian Riekert

    Abstract: It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 68 pages, 8 figures

  15. arXiv:2406.10876  [pdf, ps, other]

    cs.LG math.NA math.PR

    Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for space-time solutions of semilinear partial differential equations

    Authors: Julia Ackermann, Arnulf Jentzen, Benno Kuckuck, Joshua Lee Padgett

    Abstract: It is a challenging topic in applied mathematics to solve high-dimensional nonlinear partial differential equations (PDEs). Standard approximation methods for nonlinear PDEs suffer under the curse of dimensionality (COD) in the sense that the number of computational operations of the approximation method grows at least exponentially in the PDE dimension and with such methods it is essentially impo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 64 pages. arXiv admin note: text overlap with arXiv:2309.13722, arXiv:2310.20360

    MSC Class: 65M15; 65C05; 68T07 (Primary) 60H35 (Secondary)

  16. arXiv:2402.05155  [pdf, other]

    math.OC cs.LG

    Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Stochastic gradient descent (SGD) optimization methods such as the plain vanilla SGD method and the popular Adam optimizer are nowadays the method of choice in the training of artificial neural networks (ANNs). Despite the remarkable success of SGD methods in the ANN training in numerical simulations, it remains in essentially all practically relevant scenarios an open problem to rigorously explain…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 36 pages

  17. arXiv:2310.20360  [pdf, other]

    cs.LG cs.AI math.NA math.PR stat.ML

    Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

    Authors: Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger

    Abstract: This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorit…

    Submitted 25 February, 2025; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 712 pages, 36 figures, 45 source codes, 87 exercises. In v2, the material on optimization algorithms/methods has been significantly expanded

    MSC Class: 68T07

  18. arXiv:2309.13722  [pdf, ps, other]

    math.NA cs.LG math.PR

    Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the $L^p$-sense

    Authors: Julia Ackermann, Arnulf Jentzen, Thomas Kruse, Benno Kuckuck, Joshua Lee Padgett

    Abstract: Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computa…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 52 pages

    MSC Class: 65M15; 65C05; 68T07 (Primary) 60H35 (Secondary)

  19. arXiv:2303.03390  [pdf, ps, other]

    math.OC math.PR

    Nonlinear Monte Carlo methods with polynomial runtime for Bellman equations of discrete time high-dimensional stochastic optimal control problems

    Authors: Christian Beck, Arnulf Jentzen, Konrad Kleinberg, Thomas Kruse

    Abstract: Discrete time stochastic optimal control problems and Markov decision processes (MDPs), respectively, serve as fundamental models for problems that involve sequential decision making under uncertainty and as such constitute the theoretical foundation of reinforcement learning. In this article we study the numerical approximation of MDPs with infinite time horizon, finite control set, and general s…

    Submitted 3 March, 2023; originally announced March 2023.

    MSC Class: 90C40; 90C39; 60J05; 93E20; 65C05

  20. arXiv:2302.14690  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

    Authors: Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

    Abstract: In this article, we show existence of minimizers in the loss landscape for residual artificial neural networks (ANNs) with multi-dimensional input layer and one hidden layer with ReLU activation. Our work contrasts earlier results in [D. Gallon, A. Jentzen, and F. Lindner, preprint, arXiv:2211.15641, 2022] and [P. Petersen, M. Raslan, and F. Voigtlaender, Found. Comput. Math., 21 (2021), pp. 375-4…

    Submitted 19 November, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: Author's Accepted Manuscript version. To appear in SINUM

    MSC Class: Primary 68T07; Secondary 68T05; 41A50

  21. arXiv:2302.03286  [pdf, other]

    math.NA stat.ML

    Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

    Authors: Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

    Abstract: In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the pro…

    Submitted 29 May, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 39 pages, 16 Figures

  22. arXiv:2301.08284  [pdf, ps, other]

    math.NA cs.AI

    The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

    Authors: Lukas Gonon, Robin Graeber, Arnulf Jentzen

    Abstract: In this article we study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$ a…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: 101 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2112.14523

    MSC Class: 65D40; 68T07

  23. arXiv:2212.13111  [pdf, other]

    math.OC

    Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non-global local minima with high probability

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong nowadays to the most heavily employed computational schemes in the digital world. Despite the compelling success of such methods, it remains an open problem to provide a rigorous theoretical justification for the success of GD methods in the training of ANNs. The main difficulty is that the optimization risk…

    Submitted 26 December, 2022; originally announced December 2022.

    Comments: 98 pages, 15 figures, 10 Python codes

    MSC Class: 65K10; 65C50; 68T05; 60H35

  24. arXiv:2211.15641  [pdf, ps, other]

    math.OC

    Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

    Authors: Davide Gallon, Arnulf Jentzen, Felix Lindner

    Abstract: In this article we investigate blow up phenomena for gradient descent optimization methods in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on shallow ANNs with one neuron on the input layer, one neuron on the output layer, and one hidden layer. For ANNs with ReLU activation and at least two neurons on the hidden layer we establish the existence of a target…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: 84 pages, one figure

  25. arXiv:2210.13530  [pdf, other]

    math.NA math.PR stat.CO

    An efficient Monte Carlo scheme for Zakai equations

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this paper we develop a numerical method for efficiently approximating solutions of certain Zakai equations in high dimensions. The key idea is to transform a given Zakai SPDE into a PDE with random coefficients. We show that under suitable regularity assumptions on the coefficients of the Zakai equation, the corresponding random PDE admits a solution random field which, for almost all realizat…

    Submitted 20 August, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    MSC Class: 65C05; 65M75; 60H15; 62M20

  26. arXiv:2208.02083  [pdf, ps, other]

    cs.LG math.DS math.OC

    Gradient descent provably escapes saddle points in the training of shallow ReLU networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we…

    Submitted 11 September, 2024; v1 submitted 3 August, 2022; originally announced August 2022.

    MSC Class: 68T07; 37D10 ACM Class: I.2.6; G.1.6

    Journal ref: J Optim Theory Appl (2024)

  27. arXiv:2207.06246  [pdf, ps, other]

    math.OC cs.LG

    Normalized gradient flow optimization in the training of ReLU artificial neural networks

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

    Abstract: The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular ch…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 26 pages, 1 figure

  28. arXiv:2206.13646  [pdf, ps, other]

    cs.LG math.OC

    On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

    Authors: Arnulf Jentzen, Timo Kröger

    Abstract: It is an elementary fact in the scientific literature that the Lipschitz norm of the realization function of a feedforward fully-connected rectified linear unit (ReLU) artificial neural network (ANN) can, up to a multiplicative constant, be bounded from above by sums of powers of the norm of the ANN parameter vector. Roughly speaking, in this work we reveal in the case of shallow ANNs that the con…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 39 pages, 1 figure

  29. arXiv:2205.03672  [pdf, other]

    math.NA cs.LG math.PR

    Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions

    Authors: Victor Boussange, Sebastian Becker, Arnulf Jentzen, Benno Kuckuck, Loïc Pellissier

    Abstract: Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications standard local models are not sufficient to accurately account for certain non-local phenomena such as, e.g., interactions at a distance. In order to properly capture these phenomena non-local nonlinear PDE models are fr…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: 59 pages

    MSC Class: 35R09 (Primary) 65M75; 45K05; 35K20; 65C05; 65M22; 68T07 (Secondary)

  30. arXiv:2202.11481  [pdf, other]

    math.OC

    On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation

    Authors: Shokhrukh Ibragimov, Arnulf Jentzen, Timo Kröger, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of every bounded GF trajectory converges in the training…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 49 pages, 1 figure

    MSC Class: 68T07

  31. arXiv:2202.02717  [pdf, other]

    math.NA math.AP math.PR

    Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing

    Authors: Sebastian Becker, Arnulf Jentzen, Marvin S. Müller, Philippe von Wurstemberger

    Abstract: In financial engineering, prices of financial products are computed approximately many times each trading day with (slightly) different parameters in each calculation. In many financial models such prices can be approximated by means of Monte Carlo (MC) simulations. To obtain a good approximation the MC sample size usually needs to be considerably large resulting in a long computing time to obtain…

    Submitted 8 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 71 pages, 4 Figures, 14 Tables; to appear in Math. Finance

    MSC Class: 35K15; 65C05; 65M75; 68T99; 91G20

  32. arXiv:2112.14523  [pdf, ps, other]

    math.NA

    Deep neural network approximation theory for high-dimensional functions

    Authors: Pierfrancesco Beneventano, Patrick Cheridito, Robin Graeber, Arnulf Jentzen, Benno Kuckuck

    Abstract: The purpose of this article is to develop machinery to study the capacity of deep neural networks (DNNs) to approximate high-dimensional functions. In particular, we show that DNNs have the expressive power to overcome the curse of dimensionality in the approximation of a large class of functions. More precisely, we prove that these functions can be approximated by DNNs on compact sets such that t…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 82 pages, 1 figure

  33. arXiv:2112.09684  [pdf, other]

    math.OC cs.LG math.NA math.ST

    On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learnin…

    Submitted 13 July, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 89 pages, 15 figures

    Journal ref: Journal of Machine Learning, 1 (2022), pp. 141-246

  34. arXiv:2112.07369  [pdf, other]

    cs.LG math.NA math.PR

    Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

    Abstract: In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimiz…

    Submitted 22 June, 2023; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: 71 pages, 5 figures, 2 tables, 4 Python source codes. To appear in Electronic Research Archive

  35. arXiv:2110.08297  [pdf, ps, other]

    math.NA math.PR

    Strong $L^p$-error analysis of nonlinear Monte Carlo approximations for high-dimensional semilinear partial differential equations

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck, Joshua Lee Padgett

    Abstract: Full-history recursive multilevel Picard (MLP) approximation schemes have been shown to overcome the curse of dimensionality in the numerical approximation of high-dimensional semilinear partial differential equations (PDEs) with general time horizons and Lipschitz continuous nonlinearities. However, each of the error analyses for MLP approximation schemes in the existing literature studies the…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 42 pages.

  36. arXiv:2108.10602  [pdf, ps, other]

    math.NA math.PR

    Overcoming the curse of dimensionality in the numerical approximation of backward stochastic differential equations

    Authors: Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen

    Abstract: Backward stochastic differential equations (BSDEs) belong nowadays to the most frequently studied equations in stochastic analysis and computational stochastics. BSDEs in applications are often nonlinear and high-dimensional. In nearly all cases such nonlinear high-dimensional BSDEs cannot be solved explicitly and it has been and still is a very active topic of research to design and analyze numer…

    Submitted 24 August, 2021; originally announced August 2021.

  37. arXiv:2108.08106  [pdf, other]

    cs.LG math.DS math.NA

    Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

    Abstract: The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. Till this day in the scientific literature there is in general no mathematical convergence analysis which explains the numerical success of GD type optimization schemes in the training of ANNs with R…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: 30 pages. arXiv admin note: text overlap with arXiv:2107.04479, arXiv:2108.04620

    Journal ref: Electronic Research Archive 2023, Volume 31, Issue 5: 2519-2554

  38. arXiv:2108.04620  [pdf, other]

    math.OC cs.LG math.NA

    A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization methods are the standard instrument to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initi…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2107.04479

    Journal ref: Journal of Machine Learning Research 23, 260 (2022), pp. 1-50

  39. arXiv:2107.04479  [pdf, ps, other]

    cs.LG math.DS math.NA

    Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: 37 pages

    Journal ref: Journal of Mathematical Analysis and Applications 517, 2 (2023)

  40. arXiv:2104.00277  [pdf, ps, other]

    math.NA cs.LG math.PR math.ST

    A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural network…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 29 pages

    Journal ref: Zeitschrift für angewandte Mathematik und Physik 73 (2022)

  41. Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreov…

    Submitted 6 July, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

    MSC Class: 68T07 ACM Class: I.2.6

    Journal ref: J Nonlinear Sci 32, 64 (2022)

  42. arXiv:2103.04488  [pdf, ps, other]

    math.NA

    Lower bounds for artificial neural network approximations: A proof that shallow neural networks fail to overcome the curse of dimensionality

    Authors: Philipp Grohs, Shokhrukh Ibragimov, Arnulf Jentzen, Sarah Koppensteiner

    Abstract: Artificial neural networks (ANNs) have become a very powerful tool in the approximation of high-dimensional functions. Especially, deep ANNs, consisting of a large number of hidden layers, have been very successfully used in a series of practically relevant computational problems involving high-dimensional input data ranging from classification tasks in supervised learning to optimal decision proble…

    Submitted 7 March, 2021; originally announced March 2021.

    Comments: 53 pages

  43. arXiv:2103.02350  [pdf, ps, other]

    math.NA math.PR

    Full history recursive multilevel Picard approximations for ordinary differential equations with expectations

    Authors: Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Emilia Magnani

    Abstract: We consider ordinary differential equations (ODEs) which involve expectations of a random variable. These ODEs are special cases of McKean-Vlasov stochastic differential equations (SDEs). A plain vanilla Monte Carlo approximation method for such ODEs requires a computational cost of order $\varepsilon^{-3}$ to achieve a root-mean-square error of size $\varepsilon$. In this work we adapt recently i…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 24 pages. arXiv admin note: substantial text overlap with arXiv:1903.05985

    MSC Class: 65Lxx; 65Mxx; 65Cxx; 65M75 ACM Class: G.1.0; G.1.7; G.1.m; G.3

  44. arXiv:2102.11840  [pdf, ps, other]

    cs.LG math.NA math.PR

    Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases

    Authors: Arnulf Jentzen, Timo Kröger

    Abstract: In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits. However, it is still unclear why randomly initialized gradient descent optimization algorithms, such as the well-known batch gradient descent, are able to achieve zero training loss in many situations even though the ob…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: 38 pages

  45. arXiv:2102.09924  [pdf, ps, other]

    math.NA cs.LG math.ST

    A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek

    Abstract: Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 23 pages

    Journal ref: Journal of Complexity (2022)

  46. An overview on deep learning-based approximation methods for partial differential equations

    Authors: Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck

    Abstract: It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs). Recently, several deep learning-based approximation algorithms for attacking this problem have been proposed and tested numerically on a number of examples of high-dimensional PDEs. This has given rise to a lively field of research in which deep learnin…

    Submitted 18 November, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: 49 pages. Compared to the first version, the manuscript has been significantly expanded. In particular, Python source code implementing several of the presented methods using PyTorch, as well as numerical simulations, have been added

    MSC Class: 65M99 (Primary); 35-02; 65-02; 68T07 (Secondary)

    Journal ref: Discrete Contin. Dyn. Syst. Ser. B 28 (2023), no. 6, 3697-3746

  47. arXiv:2012.08443  [pdf, ps, other]

    cs.LG math.NA math.ST

    Strong overall error analysis for the training of artificial neural networks via random initializations

    Authors: Arnulf Jentzen, Adrian Riekert

    Abstract: Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, but with an extremely slow rate of convergence. In…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 40 pages

    Journal ref: Communications in Mathematics and Statistics (2023)

  48. arXiv:2012.04326  [pdf, other]

    math.NA

    High-dimensional approximation spaces of artificial neural networks and applications to partial differential equations

    Authors: Pierfrancesco Beneventano, Patrick Cheridito, Arnulf Jentzen, Philippe von Wurstemberger

    Abstract: In this paper we develop a new machinery to study the capacity of artificial neural networks (ANNs) to approximate high-dimensional functions without suffering from the curse of dimensionality. Specifically, we introduce a concept which we refer to as approximation spaces of artificial neural networks and we present several tools to handle those spaces. Roughly speaking, approximation spaces consi…

    Submitted 28 January, 2025; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 31 pages

  49. arXiv:2012.01194  [pdf, ps, other]

    math.NA cs.LG math.PR stat.ML

    Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. We test the performance of the proposed app…

    Submitted 2 December, 2020; originally announced December 2020.

  50. arXiv:2009.13989  [pdf, ps, other]

    math.PR cs.CC math.NA

    Nonlinear Monte Carlo methods with polynomial runtime for high-dimensional iterated nested expectations

    Authors: Christian Beck, Arnulf Jentzen, Thomas Kruse

    Abstract: The approximative calculation of iterated nested expectations is a recurring challenging problem in applications. Nested expectations appear, for example, in the numerical approximation of solutions of backward stochastic differential equations (BSDEs), in the numerical approximation of solutions of semilinear parabolic partial differential equations (PDEs), in statistical physics, in optimal stop…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: 47 pages

    MSC Class: 65C05 (Primary) 65M75; 68Q25 (Secondary)