
Showing 1–10 of 10 results for author: Cheridito, P

Searching in archive cs.
  1. arXiv:2505.22364

    stat.ML cs.LG

    Computing Optimal Transport Maps and Wasserstein Barycenters Using Conditional Normalizing Flows

    Authors: Gabriele Visentin, Patrick Cheridito

    Abstract: We present a novel method for efficiently computing optimal transport maps and Wasserstein barycenters in high-dimensional spaces. Our approach uses conditional normalizing flows to approximate the input distributions as invertible pushforward transformations from a common latent space. This makes it possible to directly solve the primal problem using gradient-based minimization of the transport c…

    Submitted 28 May, 2025; originally announced May 2025.

    MSC Class: 65K99 (Primary) 68T07; 68T99 (Secondary)

  2. arXiv:2505.15602

    cs.LG eess.SY math.OC q-fin.PM

    Deep Learning for Continuous-time Stochastic Control with Jumps

    Authors: Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut

    Abstract: In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on…

    Submitted 21 May, 2025; originally announced May 2025.

    ACM Class: I.2.8; I.2.6

  3. arXiv:2208.02083

    cs.LG math.DS math.OC

    Gradient descent provably escapes saddle points in the training of shallow ReLU networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we…

    Submitted 11 September, 2024; v1 submitted 3 August, 2022; originally announced August 2022.

    MSC Class: 68T07; 37D10 ACM Class: I.2.6; G.1.6

    Journal ref: J Optim Theory Appl (2024)

  4. arXiv:2112.01804

    stat.CO cs.LG math.NA math.ST

    Computation of conditional expectations with guarantees

    Authors: Patrick Cheridito, Balint Gersey

    Abstract: Theoretically, the conditional expectation of a square-integrable random variable $Y$ given a $d$-dimensional random vector $X$ can be obtained by minimizing the mean squared distance between $Y$ and $f(X)$ over all Borel measurable functions $f \colon \mathbb{R}^d \to \mathbb{R}$. However, in many applications this minimization problem cannot be solved exactly, and instead, a numerical method whi…

    Submitted 21 February, 2023; v1 submitted 3 December, 2021; originally announced December 2021.

    MSC Class: 62J02; 65G99; 65C05; 65C20; 68T05

  5. Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreov…

    Submitted 6 July, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

    MSC Class: 68T07 ACM Class: I.2.6

    Journal ref: J Nonlinear Sci 32, 64 (2022)

  6. arXiv:2102.09924

    math.NA cs.LG math.ST

    A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

    Authors: Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek

    Abstract: Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, to date there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 23 pages

    Journal ref: Journal of Complexity (2022)

  7. arXiv:2012.01194

    math.NA cs.LG math.PR stat.ML

    Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs). In the proposed approximation algorithm we employ a deep neural network for every realization of the driving noise process of the SPDE to approximate the solution process of the SPDE under consideration. We test the performance of the proposed app…

    Submitted 2 December, 2020; originally announced December 2020.

  8. arXiv:2006.07075

    cs.LG math.NA stat.ML

    Non-convergence of stochastic gradient descent in the training of deep neural networks

    Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

    Abstract: Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradien…

    Submitted 29 January, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    MSC Class: 68T07 (Primary) 65D15 (Secondary) ACM Class: I.2.6

    Journal ref: J. Complexity 64 (2021)

  9. arXiv:1908.01602

    cs.CE cs.LG math.PR q-fin.CP

    Solving high-dimensional optimal stopping problems using deep learning

    Authors: Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Timo Welti

    Abstract: Nowadays many financial derivatives, such as American or Bermudan options, are of early exercise type. Often the pricing of early exercise options gives rise to high-dimensional optimal stopping problems, since the dimension corresponds to the number of underlying assets. High-dimensional optimal stopping problems are, however, notoriously difficult to solve due to the well-known curse of dimensio…

    Submitted 8 August, 2021; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: 54 pages, 1 figure

    MSC Class: 68T07; 60G40; 65C05; 91G60

    Journal ref: Eur. J. Appl. Math 32 (2021) 470-514

  10. arXiv:1907.03452

    math.NA cs.LG math.PR stat.ML

    Deep splitting method for parabolic PDEs

    Authors: Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

    Abstract: In this paper we introduce a numerical method for nonlinear parabolic PDEs that combines operator splitting with deep learning. It divides the PDE approximation problem into a sequence of separate learning problems. Since the computational graph for each of the subproblems is comparatively small, the approach can handle extremely high-dimensional PDEs. We test the method on different examples from…

    Submitted 21 June, 2021; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: 25 pages

    MSC Class: 35K15; 65C05; 65M22; 65M75; 91G20; 93E20

    Journal ref: SIAM J. Sci. Comput. 43 (2021), no. 5, A3135-A3154
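
Entry 3 builds on the observation that gradient descent tends to bypass strict saddle points. The sketch below is a toy illustration only. It uses a smooth two-variable function chosen for this example, f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a strict saddle at the origin and minima at (0, ±1). It is not a neural network loss and it does not reproduce the authors' proof technique. The step size 0.1 and the iteration count are assumptions. When started exactly on the line y = 0, gradient descent converges to the saddle. A perturbation of 1e-8 in y is enough for it to escape to the minimum at (0, 1).

```python
def grad(x: float, y: float) -> tuple[float, float]:
    """Gradient of the toy function f(x, y) = x**2/2 + y**4/4 - y**2/2 (hypothetical example)."""
    return x, y**3 - y


def gradient_descent(x: float, y: float, lr: float = 0.1, steps: int = 1000) -> tuple[float, float]:
    """Plain gradient descent with a constant (assumed) step size."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


# Initialized on the stable manifold {y = 0}: the iterates converge to the saddle (0, 0).
print(gradient_descent(1.0, 0.0))
# A tiny perturbation in y is amplified near the saddle, and the iterates reach the minimum (0, 1).
print(gradient_descent(1.0, 1e-8))
```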
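
Entry 4 recalls that E[Y | X] minimizes the mean squared distance E[(Y - f(X))^2] over measurable functions f. A common numerical surrogate, sketched below, rests on two simplifications. First, f is restricted to a finite-dimensional space, here polynomials of degree at most 2. Second, the expectation is replaced by a Monte Carlo sample average. Together these turn the problem into ordinary least squares. The data-generating model Y = X^2 + noise with standard normal X is hypothetical, chosen so that the true conditional expectation x^2 lies in the search space. The linear solve is written in pure Python to keep the example dependency-free. This is not the estimator with guarantees proposed in the paper.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c


def fit_conditional_expectation(xs: list[float], ys: list[float], degree: int = 2) -> list[float]:
    """Empirical minimizer of the mean squared distance between Y and f(X)
    over polynomials f(x) = sum_j c_j x**j of the given degree."""
    basis = [[x**j for j in range(degree + 1)] for x in xs]
    p = degree + 1
    # Normal equations (B^T B) c = B^T y of the least-squares problem.
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(p)]
    return solve(gram, rhs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [x**2 + rng.gauss(0.0, 0.5) for x in xs]  # hypothetical model with E[Y | X = x] = x**2
coeffs = fit_conditional_expectation(xs, ys)
print(coeffs)  # approximately [0, 0, 1]
```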