-
Random Feature Representation Boosting
Authors:
Nikita Zozoulenko,
Thomas Cass,
Lukas Gonon
Abstract:
We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional gradient of the network representation, enhancing performance while preserving the convex optimization benefits of RFNNs. In the case of MSE loss, we obtain closed-form solutions to greedy layer-wise boosting with random features. For general loss functions, we show that fitting random feature residual blocks reduces to solving a quadratically constrained least squares problem. Through extensive numerical experiments on tabular datasets for both regression and classification, we show that RFRBoost significantly outperforms RFNNs and end-to-end trained MLP ResNets in the small- to medium-scale regime where RFNNs are typically applied. Moreover, RFRBoost offers substantial computational benefits, and theoretical guarantees stemming from boosting theory.
Submitted 28 May, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
Fast Deep Hedging with Second-Order Optimization
Authors:
Konrad Mueller,
Amira Akkari,
Lukas Gonon,
Ben Wood
Abstract:
Hedging exotic options in the presence of market frictions is an important risk management task. Deep hedging can solve such hedging problems by training neural network policies in realistic simulated markets. Training these neural networks may be delicate and suffer from slow convergence, particularly for options with long maturities and complex sensitivities to market parameters. To address this, we propose a second-order optimization scheme for deep hedging. We leverage pathwise differentiability to construct a curvature matrix, which we approximate as block-diagonal and Kronecker-factored to efficiently precondition gradients. We evaluate our method on a challenging and practically important problem: hedging a cliquet option on a stock with stochastic volatility by trading in the spot and vanilla options. We find that our second-order scheme can optimize the policy in 1/4 of the number of steps that standard adaptive moment-based optimization takes.
Submitted 29 October, 2024;
originally announced October 2024.
-
Computing Systemic Risk Measures with Graph Neural Networks
Authors:
Lukas Gonon,
Thilo Meyer-Brandis,
Niklas Weber
Abstract:
This paper investigates systemic risk measures for stochastic financial networks of explicitly modelled bilateral liabilities. We extend the notion of systemic risk measures from Biagini, Fouque, Frittelli and Meyer-Brandis (2019) to graph-structured data. In particular, we focus on an aggregation function that is derived from a market clearing algorithm proposed by Eisenberg and Noe (2001). In this setting, we show the existence of an optimal random allocation that distributes the overall minimal bailout capital and secures the network. We study numerical methods for the approximation of systemic risk and optimal random allocations. We propose to use permutation equivariant architectures of neural networks like graph neural networks (GNNs) and a class that we name (extended) permutation equivariant neural networks ((X)PENNs). We compare their performance to several benchmark allocations. The main feature of GNNs and (X)PENNs is that they are permutation equivariant with respect to the underlying graph data. In numerical experiments we find evidence that these permutation equivariant methods are superior to other approaches.
Submitted 30 September, 2024;
originally announced October 2024.
-
An Overview on Machine Learning Methods for Partial Differential Equations: from Physics Informed Neural Networks to Deep Operator Learning
Authors:
Lukas Gonon,
Arnulf Jentzen,
Benno Kuckuck,
Siyu Liang,
Adrian Riekert,
Philippe von Wurstemberger
Abstract:
The approximation of solutions of partial differential equations (PDEs) with numerical algorithms is a central topic in applied mathematics. For many decades, various types of methods for this purpose have been developed and extensively studied. One class of methods which has received a lot of attention in recent years is that of machine learning-based methods, which typically involve the training of artificial neural networks (ANNs) by means of stochastic gradient descent type optimization methods. While approximation methods for PDEs using ANNs were first proposed in the 1990s, they have only gained wide popularity in the last decade with the rise of deep learning. This article aims to provide an introduction to some of these methods and the mathematical theory on which they are based. We discuss methods such as physics-informed neural networks (PINNs) and deep BSDE methods and consider several operator learning approaches.
Submitted 23 August, 2024;
originally announced August 2024.
-
Infinite-dimensional Mahalanobis Distance with Applications to Kernelized Novelty Detection
Authors:
Nikita Zozoulenko,
Thomas Cass,
Lukas Gonon
Abstract:
The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in $\mathbb{R}^d$. In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can naturally be estimated using empirical measures of a sample. Our framework generalizes the classical $\mathbb{R}^d$, functional $(L^2[0,1])^d$, and kernelized settings; importantly, it incorporates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results which are limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator and establish consistency and convergence results for estimation using empirical measures. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance. In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.
Submitted 28 May, 2025; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Operator Deep Smoothing for Implied Volatility
Authors:
Lukas Gonon,
Antoine Jacquier,
Ruben Wiedemann
Abstract:
We devise a novel method for nowcasting implied volatility based on neural operators. Better known as implied volatility smoothing in the financial industry, nowcasting of implied volatility means constructing a smooth surface that is consistent with the prices presently observed on a given option market. Option price data arises highly dynamically in ever-changing spatial configurations, which poses a major limitation to foundational machine learning approaches using classical neural networks. While large models in language and image processing deliver breakthrough results on vast corpora of raw data, in financial engineering the generalization from big historical datasets has been hindered by the need for considerable data pre-processing. In particular, implied volatility smoothing has remained an instance-by-instance, hands-on process both for neural network-based and traditional parametric strategies. Our general operator deep smoothing approach, instead, directly maps observed data to smoothed surfaces. We adapt the graph neural operator architecture to do so with high accuracy on ten years of raw intraday S&P 500 options data, using a single model instance. The trained operator adheres to critical no-arbitrage constraints and is robust with respect to subsampling of inputs (occurring in practice in the context of outlier removal). We provide extensive historical benchmarks and showcase the generalization capability of our approach in a comparison with classical neural networks and SVI, an industry standard parametrization for implied volatility. The operator deep smoothing approach thus opens up the use of neural networks on large historical datasets in financial engineering.
Submitted 9 October, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Universal randomised signatures for generative time series modelling
Authors:
Francesca Biagini,
Lukas Gonon,
Niklas Walter
Abstract:
Randomised signature has been proposed as a flexible and easily implementable alternative to the well-established path signature. In this article, we employ randomised signature to introduce a generative model for financial time series data in the spirit of reservoir computing. Specifically, we propose a novel Wasserstein-type distance based on discrete-time randomised signatures. This metric on the space of probability measures captures the distance between (conditional) distributions. Its use is justified by our novel universal approximation results for randomised signatures on the space of continuous functions taking the underlying path as an input. We then use our metric as the loss function in a non-adversarial generator model for synthetic time series data based on a reservoir neural stochastic differential equation. We compare the results of our model to benchmarks from the existing literature.
Submitted 6 September, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Approximation Rates for Deep Calibration of (Rough) Stochastic Volatility Models
Authors:
Francesca Biagini,
Lukas Gonon,
Niklas Walter
Abstract:
We derive quantitative error bounds for deep neural networks (DNNs) approximating option prices on a $d$-dimensional risky asset as functions of the underlying model parameters, payoff parameters and initial conditions. We cover a general class of stochastic volatility models of Markovian nature as well as the rough Bergomi model. In particular, under suitable assumptions we show that option prices can be learned by DNNs up to an arbitrarily small error $\varepsilon \in (0,1/2)$ while the network size grows only sub-polynomially in the asset vector dimension $d$ and the reciprocal $\varepsilon^{-1}$ of the accuracy. Hence, the approximation does not suffer from the curse of dimensionality. As quantitative approximation results for DNNs applicable in our setting are formulated for functions on compact domains, we first consider the case of the asset price restricted to a compact set, then we extend these results to the general case by using convergence arguments for the option prices.
Submitted 26 September, 2023;
originally announced September 2023.
-
Universal Approximation Theorem and error bounds for quantum neural networks and quantum reservoirs
Authors:
Lukas Gonon,
Antoine Jacquier
Abstract:
Universal approximation theorems are the foundations of classical neural networks, providing theoretical guarantees that the latter are able to approximate maps of interest. Recent results have shown that this can also be achieved in a quantum setting, whereby classical functions can be approximated by parameterised quantum circuits. We provide here precise error bounds for specific classes of functions and extend these results to the interesting new setup of randomised quantum circuits, mimicking classical reservoir neural networks. Our results show in particular that a quantum neural network with $\mathcal{O}(\varepsilon^{-2})$ weights and $\mathcal{O} (\lceil \log_2(\varepsilon^{-1}) \rceil)$ qubits suffices to achieve accuracy $\varepsilon>0$ when approximating functions with integrable Fourier transform.
Submitted 3 February, 2025; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Infinite-dimensional reservoir computing
Authors:
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega
Abstract:
Reservoir computing approximation and generalization bounds are proved for a new concept class of input/output systems that extends the so-called generalized Barron functionals to a dynamic context. This new class is characterized by the readouts with a certain integral representation built on infinite-dimensional state-space systems. It is shown that this class is very rich and possesses useful features and universal approximation properties. The reservoir architectures used for the approximation and estimation of elements in the new class are randomly generated echo state networks with either linear or ReLU activation functions. Their readouts are built using randomly generated neural networks in which only the output layer is trained (extreme learning machines or random feature neural networks). The results in the paper yield a fully implementable recurrent neural network-based learning algorithm with provable convergence guarantees that do not suffer from the curse of dimensionality.
Submitted 2 April, 2023;
originally announced April 2023.
-
The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality
Authors:
Lukas Gonon,
Robin Graeber,
Arnulf Jentzen
Abstract:
In this article we study high-dimensional approximation capacities of shallow and deep artificial neural networks (ANNs) with the rectified linear unit (ReLU) activation. In particular, it is a key contribution of this work to reveal that for all $a,b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a,b]^d\ni x=(x_1,\dots,x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$ as well as the functions $[a,b]^d\ni x =(x_1,\dots, x_d)\mapsto\sin(\prod_{i=1}^d x_i) \in \mathbb{R} $ for $ d \in \mathbb{N} $ can neither be approximated without the curse of dimensionality by means of shallow ANNs nor insufficiently deep ANNs with ReLU activation but can be approximated without the curse of dimensionality by sufficiently deep ANNs with ReLU activation. We show that the product functions and the sine of the product functions are polynomially tractable approximation problems among the approximating class of deep ReLU ANNs with the number of hidden layers being allowed to grow in the dimension $ d \in \mathbb{N} $. We establish the above outlined statements not only for the product functions and the sine of the product functions but also for other classes of target functions, in particular, for classes of uniformly globally bounded $ C^{ \infty } $-functions with compact support on any $[a,b]^d$ with $a\in\mathbb{R}$, $b\in(a,\infty)$. Roughly speaking, in this work we lay open that simple approximation problems such as approximating the sine or cosine of products cannot be solved in standard implementation frameworks by shallow or insufficiently deep ANNs with ReLU activation in polynomial time, but can be approximated by sufficiently deep ReLU ANNs with the number of parameters growing at most polynomially.
Submitted 19 January, 2023;
originally announced January 2023.
-
Reservoir kernels and Volterra series
Authors:
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega
Abstract:
A universal kernel is constructed whose sections approximate any causal and time-invariant filter in the fading memory category with inputs and outputs in a finite-dimensional Euclidean space. This kernel is built using the reservoir functional associated with a state-space representation of the Volterra series expansion available for any analytic fading memory filter. It is hence called the Volterra reservoir kernel. Even though the state-space representation and the corresponding reservoir feature map are defined on an infinite-dimensional tensor algebra space, the kernel map is characterized by explicit recursions that are readily computable for specific data sets when employed in estimation problems using the representer theorem. We showcase the performance of the Volterra reservoir kernel in a popular data science application in relation to bitcoin price prediction.
Submitted 30 December, 2022;
originally announced December 2022.
-
Deep neural network expressivity for optimal stopping problems
Authors:
Lukas Gonon
Abstract:
This article studies deep neural network expression rates for optimal stopping problems of discrete-time Markov processes on high-dimensional state spaces. A general framework is established in which the value function and continuation value of an optimal stopping problem can be approximated with error at most $\varepsilon$ by a deep ReLU neural network of size at most $κd^{\mathfrak{q}} \varepsilon^{-\mathfrak{r}}$. The constants $κ,\mathfrak{q},\mathfrak{r} \geq 0$ do not depend on the dimension $d$ of the state space or the approximation accuracy $\varepsilon$. This proves that deep neural networks do not suffer from the curse of dimensionality when employed to solve optimal stopping problems. The framework covers, for example, exponential Lévy models, discrete diffusion processes and their running minima and maxima. These results mathematically justify the use of deep neural networks for numerically solving optimal stopping problems and pricing American options in high dimensions.
Submitted 19 October, 2022;
originally announced October 2022.
-
Detecting asset price bubbles using deep learning
Authors:
Francesca Biagini,
Lukas Gonon,
Andrea Mazzon,
Thilo Meyer-Brandis
Abstract:
In this paper we employ deep learning techniques to detect financial asset bubbles by using observed call option prices. The proposed algorithm is widely applicable and model-independent. We test the accuracy of our methodology in numerical experiments within a wide range of models and apply it to market data of tech stocks in order to assess if asset price bubbles are present. Under a given condition on the pricing of call options under asset price bubbles, we are able to provide a theoretical foundation of our approach for positive and continuous stochastic asset price processes. When such a condition is not satisfied, we focus on local volatility models. To this purpose, we give a new necessary and sufficient condition for a process with time-dependent local volatility function to be a strict local martingale.
Submitted 19 June, 2024; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Neural network approximation for superhedging prices
Authors:
Francesca Biagini,
Lukas Gonon,
Thomas Reitsam
Abstract:
This article examines neural network-based approximations for the superhedging price process of a contingent claim in a discrete time market model. First we prove that the $α$-quantile hedging price converges to the superhedging price at time $0$ for $α$ tending to $1$, and show that the $α$-quantile hedging price can be approximated by a neural network-based price. This provides a neural network-based approximation for the superhedging price at time $0$ and also the superhedging strategy up to maturity. To obtain the superhedging price process for $t>0$, by using the Doob decomposition it is sufficient to determine the process of consumption. We show that it can be approximated by the essential supremum over a set of neural networks. Finally, we present numerical results.
Submitted 29 July, 2021;
originally announced July 2021.
-
Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality
Authors:
Lukas Gonon
Abstract:
This article investigates the use of random feature neural networks for learning Kolmogorov partial (integro-)differential equations associated to Black-Scholes and more general exponential Lévy models. Random feature neural networks are single-hidden-layer feedforward neural networks in which only the output weights are trainable. This makes training particularly simple, but (a priori) reduces expressivity. Interestingly, this is not the case for Black-Scholes type PDEs, as we show here. We derive bounds for the prediction error of random neural networks for learning sufficiently non-degenerate Black-Scholes type models. A full error analysis is provided and it is shown that the derived bounds do not suffer from the curse of dimensionality. We also investigate an application of these results to basket options and validate the bounds numerically.
These results prove that neural networks are able to \textit{learn} solutions to Black-Scholes type PDEs without the curse of dimensionality. In addition, this provides an example of a relevant learning problem in which random feature neural networks are provably efficient.
Submitted 14 June, 2021;
originally announced June 2021.
-
Deep ReLU neural networks overcome the curse of dimensionality for partial integrodifferential equations
Authors:
Lukas Gonon,
Christoph Schwab
Abstract:
Deep neural networks (DNNs) with ReLU activation function are proved to be able to express viscosity solutions of linear partial integrodifferential equations (PIDEs) on state spaces of possibly high dimension $d$. Admissible PIDEs comprise Kolmogorov equations for high-dimensional diffusion, advection, and for pure jump Lévy processes. We prove for such PIDEs arising from a class of jump-diffusions on $\mathbb{R}^d$ that for any suitable measure $μ^d$ on $\mathbb{R}^d$ there exist constants $C,{\mathfrak{p}},{\mathfrak{q}}>0$ such that for every $\varepsilon \in (0,1]$ and for every $d\in \mathbb{N}$ the DNN $L^2(μ^d)$-expression error of viscosity solutions of the PIDE is of size $\varepsilon$ with DNN size bounded by $Cd^{\mathfrak{p}}\varepsilon^{-\mathfrak{q}}$.
In particular, the constant $C>0$ is independent of $d\in \mathbb{N}$ and of $\varepsilon \in (0,1]$ and depends only on the coefficients in the PIDE and the measure used to quantify the error. This establishes that ReLU DNNs can break the curse of dimensionality (CoD for short) for viscosity solutions of linear, possibly degenerate PIDEs corresponding to suitable Markovian jump-diffusion processes.
As a consequence of the employed techniques we also obtain that expectations of a large class of path-dependent functionals of the underlying jump-diffusion processes can be expressed without the CoD.
Submitted 5 August, 2022; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Deep ReLU Network Expression Rates for Option Prices in high-dimensional, exponential Lévy models
Authors:
Lukas Gonon,
Christoph Schwab
Abstract:
We study the expression rates of deep neural networks (DNNs for short) for option prices written on baskets of $d$ risky assets, whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. We establish sufficient conditions on the characteristic triplet of the Lévy process $X$ that ensure $\varepsilon$ error of DNN expressed option prices with DNNs of size that grows polynomially with respect to $\mathcal{O}(\varepsilon^{-1})$, and with constants implied in $\mathcal{O}(\cdot)$ which grow polynomially with respect to $d$, thereby overcoming the curse of dimensionality and justifying the use of DNNs in financial modelling of large baskets in markets with jumps.
In addition, we exploit parabolic smoothing of Kolmogorov partial integrodifferential equations for certain multivariate Lévy processes to present alternative architectures of ReLU DNNs that provide $\varepsilon$ expression error in DNN size $\mathcal{O}(|\log(\varepsilon)|^a)$ with exponent $a \sim d$, however, with constants implied in $\mathcal{O}(\cdot)$ growing exponentially with respect to $d$. Under stronger, dimension-uniform non-degeneracy conditions on the Lévy symbol, we obtain algebraic expression rates of option prices in exponential Lévy models which are free from the curse of dimensionality. In this case the ReLU DNN expression rates of prices depend on certain sparsity conditions on the characteristic Lévy triplet. We indicate several consequences and possible extensions of the present results.
Submitted 5 July, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
Discrete-time signatures and randomness in reservoir computing
Authors:
Christa Cuchiero,
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega,
Josef Teichmann
Abstract:
A new explanation, of a geometric nature, of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what is called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated and bounds for the committed approximation error are provided.
Submitted 17 September, 2020;
originally announced October 2020.
-
Fading memory echo state networks are universal
Authors:
Lukas Gonon,
Juan-Pablo Ortega
Abstract:
Echo state networks (ESNs) have been recently proved to be universal approximants for input/output systems with respect to various $L ^p$-type criteria. When $1\leq p< \infty$, only $p$-integrability hypotheses need to be imposed, while in the case $p=\infty$ a uniform boundedness hypothesis on the inputs is required. This note shows that, in the last case, a universal family of ESNs can be constructed that contains exclusively elements that have the echo state and the fading memory properties. This conclusion could not be drawn with the results and methods available so far in the literature.
Submitted 22 October, 2020;
originally announced October 2020.
-
Weak error analysis for stochastic gradient descent optimization algorithms
Authors:
Aritz Bercher,
Lukas Gonon,
Arnulf Jentzen,
Diyora Salimova
Abstract:
Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and numerical approximations of partial differential equations. In mathematical convergence results for SGD type optimization schemes there are usually two types of error criteria studied in the scientific literature, that is, the error in the strong sense and the error with respect to the objective function. In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function. The analysis of the size of this error is the subject of this article. In particular, the main result of this article proves under suitable assumptions that the size of this error decays at the same speed as in the special case where the test function coincides with the objective function.
Submitted 21 July, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Memory and forecasting capacities of nonlinear recurrent networks
Authors:
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega
Abstract:
The notion of memory capacity, originally introduced for echo state and linear networks with independent inputs, is generalized to nonlinear recurrent networks with stationary but dependent inputs. The presence of dependence in the inputs makes it natural to introduce the network forecasting capacity, which measures the possibility of forecasting time series values using network states. Generic bounds for memory and forecasting capacities are formulated in terms of the number of neurons of the nonlinear recurrent network and the autocovariance function or the spectral density of the input. These bounds generalize well-known estimates in the literature to a dependent inputs setup. Finally, for the particular case of linear recurrent networks with independent inputs it is proved that the memory capacity is given by the rank of the associated controllability matrix, a fact that has long been assumed to be true without proof by the community.
Submitted 2 September, 2020; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Overcoming the curse of dimensionality in the numerical approximation of high-dimensional semilinear elliptic partial differential equations
Authors:
Christian Beck,
Lukas Gonon,
Arnulf Jentzen
Abstract:
Recently, so-called full-history recursive multilevel Picard (MLP) approximation schemes have been introduced and shown to overcome the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations (PDEs) with Lipschitz nonlinearities. The key contribution of this article is to introduce and analyze a new variant of MLP approximation schemes for certain semilinear elliptic PDEs with Lipschitz nonlinearities and to prove that the proposed approximation schemes overcome the curse of dimensionality in the numerical approximation of such semilinear elliptic PDEs.
Submitted 1 March, 2020;
originally announced March 2020.
-
Approximation Bounds for Random Neural Networks and Reservoir Systems
Authors:
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega
Abstract:
This work studies approximation based on single-hidden-layer feedforward and recurrent neural networks with randomly generated internal weights. These methods, in which only the last layer of weights and a few hyperparameters are optimized, have been successfully applied in a wide range of static and dynamic learning problems. Despite the popularity of this approach in empirical tasks, important theoretical questions regarding the relation between the unknown function, the weight distribution, and the approximation rate have remained open. In this work it is proved that, as long as the unknown function, functional, or dynamical system is sufficiently regular, it is possible to draw the internal weights of the random (recurrent) neural network from a generic distribution (not depending on the unknown object) and quantify the error in terms of the number of neurons and the hyperparameters. In particular, this proves that echo state networks with randomly generated weights are capable of approximating a wide class of dynamical systems arbitrarily well and thus provides the first mathematical explanation for their empirically observed success at learning dynamical systems.
Submitted 16 February, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Uniform error estimates for artificial neural network approximations for heat equations
Authors:
Lukas Gonon,
Philipp Grohs,
Arnulf Jentzen,
David Kofler,
David Šiška
Abstract:
Recently, artificial neural networks (ANNs) in conjunction with stochastic gradient descent optimization methods have been employed to approximately compute solutions of possibly rather high-dimensional partial differential equations (PDEs). Very recently, there have also been a number of rigorous mathematical results in the scientific literature which examine the approximation capabilities of such deep learning based approximation algorithms for PDEs. These mathematical results from the scientific literature prove in part that algorithms based on ANNs are capable of overcoming the curse of dimensionality in the numerical approximation of high-dimensional PDEs. In these mathematical results from the scientific literature usually the error between the solution of the PDE and the approximating ANN is measured in the $L^p$-sense with respect to some $p \in [1,\infty)$ and some probability measure. In many applications it is, however, also important to control the error in a uniform $L^\infty$-sense. The key contribution of the main result of this article is to develop the techniques to obtain error estimates between solutions of PDEs and approximating ANNs in the uniform $L^\infty$-sense. In particular, we prove that the number of parameters of an ANN to uniformly approximate the classical solution of the heat equation in a region $ [a,b]^d $ for a fixed time point $ T \in (0,\infty) $ grows at most polynomially in the dimension $ d \in \mathbb{N} $ and the reciprocal of the approximation precision $ \varepsilon > 0 $. This shows that ANNs can overcome the curse of dimensionality in the numerical approximation of the heat equation when the error is measured in the uniform $L^\infty$-norm.
Submitted 15 June, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Risk bounds for reservoir computing
Authors:
Lukas Gonon,
Lyudmila Grigoryeva,
Juan-Pablo Ortega
Abstract:
We analyze the practices of reservoir computing in the framework of statistical learning theory. In particular, we derive finite sample upper bounds for the generalization error committed by specific families of reservoir computing systems when processing discrete-time inputs under various hypotheses on their dependence structure. Non-asymptotic bounds are explicitly written down in terms of the multivariate Rademacher complexities of the reservoir systems and the weak dependence structure of the signals that are being handled. This makes it possible, in particular, to determine the minimal number of observations needed in order to guarantee a prescribed estimation accuracy with high probability for a given reservoir family. At the same time, the asymptotic behavior of the devised bounds guarantees the consistency of the empirical risk minimization procedure for various hypothesis classes of reservoir functionals.
Submitted 30 October, 2019;
originally announced October 2019.
-
On existence and uniqueness properties for solutions of stochastic fixed point equations
Authors:
Christian Beck,
Lukas Gonon,
Martin Hutzenthaler,
Arnulf Jentzen
Abstract:
The Feynman-Kac formula implies that every suitable classical solution of a semilinear Kolmogorov partial differential equation (PDE) is also a solution of a certain stochastic fixed point equation (SFPE). In this article we study such and related SFPEs. In particular, the main result of this work proves existence of unique solutions of certain SFPEs in a general setting. As an application of this main result we establish the existence of unique solutions of SFPEs associated with semilinear Kolmogorov PDEs with Lipschitz continuous nonlinearities even in the case where the associated semilinear Kolmogorov PDE does not possess a classical solution.
Submitted 9 August, 2019;
originally announced August 2019.
-
Asset Pricing with General Transaction Costs: Theory and Numerics
Authors:
Lukas Gonon,
Johannes Muhle-Karbe,
Xiaofei Shi
Abstract:
We study risk-sharing equilibria with general convex costs on the agents' trading rates. For an infinite-horizon model with linear state dynamics and exogenous volatilities, we prove that the equilibrium returns mean-revert around their frictionless counterparts - the deviation has Ornstein-Uhlenbeck dynamics for quadratic costs whereas it follows a doubly-reflected Brownian motion if costs are proportional. More general models with arbitrary state dynamics and endogenous volatilities lead to multidimensional systems of nonlinear, fully-coupled forward-backward SDEs. These fall outside the scope of known wellposedness results, but can be solved numerically using the simulation-based deep-learning approach of Han, Jentzen and E (2018). In a calibration to time series of prices and trading volume, realistic liquidity premia are accompanied by a moderate increase in volatility. The effects of different cost specifications are rather similar, justifying the use of quadratic costs as a proxy for other less tractable specifications.
Submitted 15 April, 2020; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Existence and uniqueness results for time-inhomogeneous time-change equations and Fokker--Planck equations
Authors:
Leif Döring,
Lukas Gonon,
David J. Prömel,
Oleg Reichmann
Abstract:
We prove existence and uniqueness of solutions to Fokker--Planck equations associated to Markov operators multiplicatively perturbed by degenerate time-inhomogeneous coefficients. Precise conditions on the time-inhomogeneous coefficients are given. In particular, we do not necessarily require the coefficients to be either globally bounded or bounded away from zero. The approach is based on constructing random time-changes and studying related martingale problems for Markov processes with values in locally compact, complete and separable metric spaces.
Submitted 31 July, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
Reservoir Computing Universality With Stochastic Inputs
Authors:
Lukas Gonon,
Juan-Pablo Ortega
Abstract:
The universal approximation properties with respect to $L ^p $-type criteria of three important families of reservoir computers with stochastic discrete-time semi-infinite inputs are shown. First, it is proved that linear reservoir systems with either polynomial or neural network readout maps are universal. More importantly, it is proved that the same property holds for two families with linear readouts, namely, trigonometric state-affine systems and echo state networks, which are the most widely used reservoir systems in applications. The linearity in the readouts is a key feature in supervised machine learning applications. It guarantees that these systems can be used in high-dimensional situations and in the presence of large datasets. The $L ^p $ criteria used in this paper allow the formulation of universality results that do not necessarily impose almost sure uniform boundedness in the inputs or the fading memory property in the filter that needs to be approximated.
Submitted 7 July, 2018;
originally announced July 2018.
-
Deep Hedging
Authors:
Hans Bühler,
Lukas Gonon,
Josef Teichmann,
Ben Wood
Abstract:
We present a framework for hedging a portfolio of derivatives in the presence of market frictions such as transaction costs, market impact, liquidity constraints or risk limits using modern deep reinforcement machine learning methods.
We discuss how standard reinforcement learning methods can be applied to non-linear reward structures, i.e. in our case convex risk measures. As a general contribution to the use of deep learning for stochastic processes, we also show that the set of constrained trading strategies used by our algorithm is large enough to $ε$-approximate any optimal solution.
Our algorithm can be implemented efficiently even in high-dimensional situations using modern machine learning tools. Its structure does not depend on specific market dynamics, and generalizes across hedging instruments including the use of liquid derivatives. Its computational performance is largely invariant in the size of the portfolio as it depends mainly on the number of hedging instruments available.
We illustrate our approach by showing the effect on hedging under transaction costs in a synthetic market driven by the Heston model, where we outperform the standard "complete market" solution.
Submitted 8 February, 2018;
originally announced February 2018.
-
Linearized Filtering of Affine Processes Using Stochastic Riccati Equations
Authors:
Lukas Gonon,
Josef Teichmann
Abstract:
We consider an affine process $X$ which is only observed up to an additive white noise, and we ask for its law, for some time $t > 0 $, conditional on all observations up to this time $ t $. This is a general, possibly high dimensional filtering problem which is not even locally approximately Gaussian, whence essentially only particle filtering methods remain as solution techniques. In this work we present an efficient numerical solution by introducing an approximate filter for which conditional characteristic functions can be calculated by solving a system of generalized Riccati differential equations depending on the observation and the process characteristics of the signal $X$. The quality of the approximation can be controlled by easily observable quantities in terms of a macro location of the signal in state space. Asymptotic techniques as well as maximization techniques can be directly applied to the solutions of the Riccati equations leading to novel very tractable filtering formulas. The efficiency of the method is illustrated with numerical experiments for Cox-Ingersoll-Ross and Wishart processes, for which Gaussian approximations usually fail.
Submitted 23 January, 2018;
originally announced January 2018.
-
On Skorokhod Embeddings and Poisson Equations
Authors:
Leif Doering,
Lukas Gonon,
David J. Prömel,
Oleg Reichmann
Abstract:
The classical Skorokhod embedding problem for a Brownian motion $W$ asks to find a stopping time $τ$ so that $W_τ$ is distributed according to a prescribed probability distribution $μ$. Many solutions have been proposed during the past 50 years and applications in different fields emerged. This article deals with a generalized Skorokhod embedding problem (SEP): Let $X$ be a Markov process with initial marginal distribution $μ_0$ and let $μ_1$ be a probability measure. The task is to find a stopping time $τ$ such that $X_τ$ is distributed according to $μ_1$. More precisely, we study the question of deciding if a finite mean solution to the SEP can exist for given $μ_0, μ_1$ and the task of giving a solution which is as explicit as possible. If $μ_0$ and $μ_1$ have positive densities $h_0$ and $h_1$ and the generator $\mathcal A$ of $X$ has a formal adjoint operator $\mathcal A^*$, then we propose necessary and sufficient conditions for the existence of an embedding in terms of the Poisson equation $\mathcal A^* H=h_1-h_0$ and give a fairly explicit construction of the stopping time using the solution of the Poisson equation. For the class of Lévy processes we carry out the procedure and extend a result of Bertoin and Le Jan to Lévy processes without local times.
Submitted 31 July, 2019; v1 submitted 16 March, 2017;
originally announced March 2017.