-
Generative Market Equilibrium Models with Stable Adversarial Learning via Reinforcement
Authors:
Anastasis Kratsios,
Xiaofei Shi,
Qiang Sun,
Zhanhao Zhang
Abstract:
We present a general computational framework for solving continuous-time financial market equilibria under minimal modeling assumptions while incorporating realistic financial frictions, such as trading costs, and supporting multiple interacting agents. Inspired by generative adversarial networks (GANs), our approach employs a novel generative deep reinforcement learning framework with a decoupling feedback system embedded in the adversarial training loop, which we term the \emph{reinforcement link}. This architecture stabilizes the training dynamics by incorporating feedback from the discriminator. Our theoretically guided feedback mechanism enables the decoupling of the equilibrium system, overcoming challenges that hinder conventional numerical algorithms. Experimentally, our algorithm not only learns the equilibrium but also provides testable predictions on how asset returns and volatilities emerge from the endogenous trading behavior of market participants, where traditional analytical methods fall short. The design of our model is further supported by an approximation guarantee.
Submitted 5 April, 2025;
originally announced April 2025.
-
Neural Operators Can Play Dynamic Stackelberg Games
Authors:
Guillermo Alvarez,
Ibrahim Ekren,
Anastasis Kratsios,
Xuwei Yang
Abstract:
Dynamic Stackelberg games are a broad class of two-player games in which the leader acts first, and the follower chooses a response strategy to the leader's strategy. Unfortunately, only stylized Stackelberg games are explicitly solvable since the follower's best-response operator (as a function of the control of the leader) is typically analytically intractable. This paper addresses this issue by showing that the \textit{follower's best-response operator} can be approximately implemented by an \textit{attention-based neural operator}, uniformly on compact subsets of adapted open-loop controls for the leader. We further show that the value of the Stackelberg game where the follower uses the approximate best-response operator approximates the value of the original Stackelberg game. Our main result is obtained using our universal approximation theorem for attention-based neural operators between spaces of square-integrable adapted stochastic processes, as well as stability results for a general class of Stackelberg games.
Submitted 14 November, 2024;
originally announced November 2024.
-
Simultaneously Solving FBSDEs and their Associated Semilinear Elliptic PDEs with Small Neural Operators
Authors:
Takashi Furuya,
Anastasis Kratsios
Abstract:
Forward-backwards stochastic differential equations (FBSDEs) play an important role in optimal control, game theory, economics, mathematical finance, and in reinforcement learning. Unfortunately, the available FBSDE solvers operate on \textit{individual} FBSDEs, meaning that they cannot provide a computationally feasible strategy for solving large families of FBSDEs, as these solvers must be re-run several times. \textit{Neural operators} (NOs) offer an alternative approach for \textit{simultaneously solving} large families of decoupled FBSDEs by directly approximating the solution operator mapping \textit{inputs:} terminal conditions and dynamics of the backwards process to \textit{outputs:} solutions to the associated FBSDE. Though universal approximation theorems (UATs) guarantee the existence of such NOs, these NOs are unrealistically large. Upon making only a few simple theoretically-guided tweaks to the standard convolutional NO build, we confirm that ``small'' NOs can uniformly approximate the solution operator to structured families of FBSDEs with random terminal time, uniformly on suitable compact sets determined by Sobolev norms using a logarithmic depth, a constant width, and a polynomial rank in the reciprocal approximation error.
This result is rooted in our second result, and main contribution to the NOs for PDE literature, showing that our convolutional NOs, of similar depth and width, grow only \textit{quadratically} (at a dimension-free rate) when uniformly approximating the solution operator of the class of semilinear Elliptic PDEs associated with these families of FBSDEs. A key insight we uncover into how NOs work is that the convolutional layers of our NO can approximately implement the fixed point iteration used to prove the existence of a unique solution to these semilinear Elliptic PDEs.
Submitted 28 May, 2025; v1 submitted 18 October, 2024;
originally announced October 2024.
-
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Authors:
Raeid Saqur,
Anastasis Kratsios,
Florian Krach,
Yannick Limmer,
Jacob-Junqi Tian,
John Willes,
Blanka Horvath,
Frank Rudzicz
Abstract:
We propose MoE-F - a formalized mechanism for combining $N$ pre-trained Large Language Models (LLMs) for online time-series prediction by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, our approach employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information that it has access to. Subsequently, the $N$ filter outputs are optimally aggregated to maximize their robust predictive power, and this update is computed efficiently via a closed-form expression, generating our ensemble predictor. Our contributions are: **(I)** the MoE-F plug-and-play filtering harness algorithm, **(II)** theoretical optimality guarantees of the proposed filtering-based gating algorithm (via optimality guarantees for its parallel Bayesian filtering and its robust aggregation steps), and **(III)** empirical evaluation and ablative results using state-of-the-art foundational and MoE LLMs on a real-world __Financial Market Movement__ task where MoE-F attains a remarkable 17\% absolute and 48.5\% relative F1 measure improvement over the next best performing individual LLM expert predicting short-horizon market movement based on streaming news. Further, we provide empirical evidence of substantial performance gains in applying MoE-F over specialized models in the long-horizon time-series forecasting domain.
Submitted 20 February, 2025; v1 submitted 5 June, 2024;
originally announced June 2024.
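As a rough illustration of filtering-based gating, the sketch below runs a discrete-time Bayesian filter over which expert is currently "best". It is a simplified analogue written for this listing, not the MoE-F algorithm or the continuous-time Wonham-Shiryaev filter: the function name `filter_step`, the symmetric switching model (`stay_prob`), and the exponential loss likelihood (`temperature`) are all assumptions.

```python
import math


def filter_step(weights, losses, stay_prob=0.9, temperature=1.0):
    """One predict/update step of a discrete-time HMM filter over experts.

    weights: current posterior probability that each expert is the best one.
    losses:  each expert's loss on the newest observation (hypothetical input).
    """
    n = len(weights)
    total_weight = sum(weights)
    # Prediction: the hidden "best expert" stays put with probability
    # stay_prob, otherwise it jumps uniformly to one of the other experts.
    switch = (1.0 - stay_prob) / (n - 1) if n > 1 else 0.0
    predicted = [
        stay_prob * w + switch * (total_weight - w) for w in weights
    ]
    # Update: reweight by an assumed likelihood exp(-loss / temperature),
    # then renormalize so the weights form a probability vector.
    unnormalized = [
        p * math.exp(-loss / temperature) for p, loss in zip(predicted, losses)
    ]
    normalizer = sum(unnormalized)
    return [u / normalizer for u in unnormalized]


# Three experts, starting from a uniform prior; expert 0 keeps the lowest loss.
weights = [1.0 / 3.0] * 3
for step_losses in ([0.1, 0.9, 0.5], [0.2, 0.8, 0.6], [0.1, 1.0, 0.4]):
    weights = filter_step(weights, step_losses)
```

After a few steps the weight concentrates on the expert with the consistently lowest loss, while `stay_prob` below 1 keeps some mass on the others so the gate can adapt if the best expert changes.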
-
Low-dimensional approximations of the conditional law of Volterra processes: a non-positive curvature approach
Authors:
Reza Arabpour,
John Armstrong,
Luca Galimberti,
Anastasis Kratsios,
Giulia Livieri
Abstract:
Predicting the conditional evolution of Volterra processes with stochastic volatility is a crucial challenge in mathematical finance. While deep neural network models offer promise in approximating the conditional law of such processes, their effectiveness is hindered by the curse of dimensionality caused by the infinite dimensionality and non-smooth nature of these problems. To address this, we propose a two-step solution. Firstly, we develop a stable dimension reduction technique, projecting the law of a reasonably broad class of Volterra processes onto a low-dimensional statistical manifold of non-positive sectional curvature. Next, we introduce a sequentially deep learning model tailored to the manifold's geometry, which we show can approximate the projected conditional law of the Volterra process. Our model leverages an auxiliary hypernetwork to dynamically update its internal parameters, allowing it to encode non-stationary dynamics of the Volterra process, and it can be interpreted as a gating mechanism in a mixture of expert models where each expert is specialized at a specific point in time. Our hypernetwork further allows us to achieve approximation rates that would seemingly only be possible with very large networks.
Submitted 30 May, 2024;
originally announced May 2024.
-
Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing
Authors:
Xuwei Yang,
Anastasis Kratsios,
Florian Krach,
Matheus Grasselli,
Aurelien Lucchi
Abstract:
We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_θ$. Our objective is to minimize the cumulative deviation of the generated parameters $\{θ_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $θ^\star_{1},\ldots,θ^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{θ(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server), at each iteration (round). For the case where the model $f_θ$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm, showing that an adversary which perturbs $q$ training pairs by at most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel.
Submitted 3 October, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Generative Ornstein-Uhlenbeck Markets via Geometric Deep Learning
Authors:
Anastasis Kratsios,
Cody Hyndman
Abstract:
We consider the problem of simultaneously approximating the conditional distribution of market prices and their log returns with a single machine learning model. We show that an instance of the GDN model of Kratsios and Papon (2022) solves this problem without having prior assumptions on the market's "clipped" log returns, other than that they follow a generalized Ornstein-Uhlenbeck process with a priori unknown dynamics. We provide universal approximation guarantees for these conditional distributions and contingent claims with a Lipschitz payoff function.
Submitted 17 February, 2023;
originally announced February 2023.
-
Designing Universal Causal Deep Learning Models: The Case of Infinite-Dimensional Dynamical Systems from Stochastic Analysis
Authors:
Luca Galimberti,
Anastasis Kratsios,
Giulia Livieri
Abstract:
Several non-linear operators in stochastic analysis, such as solution maps to stochastic differential equations, depend on a temporal structure which is not leveraged by contemporary neural operators designed to approximate general maps between Banach spaces. This paper therefore proposes an operator learning solution to this open problem by introducing a deep learning model-design framework that takes suitable infinite-dimensional linear metric spaces, e.g. Banach spaces, as inputs and returns a universal \textit{sequential} deep learning model adapted to these linear geometries specialized for the approximation of operators encoding a temporal structure. We call these models \textit{Causal Neural Operators}. Our main result states that the models produced by our framework can uniformly approximate, on compact sets and across arbitrary finite-time horizons, Hölder or smooth trace class operators, which causally map sequences between given linear metric spaces. Our analysis uncovers new quantitative relationships on the latent state-space dimension of Causal Neural Operators, which even have new implications for (classical) finite-dimensional Recurrent Neural Networks. In addition, our guarantees for recurrent neural networks are tighter than the available results inherited from feedforward neural networks when approximating dynamical systems between finite-dimensional spaces.
Submitted 10 April, 2025; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Designing Universal Causal Deep Learning Models: The Geometric (Hyper)Transformer
Authors:
Beatrice Acciaio,
Anastasis Kratsios,
Gudmund Pammer
Abstract:
Several problems in stochastic analysis are defined through their geometry, and preserving that geometric structure is essential to generating meaningful predictions. Nevertheless, how to design principled deep learning (DL) models capable of encoding these geometric structures remains largely unknown. We address this open problem by introducing a universal causal geometric DL framework in which the user specifies a suitable pair of metric spaces $\mathscr{X}$ and $\mathscr{Y}$ and our framework returns a DL model capable of causally approximating any ``regular'' map sending time series in $\mathscr{X}^{\mathbb{Z}}$ to time series in $\mathscr{Y}^{\mathbb{Z}}$ while respecting their forward flow of information throughout time. Suitable geometries on $\mathscr{Y}$ include various (adapted) Wasserstein spaces arising in optimal stopping problems, a variety of statistical manifolds describing the conditional distribution of continuous-time finite state Markov chains, and all Fréchet spaces admitting a Schauder basis, e.g. as in classical finance. Suitable spaces $\mathscr{X}$ are compact subsets of any Euclidean space. Our results all quantitatively express the number of parameters needed for our DL model to achieve a given approximation error as a function of the target map's regularity and the geometric structure of both $\mathscr{X}$ and $\mathscr{Y}$. Even when omitting any temporal structure, our universal approximation theorems are the first guarantees that Hölder functions defined between such $\mathscr{X}$ and $\mathscr{Y}$ can be approximated by DL models.
Submitted 9 March, 2023; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices
Authors:
Calypso Herrera,
Florian Krach,
Anastasis Kratsios,
Pierre Ruyssen,
Josef Teichmann
Abstract:
The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix-specific, meaning that those algorithms must be re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that nearly instantaneously performs this decomposition when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of covariance matrices, or more generally, of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees for Denise are provided. These include a novel universal approximation theorem adapted to our geometric deep learning problem and convergence to an optimal solution to the learning problem. Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately $2000\times$ faster than the state-of-the-art, principal component pursuit (PCP), and $200 \times$ faster than the current speed-optimized method, fast PCP.
Submitted 6 June, 2023; v1 submitted 28 April, 2020;
originally announced April 2020.
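For orientation, the low-rank plus sparse decomposition mentioned in the abstract above is commonly posed as the convex program known as principal component pursuit. The formulation below is the standard textbook version for a generic matrix $M$, given here as background; the symbols $L$, $S$, and $\lambda$ are notation chosen for this note, and the learning objective used by Denise itself may differ.

```latex
% Principal component pursuit (PCP): split M into low-rank L plus sparse S.
% \|L\|_* is the nuclear norm (sum of singular values) and
% \|S\|_1 is the entrywise \ell^1 norm; \lambda > 0 balances the two terms.
\begin{equation*}
  \min_{L,\,S} \; \|L\|_{*} + \lambda \, \|S\|_{1}
  \quad \text{subject to} \quad L + S = M .
\end{equation*}
```

Solving this program afresh for every new $M$ is the per-matrix cost that a learned decomposition map is meant to avoid.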
-
Partial Uncertainty and Applications to Risk-Averse Valuation
Authors:
Anastasis Kratsios
Abstract:
This paper introduces an intermediary between conditional expectation and conditional sublinear expectation, called R-conditioning. The R-conditioning of a random vector in $L^2$ is defined as the best $L^2$-estimate, given a $σ$-subalgebra and a degree of model uncertainty. When the random vector represents the payoff of a derivative security in a complete financial market, its R-conditioning with respect to the risk-neutral measure is interpreted as its risk-averse value. The optimization problem defining the R-conditioning is shown to be well-posed. We show that the R-conditioning operators can be used to approximate a large class of sublinear expectations to arbitrary precision. We then introduce a novel numerical algorithm for computing the R-conditioning. This algorithm is shown to be strongly convergent.
Implementations are used to compare the risk-averse value of a Vanilla option to its traditional risk-neutral value, within the Black-Scholes-Merton framework. Concrete connections to robust finance, sensitivity analysis, and high-dimensional estimation are all treated in this paper.
Submitted 28 October, 2019; v1 submitted 30 September, 2019;
originally announced September 2019.
-
NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation
Authors:
Anastasis Kratsios,
Cody Hyndman
Abstract:
Effective feature representation is key to the predictive performance of any algorithm. This paper introduces a meta-procedure, called Non-Euclidean Upgrading (NEU), which learns feature maps that are expressive enough to embed the universal approximation property (UAP) into most model classes while only outputting feature maps that preserve any model class's UAP. We show that NEU can learn any feature map with these two properties if that feature map is asymptotically deformable into the identity. We also find that the feature representations learned by NEU are always submanifolds of the feature space. NEU's properties are derived from a new deep neural model that is universal amongst all orientation-preserving homeomorphisms on the input space. We derive qualitative and quantitative approximation guarantees for this architecture. We quantify the number of parameters required for this new architecture to memorize any set of input-output pairs while simultaneously fixing every point of the input space lying outside some compact set, and we quantify the size of this set as a function of our model's depth. Moreover, we show that no deep feed-forward network with a commonly used activation function has all these properties. NEU's performance is evaluated against competing machine learning methods on various regression and dimension reduction tasks with both financial and simulated data.
Submitted 10 May, 2021; v1 submitted 31 August, 2018;
originally announced September 2018.
-
Optimal Stochastic Decensoring and Applications to Calibration of Market Models
Authors:
Anastasis Kratsios
Abstract:
Typically, flat filling or linear or polynomial interpolation methods are used to generate missing historical data. We introduce a novel optimal method for recreating data generated by a diffusion process. The results are then applied to recreate historical data for stocks.
Submitted 18 December, 2017; v1 submitted 13 December, 2017;
originally announced December 2017.
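For context, the two simplest baselines named in the abstract above, flat filling and linear interpolation, can be sketched as follows. This is a minimal illustration of those baselines only, not the paper's optimal decensoring method; the function names, the use of `None` to mark missing values, and the sample `prices` series are assumptions made for this example.

```python
def flat_fill(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled = []
    last_seen = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def linear_fill(series):
    """Interpolate linearly between neighbouring observations.

    Gaps before the first or after the last observation are left as None,
    since there is no second endpoint to interpolate towards.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = series[left] + fraction * (series[right] - series[left])
    return filled


# Hypothetical price history with missing observations marked as None.
prices = [100.0, None, 104.0, None, None, None, 108.0]
flat = flat_fill(prices)
linear = linear_fill(prices)
```

Flat filling produces a step function and linear interpolation a straight line between observations, so neither reproduces the roughness of a diffusion path, which is the gap the abstract's method targets.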
-
Non-Euclidean Conditional Expectation and Filtering
Authors:
Anastasis Kratsios,
Cody B. Hyndman
Abstract:
A non-Euclidean generalization of conditional expectation is introduced and characterized as the minimizer of expected intrinsic squared-distance from a manifold-valued target. The computationally tractable formulation expresses the non-convex optimization problem as transformations of Euclidean conditional expectation. This gives computationally tractable filtering equations for the dynamics of the intrinsic conditional expectation of a manifold-valued signal and is used to obtain accurate numerical forecasts of efficient portfolios by incorporating their geometric structure into the estimates.
Submitted 6 September, 2018; v1 submitted 16 October, 2017;
originally announced October 2017.
-
Deep Learning in a Generalized HJM-type Framework Through Arbitrage-Free Regularization
Authors:
Anastasis Kratsios,
Cody B. Hyndman
Abstract:
We introduce a regularization approach to arbitrage-free factor-model selection. The considered model selection problem seeks to learn the closest arbitrage-free HJM-type model to any prespecified factor-model. An asymptotic solution to this, a priori computationally intractable, problem is represented as the limit of a 1-parameter family of optimizers to computationally tractable model selection tasks. Each of these simplified model-selection tasks seeks to learn the most similar model, to the prescribed factor-model, subject to a penalty detecting when the reference measure is a local martingale-measure for the entire underlying financial market. A simple expression for the penalty terms is obtained in the bond market within the affine term-structure setting, and it is used to formulate a deep-learning approach to arbitrage-free affine term-structure modelling. Numerical implementations are also performed to evaluate the performance in the bond market.
Submitted 5 December, 2019; v1 submitted 13 October, 2017;
originally announced October 2017.
-
The Entropic Measure Transform
Authors:
Renjie Wang,
Cody Hyndman,
Anastasis Kratsios
Abstract:
We introduce the entropic measure transform (EMT) problem for a general process and prove the existence of a unique optimal measure characterizing the solution. The density process of the optimal measure is characterized using a semimartingale BSDE under general conditions. The EMT is used to reinterpret the conditional entropic risk-measure and to obtain a convenient formula for the conditional expectation of a process which admits an affine representation under a related measure. The entropic measure transform is then used to provide a new characterization of defaultable bond prices, forward prices, and futures prices when the asset is driven by a jump diffusion. The characterization of these pricing problems in terms of the EMT provides economic interpretations as a maximization of returns subject to a penalty for removing financial risk as expressed through the aggregate relative entropy. The EMT is shown to extend the optimal stochastic control characterization of default-free bond prices of Gombani and Runggaldier (Math. Financ. 23(4):659-686, 2013). These methods are illustrated numerically with an example in the defaultable bond setting.
Submitted 21 February, 2019; v1 submitted 18 November, 2015;
originally announced November 2015.
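As background for the entropic risk-measure mentioned in the abstract above, the standard static definition and its dual form are recalled below. This is the textbook statement, with notation ($\gamma$, $\rho_\gamma$, $H$) chosen for this note; the sign and scaling conventions, and the conditional and dynamic versions treated in the paper, may differ.

```latex
% Entropic risk measure with risk-aversion parameter \gamma > 0
% for a payoff X under the reference measure P:
\begin{equation*}
  \rho_\gamma(X) \;=\; \frac{1}{\gamma} \log \mathbb{E}_{P}\!\left[ e^{-\gamma X} \right].
\end{equation*}
% Dual (Gibbs variational) representation: a supremum over measures Q
% absolutely continuous with respect to P, penalized by relative entropy
% H(Q \mid P) = \mathbb{E}_{Q}[\log(dQ/dP)]:
\begin{equation*}
  \rho_\gamma(X) \;=\; \sup_{Q \ll P}
  \left\{ \mathbb{E}_{Q}[-X] \;-\; \frac{1}{\gamma}\, H(Q \mid P) \right\}.
\end{equation*}
```

The second line is the sense in which an entropy-penalized optimization over measures appears, which matches the abstract's description of a penalty expressed through relative entropy.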