Search | arXiv e-print repository

Global Convergence of Adjoint-Optimized Neural PDEs

Authors: Konstantin Riedl, Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to available data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural-network PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural-network PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer).
The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses due to (1) the neural network training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit where the kernel lacks a spectral gap for its eigenvalues and (2) the nonlinearity of the limit PDE system, which leads to a non-convex optimization problem, even in the infinite-width hidden layer limit (unlike in typical neural network training cases where the optimization problem becomes convex in the large neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 63 pages, 2 figures

MSC Class: 49M41; 35Q93; 68T07; 90C26; 35K55

arXiv:2502.04280 [pdf, other]

Mean-Field Analysis of Latent Variable Process Models on Dynamically Evolving Graphs with Feedback Effects

Authors: Ankan Ganguly, Konstantinos Spiliopoulos, Daniel Sussman

Abstract: In this paper, we study the asymptotic behavior of a class of dynamic co-evolving latent space networks. The model we study is subject to bi-directional feedback effects, meaning that at any given time, the latent process depends on its own value and the graph structure at the previous time step, and the graph structure at the current time depends on the value of the latent processes at the current time but also on the graph structure at the previous time instance (sometimes called a persistence effect). We construct the mean-field limit of this model, which we use to characterize the limiting behavior of a random sample taken from the latent space network in the limit as the number of nodes in the network diverges. From this limiting model, we can derive the limiting behavior of the empirical measure of the latent process and establish the related graphon limit of the latent particle network process. We also provide a description of the rich conditional probabilistic structure of the limiting model. The inherent dependence structure complicates the mathematical analysis significantly. In the process of proving our main results, we derive a general conditional propagation of chaos result, which is of independent interest. In addition, our novel approach of studying the limiting behavior of random samples proves to be a very useful methodology for fully grasping the asymptotic behavior of co-evolving particle systems. Numerical results are included to illustrate the theoretical findings.

Submitted 26 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

Comments: 66 pages, 4 figures

MSC Class: 60K35; 60J05; 91D30 (Primary) 60B10; 60G57; 62D05 (Secondary)

arXiv:2502.01783 [pdf, ps, other]

Uniform attraction and exit problems for stochastic damped wave equations

Authors: Ioannis Gasteratos, Michael Salins, Konstantinos Spiliopoulos

Abstract: We consider a class of wave equations with constant damping and polynomial nonlinearities that are perturbed by small, multiplicative, space-time white noise. The equations are defined on a one-dimensional bounded interval with Dirichlet boundary conditions, continuous initial position and distributional initial velocity. In the first part of this work, we study the corresponding deterministic dynamics and prove that certain neighborhoods of asymptotically stable equilibria are uniformly attracting in the topology of uniform convergence. Then, we consider exit problems for local solutions of the stochastic damped wave equations from bounded domains $D$ of uniform attraction. Using tools from large deviations along with novel controllability results, we obtain logarithmic asymptotics for exit times and exit places, in the vanishing noise limit, that are expressed in terms of the corresponding quasipotential. In doing so, we develop arguments that take into account the lack of both smoothing and exact controllability that are inherent to the problem at hand. Moreover, our exit time results provide asymptotic lower bounds for the mean explosion time of local solutions. We introduce a novel notion of "regular" boundary points, which allows us to avoid the question of boundary smoothness in infinite dimensions and leads to the proof of a large deviations lower bound for the exit place. We illustrate this notion by providing explicit examples for different classes of domains $D$.
Conditions under which lower and upper bounds for the exit time and exit place logarithmic asymptotics hold are also presented. In addition, we obtain deterministic stability results for linear damped wave equations that are of independent interest.

Submitted 3 February, 2025; originally announced February 2025.

MSC Class: 60F10; 60H15; 35R60; 60G40; 37L15

arXiv:2501.08040 [pdf, other]

Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks

Authors: Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Recurrent neural networks (RNNs) are commonly trained with the truncated backpropagation-through-time (TBPTT) algorithm. For the purposes of computational tractability, the TBPTT algorithm truncates the chain rule and calculates the gradient on a finite block of the overall data sequence. Such an approximation could lead to significant inaccuracies, as the block length for the truncated backpropagation is typically limited to be much smaller than the overall sequence length. In contrast, real-time recurrent learning (RTRL) is an online optimization algorithm which asymptotically follows the true gradient of the loss on the data sequence as the number of sequence time steps $t \rightarrow \infty$. RTRL forward propagates the derivatives of the RNN hidden/memory units with respect to the parameters and, using the forward derivatives, performs online updates of the parameters at each time step in the data sequence. RTRL's online forward propagation allows for exact optimization over extremely long data sequences, although it can be computationally costly for models with large numbers of parameters. We prove convergence of the RTRL algorithm for a class of RNNs. The convergence analysis establishes a fixed point for the joint distribution of the data sequence, RNN hidden layer, and the RNN hidden layer forward derivatives as the number of data samples from the sequence and the number of training steps tend to infinity. We prove convergence of the RTRL algorithm to a stationary point of the loss. Numerical studies illustrate our theoretical results.
One potential application area for RTRL is the analysis of financial data, which typically involve long time series and models with small to medium numbers of parameters. This makes RTRL computationally tractable and a potentially appealing optimization method for training models. Thus, we include an example of RTRL applied to limit order book data.

Submitted 14 January, 2025; originally announced January 2025.

MSC Class: 68T07 (Primary); 68T05; 60J20 (Secondary)
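
The forward-derivative recursion that the RTRL abstract above describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the scalar RNN $h_{t+1} = \tanh(w h_t + x_t)$, the squared loss, and the names `rnn_step`, `rtrl_sensitivity`, and `rtrl_train` are all hypothetical.

```python
import math


def rnn_step(w, h, x):
    """One step of a toy scalar RNN (assumed model): h_next = tanh(w * h + x)."""
    return math.tanh(w * h + x)


def rtrl_sensitivity(w, xs, h0=0.0):
    """Propagate the hidden state h_t and its derivative s_t = dh_t/dw forward in time."""
    h, s = h0, 0.0
    for x in xs:
        h_next = rnn_step(w, h, x)
        # Chain rule: dh_next/dw = (1 - h_next^2) * (h + w * dh/dw)
        s = (1.0 - h_next ** 2) * (h + w * s)
        h = h_next
    return h, s


def rtrl_train(w, xs, ys, lr=0.05, h0=0.0):
    """Online updates: after every time step, move w along the forward derivative.

    Because w changes during the pass, s is only an approximation of the
    true derivative here; the loss 0.5 * (h - y)^2 is an assumption.
    """
    h, s = h0, 0.0
    for x, y in zip(xs, ys):
        h_next = rnn_step(w, h, x)
        s = (1.0 - h_next ** 2) * (h + w * s)
        h = h_next
        w -= lr * (h - y) * s
    return w
```

With `w` held fixed, `rtrl_sensitivity` returns the exact derivative of the final hidden state with respect to `w`, which can be checked against a finite difference. No backward pass over the sequence is needed, which is the property the abstract contrasts with TBPTT.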

arXiv:2404.18242 [pdf, other]

Uniform-in-time bounds for a stochastic hybrid system with fast periodic sampling and small white-noise

Authors: Shivam Singh Dhama, Konstantinos Spiliopoulos

Abstract: We study the asymptotic behavior, uniform-in-time, of a non-linear dynamical system under the combined effects of fast periodic sampling with period $\delta$ and small white noise of size $\varepsilon,\thinspace 0<\varepsilon,\delta\ll 1$. The dynamics depend on both the current and recent measurements of the state, and as such the system is not Markovian. Our main results can be interpreted as Law of Large Numbers (LLN) and Central Limit Theorem (CLT) type results. The LLN-type result shows that the resulting stochastic process is close to an ordinary differential equation (ODE) uniformly in time as $\varepsilon,\delta\searrow 0$. Further, regarding the CLT, we provide quantitative and uniform-in-time control of the fluctuations process. The interaction of the small parameters provides an additional drift term in the limiting fluctuations, which captures both the sampling and noise effects. As a consequence, we obtain a first-order perturbation expansion of the stochastic process along with time-independent estimates on the remainder. The zeroth- and first-order terms in the expansion are given by an ODE and SDE, respectively. Simulation studies that illustrate and supplement the theoretical results are also provided.

Submitted 14 February, 2025; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: To appear in "Stochastics and Dynamics."

MSC Class: 60F17

arXiv:2309.03431 [pdf, other]

doi 10.1088/1361-6544/ad8fea

Mean field limits of particle-based stochastic reaction-drift-diffusion models

Authors: Max Heldman, Samuel A. Isaacson, Qianhan Liu, Konstantinos Spiliopoulos

Abstract: We consider particle-based stochastic reaction-drift-diffusion (PBSRDD) models where particles move via diffusion and drift induced by one- and two-body potential interactions. The dynamics of the particles are formulated as measure-valued stochastic processes (MVSPs), which describe the evolution of the singular, stochastic concentration fields of each chemical species. The mean field large population limit of such models is derived and proven, giving coarse-grained deterministic partial integro-differential equations (PIDEs) for the limiting deterministic concentration fields' dynamics. We generalize previous studies on the mean field limit of models involving only diffusive motion, with care to formulating the MVSP representation to ensure detailed balance of reversible reactions in the presence of potentials. Our work illustrates the more general set of PIDEs that arise in the mean field limit, demonstrating that the limiting macroscopic reactive interaction terms for reversible reactions obtain additional nonlinear concentration-dependent coefficients compared to the purely diffusive case. Numerical studies are presented which illustrate that two-body repulsive potential interactions can have a significant impact on the reaction dynamics, and also demonstrate the empirical numerical convergence of solutions to the PBSRDD model to the derived mean field PIDEs as the population size increases.

Submitted 5 January, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: added numerical results

arXiv:2308.14555 [pdf, other]

Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences

Authors: Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation.
These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.

Submitted 15 May, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Major revision for lemma 7.1

MSC Class: 68T07 (Primary); 68T05; 60J20 (Secondary)
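
The scaling argument in the abstract above, with updates of size $\mathcal{O}(\frac{1}{N})$ repeated $\mathcal{O}(N)$ times, can be written out in a generic form. The symbols $\theta_k^N$, $F$, and $T$ below are hypothetical placeholders chosen for illustration, not the paper's notation, and the error bound is the standard Euler estimate under the assumption that $F$ is Lipschitz.

```latex
% Discrete updates of size 1/N, repeated about N T times:
\theta_{k+1}^N = \theta_k^N + \frac{1}{N}\, F\big(\theta_k^N\big),
\qquad k = 0, 1, \dots, \lfloor N T \rfloor - 1 .

% This is the explicit Euler scheme with step size 1/N for the ODE
\frac{d\theta}{dt}(t) = F\big(\theta(t)\big), \qquad \theta(0) = \theta_0^N ,

% so that, for Lipschitz F and a constant C_T depending on T,
\sup_{0 \le t \le T} \big| \theta_{\lfloor N t \rfloor}^N - \theta(t) \big| \le \frac{C_T}{N} .
```

When a single update changes the state by $\mathcal{O}(1)$, as the abstract states for the RNN hidden layer, no step size tends to zero and this Euler interpretation is unavailable, which is the reason the abstract gives for turning to a fixed point analysis.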

arXiv:2302.07227 [pdf, other]

Transport map unadjusted Langevin algorithms: learning and discretizing perturbed samplers

Authors: Benjamin J. Zhang, Youssef M. Marzouk, Konstantinos Spiliopoulos

Abstract: Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. We also show that applying a transport map to an irreversibly-perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.

Submitted 22 October, 2024; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 29 pages, 12 figures

MSC Class: 62D99; 60H35
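
As a point of reference for the abstract above, the plain unadjusted Langevin algorithm (without transport maps or irreversible perturbations) can be sketched in a few lines. This is a minimal one-dimensional sketch under stated assumptions: the standard normal target, the step size, and the names `ula_chain`, `standard_normal_score`, and `run_demo` are all hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def ula_chain(grad_log_density, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of dX = grad log pi(X) dt + sqrt(2) dW."""
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # No Metropolis accept/reject step: this is what "unadjusted" means.
        x = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


def standard_normal_score(x):
    """Gradient of the log-density of N(0, 1), the assumed toy target."""
    return -x


def run_demo(step=0.1, n_steps=200_000, seed=0):
    """Return the ergodic mean and variance of a single ULA chain."""
    rng = random.Random(seed)
    samples = ula_chain(standard_normal_score, 0.0, step, n_steps, rng)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, var
```

For this Gaussian target the chain is the linear recursion $x_{k+1} = (1-h)x_k + \sqrt{2h}\,\xi_k$, whose stationary variance is $1/(1 - h/2)$ rather than 1. The gap is the discretization bias of the unadjusted scheme, and it vanishes as the step size $h$ tends to zero.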

arXiv:2301.09005 [pdf, ps, other]

Quantitative fluctuation analysis of multiscale diffusion systems via Malliavin calculus

Authors: Solesne Bourguin, Konstantinos Spiliopoulos

Abstract: We study fluctuations of small noise multiscale diffusions around their homogenized deterministic limit. We derive quantitative rates of convergence of the fluctuation processes to their Gaussian limits in the appropriate Wasserstein metric, which requires detailed estimates of the first and second order Malliavin derivatives of the slow component. We study a fully coupled system, and the derivation of the quantitative rates of convergence depends on a very careful decomposition of the first and second Malliavin derivatives of the slow and fast components into terms that have different rates of convergence depending on the strength of the noise and the timescale separation parameter.

Submitted 4 November, 2024; v1 submitted 21 January, 2023; originally announced January 2023.

arXiv:2209.01018 [pdf, other]

Normalization effects on deep neural networks

Authors: Jiahui Yu, Konstantinos Spiliopoulos

Abstract: We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive to the scaling of the outer layer than to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.

Submitted 2 September, 2022; originally announced September 2022.

Comments: arXiv admin note: text overlap with arXiv:2011.10487

MSC Class: 60F05; 68T01; 60G99
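
The role of the exponent in the $1/N^{\gamma}$ normalization from the abstract above can be illustrated at initialization with a single hidden layer. This is a deliberately simplified sketch, not the paper's deep-network setting or its trained-network analysis: the i.i.d. standard Gaussian weights, the `tanh` activation, and the names `network_output` and `output_variance` are assumptions made for illustration.

```python
import math
import random


def network_output(x, n_hidden, gamma, rng):
    """Output N^(-gamma) * sum_i c_i * tanh(w_i * x) with i.i.d. N(0, 1) weights (assumed)."""
    total = 0.0
    for _ in range(n_hidden):
        c = rng.gauss(0.0, 1.0)  # outer-layer weight
        w = rng.gauss(0.0, 1.0)  # inner-layer weight
        total += c * math.tanh(w * x)
    return total / n_hidden ** gamma


def output_variance(x, n_hidden, gamma, n_trials=400, seed=0):
    """Empirical variance of the output over independent random initializations."""
    rng = random.Random(seed)
    outs = [network_output(x, n_hidden, gamma, rng) for _ in range(n_trials)]
    mean = sum(outs) / n_trials
    return sum((o - mean) ** 2 for o in outs) / n_trials
```

Under these assumptions the output variance at initialization scales like $N^{1-2\gamma}$: it stays of order one for $\gamma = 1/2$ and decays like $1/N$ for the mean-field choice $\gamma = 1$. Comparing `output_variance(1.0, 200, 0.5)` with `output_variance(1.0, 200, 1.0)` shows the difference numerically.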

arXiv:2206.10819 [pdf, other]

Fluctuation analysis for particle-based stochastic reaction-diffusion models

Authors: Max Heldman, Samuel Isaacson, Jingwei Ma, Konstantinos Spiliopoulos

Abstract: Recent works have derived and proven the large-population mean-field limit for several classes of particle-based stochastic reaction-diffusion (PBSRD) models. These limits correspond to systems of partial integro-differential equations (PIDEs) that generalize standard mass-action reaction-diffusion PDE models. In this work we derive and prove the next order fluctuation corrections to such limits, which we show satisfy systems of stochastic PIDEs with Gaussian noise. Numerical examples are presented to illustrate how including the fluctuation corrections can enable the accurate estimation of higher order statistics of the underlying PBSRD model.

Submitted 12 October, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.06794 [pdf, ps, other]

Moderate deviation principle for multiscale systems driven by fractional Brownian motion

Authors: Solesne Bourguin, Thanh Dang, Konstantinos Spiliopoulos

Abstract: In this paper we study the moderate deviations principle (MDP) for slow-fast stochastic dynamical systems where the slow motion is governed by small fractional Brownian motion (fBm) with Hurst parameter $H\in(1/2,1)$. We derive conditions on the moderate deviations scaling and on the Hurst parameter $H$ under which the MDP holds. In addition, we show that in typical situations the resulting action functional is discontinuous in $H$ at $H=1/2$, suggesting that the tail behavior of stochastic dynamical systems perturbed by fBm can have different characteristics than the tail behavior of such systems that are perturbed by standard Brownian motion.

Submitted 7 April, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Journal ref: Journal of Theoretical Probability, 36, Article number: 1 (2023)

arXiv:2206.00646 [pdf, other]

Importance sampling for stochastic reaction-diffusion equations in the moderate deviation regime

Authors: Ioannis Gasteratos, Michael Salins, Konstantinos Spiliopoulos

Abstract: We develop a provably efficient importance sampling scheme that estimates exit probabilities of solutions to small-noise stochastic reaction-diffusion equations from scaled neighborhoods of a stable equilibrium. The moderate deviation scaling allows for a local approximation of the nonlinear dynamics by their linearized version. In addition, we identify a finite-dimensional subspace where exits take place with high probability. Using stochastic control and variational methods we show that our scheme performs well both in the zero noise limit and pre-asymptotically. Simulation studies for stochastically perturbed bistable dynamics illustrate the theoretical results.

Submitted 22 October, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Version to appear in Stochastics and Partial Differential Equations: Analysis and Computations. 46 pages

MSC Class: 65C05; 60G99; 60F9

arXiv:2204.05270 [pdf, ps, other]

On the large-time behaviour of affine Volterra processes

Authors: Antoine Jacquier, Alexandre Pannier, Konstantinos Spiliopoulos

Abstract: We show the existence of a stationary measure for a class of multidimensional stochastic Volterra systems of affine type. These processes are in general not Markovian, a shortcoming which hinders their large-time analysis. We circumvent this issue by lifting the system to a measure-valued stochastic PDE introduced by Cuchiero and Teichmann, whence we retrieve the Markov property. Leveraging the associated generalised Feller property, we extend the Krylov-Bogoliubov theorem to this infinite-dimensional setting and thus establish an approach to the existence of invariant measures. We present concrete examples, including the rough Heston model from Mathematical Finance.

Submitted 2 July, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: 24 pages, changed title to better reflect the results of the paper

MSC Class: 60F05 (Primary) 60G22; 60H15; 60J25 (Secondary)

arXiv:2202.08403 [pdf, ps, other]

Moderate deviations for fully coupled multiscale weakly interacting particle systems

Authors: Zachary Bezemek, Konstantinos Spiliopoulos

Abstract: We consider a collection of fully coupled weakly interacting diffusion processes moving in a two-scale environment. We study the moderate deviations principle of the empirical distribution of the particles' positions in the combined limit as the number of particles grows to infinity and the time-scale separation parameter goes to zero simultaneously. We make use of weak convergence methods, which provide a convenient representation for the moderate deviations rate function in terms of an effective mean field control problem. We rigorously obtain equivalent representations for the moderate deviations rate function in an appropriate "negative Sobolev" form, which is reminiscent of the large deviations rate function for the empirical measure of weakly interacting diffusions obtained in the 1987 seminal paper by Dawson-Gärtner. In the course of the proof we obtain related ergodic theorems and we rigorously study the regularity of Poisson-type equations associated to McKean-Vlasov problems, both of which are topics of independent interest.

Submitted 13 July, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Extended version. Journal version to appear in Stochastics and Partial Differential Equations: Analysis and Computations

arXiv:2202.07753 [pdf, other]

Rate of homogenization for fully-coupled McKean-Vlasov SDEs

Authors: Zachary Bezemek, Konstantinos Spiliopoulos

Abstract: We consider a fully-coupled slow-fast system of McKean-Vlasov SDEs with full dependence on the slow and fast component and on the law of the slow component and derive convergence rates to its homogenized limit. We do not make periodicity assumptions, but we impose conditions on the fast motion to guarantee ergodicity. In the course of the proof we obtain related ergodic theorems and we obtain results on the regularity of Poisson-type equations and of the associated Cauchy problem on the Wasserstein space that are of independent interest.

Submitted 23 August, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: 42 pages, 1 figure. To appear in Stochastics and Dynamics

arXiv:2108.08247 [pdf, other]

Geometry-informed irreversible perturbations for accelerated convergence of Langevin dynamics

Authors: Benjamin J. Zhang, Youssef M. Marzouk, Konstantinos Spiliopoulos

Abstract: We introduce a novel geometry-informed irreversible perturbation that accelerates convergence of the Langevin algorithm for Bayesian computation. It is well documented that there exist perturbations to the Langevin dynamics that preserve its invariant measure while accelerating its convergence. Irreversible perturbations and reversible perturbations (such as Riemannian manifold Langevin dynamics (RMLD)) have separately been shown to improve the performance of Langevin samplers. We consider these two perturbations simultaneously by presenting a novel form of irreversible perturbation for RMLD that is informed by the underlying geometry. Through numerical examples, we show that this new irreversible perturbation can improve estimation performance over irreversible perturbations that do not take the geometry into account. Moreover, we demonstrate that irreversible perturbations generally can be implemented in conjunction with the stochastic gradient version of the Langevin algorithm. Lastly, while continuous-time irreversible perturbations cannot impair the performance of a Langevin estimator, the situation can sometimes be more complicated when discretization is considered. To this end, we describe a discrete-time example in which irreversibility increases both the bias and variance of the resulting estimator.

Submitted 1 September, 2022; v1 submitted 18 August, 2021; originally announced August 2021.
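
Illustrative note on the entry above: the idea of an irreversible perturbation that preserves the invariant measure can be sketched on a toy problem. The snippet below is a minimal sketch, not the paper's geometry-informed construction. It assumes a hypothetical two-dimensional Gaussian target with potential U(x, y) = 0.5 (x^2 + 4 y^2), a constant skew-symmetric matrix J = [[0, 1], [-1, 0]], a strength parameter `delta`, and a plain Euler-Maruyama discretization; all names are illustrative. In continuous time the added drift delta * J * grad U leaves the density proportional to exp(-U) invariant because J is skew-symmetric; the discretized chain only approximates that target.

```python
import math
import random


def grad_U(x, y):
    # Toy potential U(x, y) = 0.5 * (x**2 + 4 * y**2) (hypothetical choice);
    # the target density is proportional to exp(-U).
    return x, 4.0 * y


def irreversible_drift(x, y, delta):
    # delta * J * grad U with the skew-symmetric matrix J = [[0, 1], [-1, 0]].
    # This vector is always orthogonal to grad U.
    gx, gy = grad_U(x, y)
    return delta * gy, -delta * gx


def euler_step(x, y, step, delta, rng):
    # One Euler-Maruyama step of dX = (-grad U + delta * J grad U) dt + sqrt(2) dW.
    gx, gy = grad_U(x, y)
    ix, iy = irreversible_drift(x, y, delta)
    noise = math.sqrt(2.0 * step)
    x_new = x + step * (-gx + ix) + noise * rng.gauss(0.0, 1.0)
    y_new = y + step * (-gy + iy) + noise * rng.gauss(0.0, 1.0)
    return x_new, y_new


def second_moments(n_steps, step, delta, seed=0):
    # Time averages of x**2 and y**2 along a single trajectory started at the origin.
    rng = random.Random(seed)
    x = y = 0.0
    sum_x2 = sum_y2 = 0.0
    for _ in range(n_steps):
        x, y = euler_step(x, y, step, delta, rng)
        sum_x2 += x * x
        sum_y2 += y * y
    return sum_x2 / n_steps, sum_y2 / n_steps


# Under the toy target, E[x^2] = 1 and E[y^2] = 0.25; the estimates carry
# both Monte Carlo error and discretization bias.
m_x2, m_y2 = second_moments(200_000, 0.01, delta=1.0)
```

Setting `delta=0.0` recovers the standard (reversible) discretized Langevin sampler, which makes it easy to compare the two on the same toy target.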

arXiv:2101.09621 [pdf, other]

Online Adjoint Methods for Optimization of PDEs

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: We present and mathematically analyze an online adjoint algorithm for the optimization of partial differential equations (PDEs). Traditional adjoint algorithms would typically solve a new adjoint PDE at each optimization iteration, which can be computationally costly. In contrast, an online adjoint algorithm updates the design variables in continuous time and thus constantly makes progress towards minimizing the objective function. The online adjoint algorithm we consider is similar in spirit to the pseudo-time-stepping, one-shot method which has been previously proposed. Motivated by the application of such methods to engineering problems, we mathematically study the convergence of the online adjoint algorithm. The online adjoint algorithm relies upon a time-relaxed adjoint PDE which provides an estimate of the direction of steepest descent. The algorithm updates this estimate continuously in time, and it asymptotically converges to the exact direction of steepest descent as $t \rightarrow \infty$. We rigorously prove that the online adjoint algorithm converges to a critical point of the objective function for optimizing the PDE. Under appropriate technical conditions, we also prove a convergence rate for the algorithm. A crucial step in the convergence proof is a multi-scale analysis of the coupled system for the forward PDE, adjoint PDE, and the gradient descent ODE for the design variables.

Submitted 26 January, 2022; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2101.00085 [pdf, ps, other]

Moderate deviations for systems of slow-fast stochastic reaction-diffusion equations

Authors: Ioannis Gasteratos, Michael Salins, Konstantinos Spiliopoulos

Abstract: The goal of this paper is to study the Moderate Deviation Principle (MDP) for a system of stochastic reaction-diffusion equations with a time-scale separation in slow and fast components and small noise in the slow component. Based on weak convergence methods in infinite dimensions and related stochastic control arguments, we obtain an exact form for the moderate deviations rate function in different regimes as the small noise and time-scale separation parameters vanish. Many issues that appear due to the infinite dimensionality of the problem are completely absent in their finite-dimensional counterpart. In comparison to corresponding Large Deviation Principles, the moderate deviation scaling necessitates a more delicate approach to establishing tightness and properly identifying the limiting behavior of the underlying controlled problem. The latter involves regularity properties of a solution of an associated elliptic Kolmogorov equation on Hilbert space along with a finite-dimensional approximation argument.

Submitted 2 February, 2022; v1 submitted 31 December, 2020; originally announced January 2021.

arXiv:2011.10487 [pdf, other]

Normalization effects on shallow neural networks and related asymptotic expansions

Authors: Jiahui Yu, Konstantinos Spiliopoulos

Abstract: We consider shallow (single hidden layer) neural networks and characterize their performance when trained with stochastic gradient descent as the number of hidden units $N$ and gradient descent steps grow to infinity. In particular, we investigate the effect of different scaling schemes, which lead to different normalizations of the neural network, on the network's statistical output, closing the gap between the $1/\sqrt{N}$ and the mean-field $1/N$ normalization. We develop an asymptotic expansion for the neural network's statistical output pointwise with respect to the scaling parameter as the number of hidden units grows to infinity. Based on this expansion, we demonstrate mathematically that to leading order in $N$, there is no bias-variance trade-off, in that both bias and variance (both explicitly characterized) decrease as the number of hidden units increases and time grows. In addition, we show that to leading order in $N$, the variance of the neural network's statistical output decays as the implied normalization by the scaling parameter approaches the mean field normalization. Numerical studies on the MNIST and CIFAR10 datasets show that test and train accuracy monotonically improve as the neural network's normalization gets closer to the mean field normalization.

Submitted 1 June, 2022; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: Added link to code on GitHub: https://github.com/kspiliopoulos/NormalizationEffectsNeuralNetworks

MSC Class: 60F05; 68T01; 60G99

Journal ref: AIMS Journal on Foundations of Data Science, June 2021, Vol. 3, Issue 2, pp. 151-200
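
Illustrative note on the entry above: one way to picture a family of normalizations between $1/\sqrt{N}$ and $1/N$ is to scale the hidden-layer sum by $N^{-\gamma}$ with $\gamma \in [1/2, 1]$. The snippet below is a minimal sketch under that assumption; the parameter name `gamma`, the tanh activation, and the initialization distributions are hypothetical choices made for illustration and are not taken from the listing. It only looks at the output variance at random initialization, which scales like $N^{1-2\gamma}$, and does not reproduce the paper's asymptotic expansion for trained networks.

```python
import math
import random


def shallow_net(x, c, w, gamma):
    # Output N^(-gamma) * sum_i c_i * tanh(w_i * x) of a one-hidden-layer network.
    n = len(c)
    total = sum(ci * math.tanh(wi * x) for ci, wi in zip(c, w))
    return total / n ** gamma


def output_variance_at_init(n, gamma, x=1.0, trials=500, seed=0):
    # Sample variance of the network output over random initializations
    # (c_i uniform on [-1, 1], w_i standard normal; both are assumptions).
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        outputs.append(shallow_net(x, c, w, gamma))
    mean = sum(outputs) / trials
    return sum((o - mean) ** 2 for o in outputs) / trials


# Same seed, so both runs see identical parameters and differ only in scaling.
var_sqrt_n = output_variance_at_init(50, gamma=0.5)
var_mean_field = output_variance_at_init(50, gamma=1.0)
```

Because both calls use the same seed, the two variances differ exactly by the factor $N^{1-2\gamma}$ evaluated at the two values of `gamma`, which isolates the effect of the normalization from sampling noise.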

arXiv:2011.03032 [pdf, ps, other]

Large deviations for interacting multiscale particle systems

Authors: Zachary Bezemek, Konstantinos Spiliopoulos

Abstract: We consider a collection of weakly interacting diffusion processes moving in a two-scale locally periodic environment. We study the large deviations principle of the empirical distribution of the particles' positions in the combined limit as the number of particles grows to infinity and the time-scale separation parameter goes to zero simultaneously. We make use of weak convergence methods providing a convenient representation for the large deviations rate function, which allow us to characterize the effective controlled mean field dynamics. In addition, we obtain equivalent representations for the large deviations rate function of the form of Dawson-Gärtner which hold even in the case where the diffusion matrix depends on the empirical measure and when the particles undergo averaging in addition to the propagation of chaos.

Submitted 1 November, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: Version to appear in Stochastic Processes and their Applications. 64 pages

MSC Class: 60F10; 60F05

arXiv:2009.01392 [pdf, other]

How reaction-diffusion PDEs approximate the large-population limit of stochastic particle models

Authors: Samuel A Isaacson, Jingwei Ma, Konstantinos Spiliopoulos

Abstract: Reaction-diffusion PDEs and particle-based stochastic reaction-diffusion (PBSRD) models are commonly used approaches for modeling the spatial dynamics of chemical and biological systems. Standard reaction-diffusion PDE models ignore the underlying stochasticity of spatial transport and reactions, and are often described as appropriate in regimes where there are large numbers of particles in a system. Recent studies have proven the rigorous large-population limit of PBSRD models, showing the resulting mean-field models (MFM) correspond to non-local systems of partial integro-differential equations. In this work we explore the rigorous relationship between standard reaction-diffusion PDE models and the derived MFM. We prove that the former can be interpreted as an asymptotic approximation to the latter in the limit that bimolecular reaction kernels are short-range and averaging. As the reactive interaction length scale approaches zero, we prove the MFMs converge at second order to standard reaction-diffusion PDE models. In proving this result we also establish local well-posedness of the MFM model in time for general systems, and global well-posedness for specific reaction systems and kernels. Finally, we illustrate the agreement and disagreement between the MFM, SM and the underlying particle model for several numerical examples.

Submitted 31 May, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

arXiv:2007.11665 [pdf, ps, other]

Discrete-time inference for slow-fast systems driven by fractional Brownian motion

Authors: Solesne Bourguin, Siragan Gailus, Konstantinos Spiliopoulos

Abstract: We study statistical inference for small-noise-perturbed multiscale dynamical systems where the slow motion is driven by fractional Brownian motion. We develop statistical estimators for both the Hurst index as well as a vector of unknown parameters in the model based on a single time series of observations from the slow process only. We prove that these estimators are both consistent and asymptotically normal as the amplitude of the perturbation and the time-scale separation parameter go to zero. Numerical simulations illustrate the theoretical results.

Submitted 25 March, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: arXiv admin note: text overlap with arXiv:1906.02131

MSC Class: 60G22; 60H10; 60H07; 62F12

arXiv:2003.11868 [pdf, ps, other]

Mean Field Limits of Particle-Based Stochastic Reaction-Diffusion Models

Authors: Samuel A. Isaacson, Jingwei Ma, Konstantinos Spiliopoulos

Abstract: Particle-based stochastic reaction-diffusion (PBSRD) models are a popular approach for studying biological systems involving both noise in the reaction process and diffusive transport. In this work we derive coarse-grained deterministic partial integro-differential equation (PIDE) models that provide a mean field approximation to the volume reactivity PBSRD model, a model commonly used for studying cellular processes. We formulate a weak measure-valued stochastic process (MVSP) representation for the volume reactivity PBSRD model, demonstrating for a simplified but representative system that it is consistent with the commonly used Doi Fock Space representation of the corresponding forward equation. We then prove the convergence of the general volume reactivity model MVSP to the mean field PIDEs in the large-population (i.e. thermodynamic) limit.

Submitted 31 August, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

arXiv:1911.07304 [pdf, ps, other]

Asymptotics of Reinforcement Learning with Neural Networks

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution which is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on i.i.d. data with stochastic gradient descent under the widely-used Xavier initialization.

Submitted 2 April, 2021; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1907.04108

arXiv:1907.04108 [pdf, ps, other]

Scaling Limit of Neural Networks with the Xavier Initialization and Convergence to a Global Minimum

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: We analyze single-layer neural networks with the Xavier initialization in the asymptotic regime of large numbers of hidden units and large numbers of stochastic gradient descent training steps. The evolution of the neural network during training can be viewed as a stochastic system and, using techniques from stochastic analysis, we prove the neural network converges in distribution to a random ODE with a Gaussian distribution. The limit is completely different than in the typical mean-field results for neural networks due to the $\frac{1}{\sqrt{N}}$ normalization factor in the Xavier initialization (versus the $\frac{1}{N}$ factor in the typical mean-field framework). Although the pre-limit problem of optimizing a neural network is non-convex (and therefore the neural network may converge to a local minimum), the limit equation minimizes a (quadratic) convex objective function and therefore converges to a global minimum. Furthermore, under reasonable assumptions, the matrix in the limiting quadratic objective function is positive definite and thus the neural network (in the limit) will converge to a global minimum with zero loss on the training set.

Submitted 12 April, 2022; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: The results of this technical note have been extended and generalized in arXiv:1911.07304. In the present note the full details for the proof of the special case studied here are presented

arXiv:1906.02131 [pdf, ps, other]

Typical dynamics and fluctuation analysis of slow-fast systems driven by fractional Brownian motion

Authors: Solesne Bourguin, Siragan Gailus, Konstantinos Spiliopoulos

Abstract: This article studies typical dynamics and fluctuations for a slow-fast dynamical system perturbed by a small fractional Brownian noise. Based on an ergodic theorem with explicit rates of convergence, which may be of independent interest, we characterize the asymptotic dynamics of the slow component to two orders (i.e., the typical dynamics and the fluctuations). The limiting distribution of the fluctuations turns out to depend upon the manner in which the small-noise parameter is taken to zero relative to the scale-separation parameter. We study also an extension of the original model in which the relationship between the two small parameters leads to a qualitative difference in limiting behavior. The results of this paper provide an approximation, to two orders, to dynamical systems perturbed by small fractional Brownian noise and subject to multiscale effects.

Submitted 18 August, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

MSC Class: 60G22; 60H10; 60H07; 60H05

arXiv:1905.12589 [pdf, other]

doi 10.1007/s00332-020-09621-0

Selection of quasi-stationary states in the stochastically forced Navier-Stokes equation on the torus

Authors: Margaret Beck, Eric Cooper, Gabriel Lord, Konstantinos Spiliopoulos

Abstract: The stochastically forced vorticity equation associated with the two-dimensional incompressible Navier-Stokes equation on $D_δ:=[0,2πδ]\times [0,2π]$ is considered for $δ\approx 1$, periodic boundary conditions, and viscosity $0<ν\ll 1$. An explicit family of quasi-stationary states of the deterministic vorticity equation is known to play an important role in the long-time evolution of solutions both in the presence of and without noise. Recent results show the parameter $δ$ plays a central role in selecting which of the quasi-stationary states is most important. In this paper, we aim to develop a finite dimensional model that captures this selection mechanism for the stochastic vorticity equation. This is done by projecting the vorticity equation in Fourier space onto a center manifold corresponding to the lowest eight Fourier modes. Through Monte Carlo simulation, the vorticity equation and the model are shown to be in agreement regarding key aspects of the long-time dynamics. Following this comparison, perturbation analysis is performed on the model via averaging and homogenization techniques to determine the leading order dynamics for statistics of interest for $δ\approx1$.

Submitted 10 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: 23 pages, 27 figures

arXiv:1903.06038 [pdf, ps, other]

Metastability and exit problems for systems of stochastic reaction-diffusion equations

Authors: Michael Salins, Konstantinos Spiliopoulos

Abstract: In this paper we develop a metastability theory for a class of stochastic reaction-diffusion equations exposed to small multiplicative noise. We consider the case where the unperturbed reaction-diffusion equation features multiple asymptotically stable equilibria. When the system is exposed to small stochastic perturbations, it is likely to stay near one equilibrium for a long period of time, but will eventually transition to the neighborhood of another equilibrium. We are interested in studying the exit time from the full domain of attraction (in a function space) surrounding an equilibrium and therefore do not assume that the domain of attraction features uniform attraction to the equilibrium. This means that the boundary of the domain of attraction is allowed to contain saddles and limit cycles. Our method of proof is purely infinite dimensional, i.e., we do not go through finite dimensional approximations. In addition, we address the multiplicative noise case and we do not impose gradient type of assumptions on the nonlinearity. We prove large deviations logarithmic asymptotics for the exit time and for the exit shape, also characterizing the most probable set of shapes of solutions at the time of exit from the domain of attraction.

Submitted 15 December, 2020; v1 submitted 14 March, 2019; originally announced March 2019.

MSC Class: 60F10; 60H15; 35R60; 60G40

arXiv:1903.04440 [pdf, other]

Mean Field Analysis of Deep Neural Networks

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the multi-layer neural network output. The limit procedure is valid for any number of hidden layers and it naturally also describes the limiting behavior of the training loss. The ideas that we explore are to (a) take the limits of each hidden layer sequentially and (b) characterize the evolution of parameters in terms of their initialization. The limit satisfies a system of deterministic integro-differential equations. The proof uses methods from weak convergence and stochastic analysis. We show that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum (with zero loss for the objective function).

Submitted 2 April, 2021; v1 submitted 11 March, 2019; originally announced March 2019.

arXiv:1812.07645 [pdf, other]

Network effects in default clustering for large systems

Authors: Konstantinos Spiliopoulos, Jia Yang

Abstract: We consider a large collection of dynamically interacting components defined on a weighted directed graph determining the impact of default of one component to another one. We prove a law of large numbers for the empirical measure capturing the evolution of the different components in the pool and from this we extract important information for quantities such as the loss rate in the overall pool as well as the mean impact on a given component from system-wide defaults. A singular value decomposition of the adjacency matrix of the graph allows us to coarse-grain the system by focusing on the highest eigenvalues which also correspond to the components with the highest contagion impact on the pool. Numerical simulations demonstrate the theoretical findings.

Submitted 3 February, 2020; v1 submitted 18 December, 2018; originally announced December 2018.

arXiv:1812.02127 [pdf, other]

Information geometry for approximate Bayesian computation

Authors: Konstantinos Spiliopoulos

Abstract: The goal of this paper is to explore the basic Approximate Bayesian Computation (ABC) algorithm via the lens of information theory. ABC is a widely used algorithm in cases where the likelihood of the data is hard to work with or intractable, but one can simulate from it. We use relative entropy ideas to analyze the behavior of the algorithm as a function of the threshold parameter and of the size of the data. Relative entropy here is data driven as it depends on the values of the observed statistics. Relative entropy also allows us to explore the effect of the distance metric and sets up a mathematical framework for sensitivity analysis allowing one to find important directions which could lead to lower computational cost of the algorithm for the same level of accuracy. In addition, we also investigate the bias of the estimators for generic observables as a function of both the threshold parameters and the size of the data. Our analysis provides error bounds on performance for positive tolerances and finite sample sizes. Simulation studies complement and illustrate the theoretical results.

Submitted 12 August, 2019; v1 submitted 5 December, 2018; originally announced December 2018.
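
Illustrative note on the entry above: the basic ABC algorithm the abstract refers to is commonly presented as rejection sampling, where a parameter drawn from the prior is kept only if data simulated under it produces a summary statistic within a threshold of the observed one. The snippet below is a minimal sketch of that textbook scheme, not code from the paper. The toy model (a normal distribution with unknown mean and unit variance), the uniform prior, the sample-mean statistic, and all function names are hypothetical choices made for illustration.

```python
import random


def abc_rejection(observed_stat, prior_sampler, simulate_stat, distance,
                  threshold, n_draws, rng):
    # Keep each prior draw whose simulated statistic lies within
    # `threshold` of the observed statistic under `distance`.
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        stat = simulate_stat(theta, rng)
        if distance(stat, observed_stat) <= threshold:
            accepted.append(theta)
    return accepted


def prior_sampler(rng):
    # Hypothetical prior: uniform on [-5, 5].
    return rng.uniform(-5.0, 5.0)


def simulate_stat(theta, rng, n_obs=20):
    # Simulate n_obs draws from Normal(theta, 1) and return their sample mean.
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


def distance(a, b):
    # Absolute difference between scalar summary statistics.
    return abs(a - b)


rng = random.Random(0)
accepted = abc_rejection(1.0, prior_sampler, simulate_stat, distance,
                         threshold=0.2, n_draws=5000, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
```

Lowering `threshold` brings the accepted sample closer to the exact posterior at the cost of fewer acceptances, which is the accuracy versus cost trade-off in the threshold parameter that the abstract analyzes.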

arXiv:1808.09372 [pdf, ps, other]

Mean Field Analysis of Neural Networks: A Central Limit Theorem

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: We rigorously prove a central limit theorem for neural network models with a single hidden layer. The central limit theorem is proven in the asymptotic regime of simultaneously (A) large numbers of hidden units and (B) large numbers of stochastic gradient descent training iterations. Our result describes the neural network's fluctuations around its mean-field limit. The fluctuations have a Gaussian distribution and satisfy a stochastic partial differential equation. The proof relies upon weak convergence methods from stochastic analysis. In particular, we prove relative compactness for the sequence of processes and uniqueness of the limiting process in a suitable Sobolev space.

Submitted 3 June, 2019; v1 submitted 28 August, 2018; originally announced August 2018.

MSC Class: 60F05; 60G57; 62M45

arXiv:1805.10229 [pdf, ps, other]

Importance sampling for slow-fast diffusions based on moderate deviations

Authors: Matthew R. Morse, Konstantinos Spiliopoulos

Abstract: We consider systems of slow-fast diffusions with small noise in the slow component. We construct provably logarithmic asymptotically optimal importance schemes for the estimation of rare events based on the moderate deviations principle. Using the subsolution approach we construct schemes and identify conditions under which the schemes will be asymptotically optimal. Moderate deviations-based importance sampling offers a viable alternative to large deviations importance sampling when the events are not too rare. In particular, in many cases of interest one can indeed construct the required change of measure in closed form, a task which is more complicated using the large deviations-based importance sampling, especially when it comes to multiscale dynamically evolving processes. The presence of multiple scales and the fact that we do not make any periodicity assumptions for the coefficients driving the processes complicate the design and the analysis of efficient importance sampling schemes. Simulation studies illustrate the theory.

Submitted 6 January, 2020; v1 submitted 25 May, 2018; originally announced May 2018.

arXiv:1805.01053 [pdf, other]

Mean Field Analysis of Neural Networks: A Law of Large Numbers

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Machine learning, and in particular neural network models, have revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are driven by neural networks. There are also growing applications in engineering, robotics, medicine, and finance. Despite their immense success in practice, there is limited mathematical understanding of neural networks. This paper illustrates how neural networks can be studied via stochastic analysis, and develops approaches for addressing some of the technical challenges which arise. We analyze one-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. This result can be considered a law of large numbers for neural networks. In addition, a consequence of our analysis is that the trained parameters of the neural network asymptotically become independent, a property which is commonly called "propagation of chaos".

Submitted 11 November, 2019; v1 submitted 2 May, 2018; originally announced May 2018.

arXiv:1804.09151 [pdf, other]

Optimal Investment, Demand and Arbitrage under Price Impact

Authors: Michail Anthropelos, Scott Robertson, Konstantinos Spiliopoulos

Abstract: This paper studies the optimal investment problem with random endowment in an inventory-based price impact model with competitive market makers. Our goal is to analyze how price impact affects optimal policies, as well as both pricing rules and demand schedules for contingent claims. For exponential market makers' preferences, we establish two effects due to price impact: constrained trading, and non-linear hedging costs. To the former, wealth processes in the impact model are identified with those in a model without impact, but with constrained trading, where the (random) constraint set is generically neither closed nor convex. Regarding hedging, non-linear hedging costs motivate the study of arbitrage-free prices for the claim. We provide three such notions, which coincide in the frictionless case, but which dramatically differ in the presence of price impact. Additionally, we show arbitrage opportunities, should they arise from claim prices, can be exploited only for limited position sizes, and may be ignored if outweighed by hedging considerations. We also show that arbitrage inducing prices may arise endogenously in equilibrium, and that equilibrium positions are inversely proportional to the market makers' representative risk aversion. Therefore, large positions endogenously arise in the limit of either market maker risk neutrality, or a large number of market makers.

Submitted 7 December, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: 1 figure

MSC Class: 91G10; 91B25; 60H30

arXiv:1803.04483 [pdf, other]

Pathwise moderate deviations for option pricing

Authors: Antoine Jacquier, Konstantinos Spiliopoulos

Abstract: We provide a unifying treatment of pathwise moderate deviations for models commonly used in financial applications, and for related integrated functionals. Suitable scaling allows us to transfer these results into small-time, large-time and tail asymptotics for diffusions, as well as for option prices and realised variances. In passing, we highlight some intuitive relationships between moderate deviations rate functions and their large deviations counterparts; these turn out to be useful for numerical purposes, as large deviations rate functions are often difficult to compute.

Submitted 1 December, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

arXiv:1710.04273 [pdf, ps, other]

Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem (CLT) for strongly convex objective functions and, under slightly stronger conditions, for non-convex objective functions as well. An $L^{p}$ convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning.

Submitted 17 June, 2019; v1 submitted 11 October, 2017; originally announced October 2017.

arXiv:1710.02618 [pdf, ps, other]

Large deviations and averaging for systems of slow-fast stochastic reaction-diffusion equations

Authors: Wenqing Hu, Michael Salins, Konstantinos Spiliopoulos

Abstract: We study a large deviation principle for a system of stochastic reaction-diffusion equations (SRDEs) with a separation of fast and slow components and small noise in the slow component. The derivation of the large deviation principle is based on the weak convergence method in infinite dimensions, which results in studying averaging for controlled SRDEs. By appropriate choice of the parameters, the fast process and the associated control that arises from the weak convergence method decouple from each other. We show that in this decoupling case one can use the weak convergence method to characterize the limiting process via a "viable pair" that captures the limiting controlled dynamics and the effective invariant measure simultaneously. The characterization of the limit of the controlled slow-fast processes in terms of a viable pair enables us to obtain a variational representation of the large deviation action functional. Due to the infinite-dimensional nature of our set-up, the proof of tightness as well as the analysis of the limit process and in particular the proof of the large deviations lower bound is considerably more delicate here than in the finite-dimensional situation. Smoothness properties of optimal controls in infinite dimensions (a necessary step for the large deviations lower bound) need to be established. We emphasize that many issues that are present in the infinite-dimensional case are completely absent in finite dimensions.

Submitted 30 April, 2019; v1 submitted 6 October, 2017; originally announced October 2017.

MSC Class: 60H15; 60F10; 35K57; 70K70

arXiv:1709.02223 [pdf, other]

Discrete-Time Statistical Inference for Multiscale Diffusions

Authors: Siragan Gailus, Konstantinos Spiliopoulos

Abstract: We study statistical inference for small-noise-perturbed multiscale dynamical systems under the assumption that we observe a single time series from the slow process only. We construct estimators for both averaging and homogenization regimes, based on an appropriate misspecified model motivated by a second-order stochastic Taylor expansion of the slow process with respect to a function of the time-scale separation parameter. In the case of a fixed number of observations, we establish consistency, asymptotic normality, and asymptotic statistical efficiency of a minimum contrast estimator (MCE), the limiting variance having been identified explicitly; we furthermore establish consistency and asymptotic normality of a simplified minimum contrast estimator (SMCE), which is however not in general efficient. These results are then extended to the case of high-frequency observations under a condition restricting the rate at which the number of observations may grow vis-à-vis the separation of scales. Numerical simulations illustrate the theoretical results.

Submitted 11 September, 2018; v1 submitted 7 September, 2017; originally announced September 2017.

arXiv:1708.07469 [pdf, other]

doi 10.1016/j.jcp.2018.08.029

DGM: A deep learning algorithm for solving partial differential equations

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: High-dimensional PDEs have been a longstanding computational challenge. We propose to solve high-dimensional PDEs by approximating the solution with a deep neural network which is trained to satisfy the differential operator, initial condition, and boundary conditions. Our algorithm is meshfree, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, the neural network is trained on batches of randomly sampled time and space points. The algorithm is tested on a class of high-dimensional free boundary PDEs, which we are able to accurately solve in up to $200$ dimensions. The algorithm is also tested on a high-dimensional Hamilton-Jacobi-Bellman PDE and Burgers' equation. The deep learning algorithm approximates the general solution to the Burgers' equation for a continuum of different boundary conditions and physical conditions (which can be viewed as a high-dimensional space). We call the algorithm a "Deep Galerkin Method (DGM)" since it is similar in spirit to Galerkin methods, with the solution approximated by a neural network instead of a linear combination of basis functions. In addition, we prove a theorem regarding the approximation power of neural networks for a class of quasilinear parabolic PDEs.

Submitted 5 September, 2018; v1 submitted 24 August, 2017; originally announced August 2017.

Comments: Deep learning, machine learning, partial differential equations

arXiv:1707.08868 [pdf, other]

Importance sampling for metastable and multiscale dynamical systems

Authors: Konstantinos Spiliopoulos

Abstract: In this article, we address the issues that come up in the design of importance sampling schemes for rare events associated to stochastic dynamical systems. We focus on the issue of metastability and on the effect of multiple scales. We discuss why seemingly reasonable schemes that follow large deviations optimal paths may perform poorly in practice, even though they are asymptotically optimal. Pre-asymptotic optimality is important when one deals with metastable dynamics and we discuss possible ways to address this issue. Moreover, we discuss how the effect of the multiple scales (either in periodic or random environments) on the efficient design of importance sampling should be addressed. We discuss the mathematical and practical issues that come up and how to overcome some of them, and we discuss future challenges.

Submitted 27 July, 2017; originally announced July 2017.

Comments: Will appear as a chapter in Springer book

arXiv:1702.01777 [pdf, other]

Optimal Scaling of the MALA algorithm with Irreversible Proposals for Gaussian targets

Authors: Michela Ottobre, Natesh S. Pillai, Konstantinos Spiliopoulos

Abstract: It is well known in many settings that reversible Langevin diffusions in confining potentials converge to equilibrium exponentially fast. Adding irreversible perturbations to the drift of a Langevin diffusion that maintain the same invariant measure accelerates its convergence to stationarity. Many existing works thus advocate the use of such non-reversible dynamics for sampling. When implementing Markov Chain Monte Carlo algorithms (MCMC) using time discretisations of such Stochastic Differential Equations (SDEs), one can append the discretization with the usual Metropolis-Hastings accept-reject step and this is often done in practice because the accept-reject step eliminates bias. On the other hand, such a step makes the resulting chain reversible. It is not known whether adding the accept-reject step preserves the faster mixing properties of the non-reversible dynamics. In this paper, we address this gap between theory and practice by analyzing the optimal scaling of MCMC algorithms constructed from proposal moves that are time-step Euler discretisations of an irreversible SDE, for high dimensional Gaussian target measures. We call the resulting algorithm the ipMALA, in comparison to the classical MALA algorithm (here "ip" is for irreversible proposal). In order to quantify how the cost of the algorithm scales with the dimension $N$, we prove invariance principles for the appropriately rescaled chain. In contrast to the usual MALA algorithm, we show that there could be two regimes asymptotically: (i) a diffusive regime, as in the MALA algorithm and (ii) a "fluid" regime where the limit is an ordinary differential equation. We provide concrete examples where the limit is a diffusion, as in the standard MALA, but with provably higher limiting acceptance probabilities. Numerical results are also given corroborating the theory.

Submitted 1 July, 2019; v1 submitted 6 February, 2017; originally announced February 2017.

arXiv:1701.04850 [pdf, other]

doi 10.1088/1361-6544/aae936

Selection of quasi-stationary states in the Navier-Stokes equation on the torus

Authors: Margaret Beck, Eric Cooper, Konstantinos Spiliopoulos

Abstract: The two dimensional incompressible Navier-Stokes equation on $D_δ:= [0, 2πδ] \times [0, 2π]$ with $δ\approx 1$, periodic boundary conditions, and viscosity $0 < ν\ll 1$ is considered. Bars and dipoles, two explicitly given quasi-stationary states of the system, evolve on the time scale $\mathcal{O}(e^{-νt})$ and have been shown to play a key role in its long-time evolution. Of particular interest is the role that $δ$ plays in selecting which of these two states is observed. Recent numerical studies suggest that, after a transient period of rapid decay of the high Fourier modes, the bar state will be selected if $δ\neq 1$, while the dipole will be selected if $δ= 1$. Our results support this claim and seek to mathematically formalize it. We consider the system in Fourier space, project it onto a center manifold consisting of the lowest eight Fourier modes, and use this as a model to study the selection of bars and dipoles. It is shown for this ODE model that the value of $δ$ controls the behavior of the asymptotic ratio of the low modes, thus determining the likelihood of observing a bar state or dipole after an initial transient period. Moreover, in our model, for all $δ\approx 1$, there is an initial time period in which the high modes decay at the rapid rate $\mathcal{O}(e^{-t/ν})$, while the low modes evolve at the slower $\mathcal{O}(e^{-νt})$ rate. The results for the ODE model are proven using energy estimates and invariant manifolds and further supported by formal asymptotic expansions and numerics.

Submitted 16 October, 2018; v1 submitted 17 January, 2017; originally announced January 2017.

Comments: 29 pages, 4 figures

Journal ref: Margaret Beck et al 2019 Nonlinearity 32 209

arXiv:1611.05903 [pdf, ps, other]

Moderate deviations principle for systems of slow-fast diffusions

Authors: Matthew R. Morse, Konstantinos Spiliopoulos

Abstract: In this paper, we prove the moderate deviations principle (MDP) for a general system of slow-fast dynamics. We provide a unified approach, based on weak convergence ideas and stochastic control arguments, that covers both the averaging and the homogenization regimes. We allow the coefficients to be in the whole space and not just the torus and allow the noises driving the slow and fast processes to be correlated arbitrarily. Similar to the large deviation case, the methodology that we follow allows construction of provably efficient Monte Carlo methods for rare events that fall into the moderate deviations regime.

Submitted 1 June, 2017; v1 submitted 17 November, 2016; originally announced November 2016.

Comments: 41 pages, added references

arXiv:1611.05545 [pdf, other]

Stochastic Gradient Descent in Continuous Time

Authors: Justin Sirignano, Konstantinos Spiliopoulos

Abstract: Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. SGDCT performs an online parameter update in continuous time, with the parameter updates $θ_t$ satisfying a stochastic differential equation. We prove that $\lim_{t \rightarrow \infty} \nabla \bar g(θ_t) = 0$ where $\bar g$ is a natural objective function for the estimation of the continuous-time dynamics. The convergence proof leverages ergodicity by using an appropriate Poisson equation to help describe the evolution of the parameters for large times. SGDCT can also be used to solve continuous-time optimization problems, such as American options. For certain continuous-time problems, SGDCT has some promising advantages compared to a traditional stochastic gradient descent algorithm. As an example application, SGDCT is combined with a deep neural network to price high-dimensional American options (up to 100 dimensions).

Submitted 29 October, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

arXiv:1609.04365 [pdf, ps, other]

Rare event simulation via importance sampling for linear SPDE's

Authors: Michael Salins, Konstantinos Spiliopoulos

Abstract: The goal of this paper is to develop provably efficient importance sampling Monte Carlo methods for the estimation of rare events within the class of linear stochastic partial differential equations (SPDEs). We find that if a spectral gap of appropriate size exists, then one can identify a lower dimensional manifold where the rare event takes place. This allows one to build importance sampling changes of measures that perform provably well even pre-asymptotically (i.e. for small but non-zero size of the noise) without degrading in performance due to infinite dimensionality or due to long simulation time horizons. Simulation studies supplement and illustrate the theoretical results.

Submitted 4 May, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

MSC Class: 65C05; 60G99; 60F9

arXiv:1607.08287 [pdf, other]

The effect of heterogeneity on flocking behavior and systemic risk

Authors: Fei Fang, Yiwei Sun, Konstantinos Spiliopoulos

Abstract: The goal of this paper is to study organized flocking behavior and systemic risk in heterogeneous mean-field interacting diffusions. We illustrate in a number of case studies the effect of heterogeneity in the behavior of systemic risk in the system, i.e., the risk that several agents default simultaneously as a result of interconnections. We also investigate the effect of heterogeneity on the "flocking behavior" of different agents, i.e., when agents with different dynamics end up following very similar paths and follow closely the mean behavior of the system. Using Laplace asymptotics, we derive an asymptotic formula for the tail of the loss distribution as the number of agents grows to infinity. This characterizes the tail of the loss distribution and the effect of the heterogeneity of the network on the tail loss probability.

Submitted 8 June, 2017; v1 submitted 27 July, 2016; originally announced July 2016.

arXiv:1607.06158 [pdf, other]

Dimension Reduction in Statistical Estimation of Partially Observed Multiscale Processes

Authors: Andrew Papanicolaou, Konstantinos Spiliopoulos

Abstract: We consider partially observed multiscale diffusion models that are specified up to an unknown vector parameter. We establish for a very general class of test functions that the filter of the original model converges to a filter of reduced dimension. Then, this result is used to justify statistical estimation for the unknown parameters of interest based on the model of reduced dimension but using the original available data. This allows one to learn the unknown parameters of interest while working in lower dimensions, as opposed to working with the original high dimensional system. Simulation studies support and illustrate the theoretical results.

Submitted 26 November, 2017; v1 submitted 20 July, 2016; originally announced July 2016.

Comments: SIAM Journal of Uncertainty Quantification, 2017

MSC Class: 93E10; 93E11; 93C70; 62M07; 62M86

arXiv:1606.09539 [pdf, ps, other]

Analysis of multiscale integrators for multiple attractors and irreversible Langevin samplers

Authors: Jianfeng Lu, Konstantinos Spiliopoulos

Abstract: We study multiscale integrator numerical schemes for a class of stiff stochastic differential equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behave as diffusions on graphs as the stiffness parameter goes to its limit. Classical numerical discretization schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness parameter converges to its limit and appropriate multiscale integrators can correct for this. We rigorously establish the convergence of the numerical method to the related diffusion on a graph, identifying the appropriate choice of discretization parameters. Theoretical results are supplemented by numerical studies in the recently developing area of introducing irreversibility in Langevin samplers in order to accelerate convergence to equilibrium.

Submitted 9 October, 2018; v1 submitted 30 June, 2016; originally announced June 2016.

Showing 1–50 of 81 results for author: Spiliopoulos, K