Search | arXiv e-print repository

arXiv:2210.02092 [pdf, ps, other]

Functional Central Limit Theorem and Strong Law of Large Numbers for Stochastic Gradient Langevin Dynamics

Abstract: We study the mixing properties of an important optimization algorithm of machine learning: the stochastic gradient Langevin dynamics (SGLD) with a fixed step size. The data stream is not assumed to be independent hence the SGLD is not a Markov chain, merely a \emph{Markov chain in a random environment}, which complicates the mathematical treatment considerably. We derive a strong law of large numb… ▽ More We study the mixing properties of an important optimization algorithm of machine learning: the stochastic gradient Langevin dynamics (SGLD) with a fixed step size. The data stream is not assumed to be independent hence the SGLD is not a Markov chain, merely a \emph{Markov chain in a random environment}, which complicates the mathematical treatment considerably. We derive a strong law of large numbers and a functional central limit theorem for SGLD. △ Less

Submitted 29 July, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 16 pages

MSC Class: 60J05; 60J20

arXiv:2006.14514 [pdf, other]

Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms

Authors: Attila Lovas, Iosif Lytras, Miklós Rásonyi, Sotirios Sabanis

Abstract: Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning… ▽ More Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs. △ Less

Submitted 15 January, 2023; v1 submitted 25 June, 2020; originally announced June 2020.

arXiv:1903.10328 [pdf, ps, other]

Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Learning

Authors: Huy N. Chau, Miklos Rasonyi

Abstract: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a momentum version of stochastic gradient descent with properly injected Gaussian noise to find a global minimum. In this paper, non-asymptotic convergence analysis of SGHMC is given in the context of non-convex optimization, where subsampling techniques are used over an i.i.d dataset for gradient updates. Our results complement those of [RRT1… ▽ More Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) is a momentum version of stochastic gradient descent with properly injected Gaussian noise to find a global minimum. In this paper, non-asymptotic convergence analysis of SGHMC is given in the context of non-convex optimization, where subsampling techniques are used over an i.i.d dataset for gradient updates. Our results complement those of [RRT17] and improve on those of [GGZ18]. △ Less

Submitted 25 February, 2020; v1 submitted 25 March, 2019; originally announced March 2019.

Showing 1–3 of 3 results for author: Rásonyi, M