Convergence of Markov Chains for Constant Step-size Stochastic Gradient Descent with Separable Functions
Authors:
David Shirokoff,
Philip Zaleski
Abstract:
Stochastic gradient descent (SGD) is a popular algorithm for minimizing objective functions that arise in machine learning. For constant step-size SGD, the iterates form a Markov chain on a general state space. Focusing on a class of separable (non-convex) objective functions, we establish a "Doeblin-type decomposition," in that the state space decomposes into a uniformly transient set and a disjoint union of absorbing sets. Each of the absorbing sets supports a unique invariant measure, and the set of all invariant measures is their convex hull. Moreover, the set of invariant measures is shown to be a global attractor for the Markov chain, with a geometric convergence rate. The theory is illustrated with examples that show: (1) the failure of the diffusion approximation to characterize the long-time dynamics of SGD; (2) the global minimum of an objective function may lie outside the support of the invariant measures (i.e., even if initialized at the global minimum, SGD iterates will leave); and (3) bifurcations may enable the SGD iterates to transition between two local minima. Key ingredients in the theory involve viewing the SGD dynamics as a monotone iterated function system and establishing a "splitting condition" of Dubins and Freedman (1966) and Bhattacharya and Lee (1988).
Submitted 24 March, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
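To make concrete the statement that constant step-size SGD iterates form a Markov chain, here is a minimal sketch in Python. It is not taken from the paper: the one-dimensional toy objective f(x) = ((x - 1)^2 + (x + 1)^2) / 4, the function name `sgd_chain`, and the step size 0.1 are all assumptions chosen for illustration, and this toy objective is convex, unlike the non-convex class the abstract studies. Each step picks one of the two component gradients at random, so the next iterate depends only on the current one, and each update map x -> 0.9 x ± 0.1 is increasing, which is the simplest instance of a monotone iterated function system.

```python
import random


def sgd_chain(grads, x0, step, n_steps, seed=0):
    """Run constant step-size SGD and return the whole path of iterates.

    grads: list of component gradient functions (one is sampled per step).
    The update x <- x - step * g(x) uses only the current x and fresh
    randomness, so the returned path is a sample from a Markov chain.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    x = x0
    path = [x]
    for _ in range(n_steps):
        g = rng.choice(grads)  # sample one component uniformly at random
        x = x - step * g(x)    # constant step size: no decay schedule
        path.append(x)
    return path


# Hypothetical toy components: f1(x) = (x - 1)^2 / 2 and f2(x) = (x + 1)^2 / 2.
# Their average is minimized at x = 0.
grads = [lambda x: x - 1.0, lambda x: x + 1.0]

# Start exactly at the minimizer of the averaged objective.
path = sgd_chain(grads, x0=0.0, step=0.1, n_steps=1000)

# The chain leaves x = 0 on the first step and keeps moving, but it never
# escapes the interval [-1, 1], which both update maps send into itself.
print(path[:5])
print(min(path), max(path))
```

In this toy case the interval [-1, 1] plays the role of an absorbing set: once inside, the iterates never leave, and they do not settle at a single point because the noise does not vanish with a fixed step size.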
A variational model of charged drops in dielectrically matched binary fluids: the effect of charge discreteness
Authors:
Cyrill B. Muratov,
Matteo Novaga,
Philip Zaleski
Abstract:
This paper addresses the ill-posedness of the classical Rayleigh variational model of conducting charged liquid drops by incorporating the discreteness of the elementary charges. Introducing a model that describes two immiscible fluids with the same dielectric constant, with a drop of one fluid containing a fixed number of elementary charges together with their solvation spheres, we interpret the equilibrium shape of the drop as a global minimizer of the sum of its surface energy and the electrostatic repulsive energy between the charges under fixed drop volume. For all model parameters, we establish existence of generalized minimizers that consist of at most a finite number of components "at infinity". We also give several existence and non-existence results for classical minimizers consisting of only a single component. In particular, we identify an asymptotically sharp threshold for the number of charges to yield existence of minimizers in a regime corresponding to macroscopically large drops containing a large number of charges. The obtained non-trivial threshold is significantly below the corresponding threshold for the Rayleigh model, consistent with the ill-posedness of the latter and demonstrating a particular regularizing effect of the charge discreteness. However, when a minimizer does exist in this regime, it approaches a ball with the charge uniformly distributed on the surface as the number of charges goes to infinity, just as in the Rayleigh model. Finally, we provide an explicit solution for the problem with two charges and a macroscopically large drop.
Submitted 17 June, 2024; v1 submitted 9 March, 2023;
originally announced March 2023.