-
Randomised Splitting Methods and Stochastic Gradient Descent
Authors:
Luke Shaw,
Peter A. Whalley
Abstract:
We explore an explicit link between stochastic gradient descent using common batching strategies and splitting methods for ordinary differential equations. From this perspective, we introduce a new minibatching strategy (called Symmetric Minibatching Strategy) for stochastic gradient optimisation which shows greatly reduced stochastic gradient bias (from $\mathcal{O}(h^2)$ to $\mathcal{O}(h^4)$ in the optimiser stepsize $h$), when combined with momentum-based optimisers. We justify why momentum is needed to obtain the improved performance using the theory of backward analysis for splitting integrators and provide a detailed analytic computation of the stochastic gradient bias on a simple example.
Further, we provide improved convergence guarantees for this new minibatching strategy using Lyapunov techniques that show reduced stochastic gradient bias for a fixed stepsize (or learning rate) over the class of strongly-convex and smooth objective functions. Via the same techniques we also improve the known results for the Random Reshuffling strategy for stochastic gradient descent methods with momentum. We argue that this also leads to a faster convergence rate when considering a decreasing stepsize schedule. Both the reduced bias and efficacy of decreasing stepsizes are demonstrated numerically on several motivating examples.
Submitted 5 April, 2025;
originally announced April 2025.
-
Polynomial Inequalities and Optimal Stability of Numerical Integrators
Authors:
Luke Shaw
Abstract:
A numerical integrator for $\dot{x}=f(x)$ is called \emph{stable} if, when applied to the 1D Dahlquist test equation $\dot{x}=\lambda x$, $\lambda\in\mathbb{C}$, with fixed timestep $h>0$, the numerical solution remains bounded as the number of steps tends to infinity. It is well known that no explicit integrator may remain stable beyond certain limits in $\lambda$. Furthermore, these stability limits are only tight for certain specific integrators (different in each case), which may then be called 'optimally stable'. Such optimal stability results are typically proven using sophisticated techniques from complex analysis, leading to rather abstruse proofs. In this article, we pursue an alternative approach, exploiting connections with the Bernstein and Markov brothers' inequalities for polynomials. This simplifies the proofs greatly and offers a framework which unifies the diverse results that have been obtained.
Submitted 31 March, 2025;
originally announced March 2025.
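The stability notion used in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the simplest explicit integrator, explicit Euler, for which one step on the Dahlquist test equation multiplies the solution by $R(z) = 1 + z$ with $z = h\lambda$. The function names are hypothetical.

```python
def euler_amplification(z: complex) -> complex:
    """Amplification factor R(z) = 1 + z of explicit Euler, with z = h * lambda."""
    return 1 + z


def is_stable(z: complex) -> bool:
    """The iterates x_n = R(z)**n * x_0 stay bounded exactly when |R(z)| <= 1."""
    return abs(euler_amplification(z)) <= 1.0


def iterate(lam: complex, h: float, n_steps: int, x0: complex = 1.0) -> complex:
    """Apply n_steps of explicit Euler to x' = lam * x with fixed timestep h."""
    x = complex(x0)
    for _ in range(n_steps):
        x = x + h * lam * x
    return x
```

For real negative $\lambda$ this gives the familiar limit $h|\lambda| \le 2$: `is_stable(-1.0)` holds, while `is_stable(-2.5)` does not, and `iterate(-1.0, 2.5, 50)` grows in magnitude accordingly.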
-
Random Reshuffling for Stochastic Gradient Langevin Dynamics
Authors:
Luke Shaw,
Peter A. Whalley
Abstract:
We examine the use of different randomisation policies for stochastic gradient algorithms used in sampling, based on first-order (or overdamped) Langevin dynamics, the most popular of which is known as Stochastic Gradient Langevin Dynamics. Conventionally, this algorithm is combined with a specific stochastic gradient strategy, called Robbins-Monro. In this work, we study an alternative strategy, Random Reshuffling, and show convincingly that it leads to improved performance via: a) a proof of reduced bias in the Wasserstein metric for strongly convex, gradient Lipschitz potentials; b) an analytical demonstration of reduced bias for a Gaussian model problem; and c) an empirical demonstration of reduced bias in numerical experiments for some logistic regression problems. This is especially important since Random Reshuffling is typically more efficient due to memory access and cache reasons. Such acceleration for the Random Reshuffling policy is familiar from the optimisation literature on stochastic gradient descent.
Submitted 27 January, 2025;
originally announced January 2025.
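The two randomisation policies contrasted in the abstract above differ only in how minibatch indices are drawn. The following is a hedged sketch of that difference, not the authors' code. It assumes that Robbins-Monro means drawing a fresh batch independently at every step, and that Random Reshuffling means permuting the dataset once per epoch and then sweeping through it in consecutive batches. All function and parameter names are hypothetical.

```python
import random


def robbins_monro_batches(n_data, batch_size, n_batches, rng):
    """Assumed Robbins-Monro policy: every batch is sampled independently,
    so a data point may be revisited before others have been seen at all."""
    return [rng.sample(range(n_data), batch_size) for _ in range(n_batches)]


def random_reshuffling_batches(n_data, batch_size, n_epochs, rng):
    """Assumed Random Reshuffling policy: shuffle once per epoch, then visit
    consecutive slices, so each data point is used exactly once per epoch."""
    batches = []
    for _ in range(n_epochs):
        perm = list(range(n_data))
        rng.shuffle(perm)
        for start in range(0, n_data, batch_size):
            batches.append(perm[start:start + batch_size])
    return batches
```

Either list of index batches can then be fed to the same stochastic gradient update; only the order and independence of the batches changes.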
-
Generalized extrapolation methods based on compositions of a basic 2nd-order scheme
Authors:
Sergio Blanes,
Fernando Casas,
Luke Shaw
Abstract:
We propose new linear combinations of compositions of a basic second-order scheme with appropriately chosen coefficients to construct higher order numerical integrators for differential equations. They can be considered as a generalization of extrapolation methods and multi-product expansions. A general analysis is provided and new methods up to order 8 are built and tested. The new approach is shown to reduce the latency problem when implemented in a parallel environment and leads to schemes that are significantly more efficient than standard extrapolation when the linear combination is delayed by a number of steps.
Submitted 23 April, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
A New Optimality Property of Strang's Splitting
Authors:
Fernando Casas,
Jesús María Sanz-Serna,
Luke Shaw
Abstract:
For systems of the form $\dot q = M^{-1} p$, $\dot p = -Aq+f(q)$, common in many applications, we analyze splitting integrators based on the (linear/nonlinear) split systems $\dot q = M^{-1} p$, $\dot p = -Aq$ and $\dot q = 0$, $\dot p = f(q)$. We show that the well-known Strang splitting is optimally stable in the sense that, when applied to a relevant model problem, it has a larger stability region than alternative integrators. This generalizes a well-known property of the common Störmer/Verlet/leapfrog algorithm, which of course arises from Strang splitting based on the (kinetic/potential) split systems $\dot q = M^{-1} p$, $\dot p = 0$ and $\dot q = 0$, $\dot p = -Aq+f(q)$.
Submitted 15 February, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
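The abstract above notes that the Störmer/Verlet/leapfrog algorithm arises from Strang splitting with the kinetic/potential split. A minimal sketch of that special case follows. It is an illustration rather than the paper's method: it assumes a scalar system with the whole of $-Aq+f(q)$ passed in as a single `force` callback, and the function names and the harmonic test problem ($A = 1$, $f = 0$, $M = 1$) are hypothetical choices.

```python
def strang_step(q, p, h, force, mass=1.0):
    """One Strang (kick-drift-kick) step for q' = p / mass, p' = force(q)."""
    p = p + 0.5 * h * force(q)   # half step of the potential part
    q = q + h * p / mass         # full step of the kinetic part
    p = p + 0.5 * h * force(q)   # second half step of the potential part
    return q, p


def integrate(q, p, h, n_steps, force, mass=1.0):
    """Repeat strang_step n_steps times with a fixed timestep h."""
    for _ in range(n_steps):
        q, p = strang_step(q, p, h, force, mass)
    return q, p


def harmonic_force(q):
    """Assumed model problem: force of the unit harmonic oscillator."""
    return -q


def energy(q, p, mass=1.0):
    """Energy of the unit harmonic oscillator, used to monitor the integration."""
    return 0.5 * p * p / mass + 0.5 * q * q
```

On this model problem the energy stays close to its initial value for small timesteps such as `h = 0.1`, whereas a timestep beyond the stability limit, for example `h = 2.5`, makes the iterates grow rapidly.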
-
The logic of planetary combination in Vettius Valens
Authors:
Claire Hall,
Liam P. Shaw
Abstract:
The Anthologies of the second-century astrologer Vettius Valens (120-c.175 CE) is the most extensive surviving practical astrological text from the period. Despite this, the theoretical underpinnings of the Anthologies have been understudied; in general, the work has been overshadowed by Ptolemy's contemporaneous Tetrabiblos. While the Tetrabiblos explicitly aims to present a systematic account of astrology, Valens' work is often characterised as a miscellaneous collection, of interest to historians only for the evidence it preserves about the practical methods used in casting horoscopes. In this article, we argue that the Anthologies is also an invaluable resource for engagement with the conceptual basis of astrology. As a case study, we take a section of Anthologies Book 1 which lists the possible astrological effects of planets, both alone and in 'combinations' of two and three. We demonstrate that analysing Valens' descriptions quantitatively with textual analysis reveals a consistent internal logic of planetary combination. By classifying descriptive terms as positive or negative, we show that the resulting 'sentiment' of planetary combinations is well-correlated with their component parts. Furthermore, we find that the sentiment of three-planet combinations is more strongly correlated with the average sentiment of their three possible component pairs than with the average sentiment of individual planets, suggesting an iterative combinatorial logic. Recognition of this feature of astrological practice has been neglected compared to the mathematical methods for calculating horoscopes. We argue that this analysis not only provides evidence that the astrological lore detailed in Valens is more consistent than is often assumed, but is also indicative of a wider methodological technique in practical astrology: combinatorial reasoning from existing astrological lore.
Submitted 16 May, 2022;
originally announced May 2022.
-
Existence and stability of steady states of a reaction convection diffusion equation modeling microtubule formation
Authors:
Shantia Yarahmadian,
Blake Barker,
Kevin Zumbrun,
Sidney L. Shaw
Abstract:
We generalize the Dogterom-Leibler model for microtubule dynamics [DL] to the case where the rates of elongation as well as the lifetimes of the elongating and shortening phases are a function of GTP-tubulin concentration. We study also the effect of nucleation rate in the form of a damping term which leads to new steady-states. For this model, we study existence and stability of steady states satisfying the boundary conditions at x = 0. Our stability analysis introduces numerical and analytical Evans function computations as a new mathematical tool in the study of microtubule dynamics.
Submitted 11 April, 2010;
originally announced April 2010.