-
From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
Authors:
Xuxing Chen,
Krishnakumar Balasubramanian,
Promit Ghosal,
Bhavya Agrawalla
Abstract:
We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. Within this framework, we reveal that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis concerning the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks employing quadratic activation functions and constant outer-layers, utilizing orthogonal training data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
Submitted 2 October, 2023;
originally announced October 2023.
-
Statistical Inference for Linear Functionals of Online SGD in High-dimensional Linear Regression
Authors:
Bhavya Agrawalla,
Krishnakumar Balasubramanian,
Promit Ghosal
Abstract:
Stochastic gradient descent (SGD) has emerged as the quintessential method in a data scientist's toolbox. Using SGD for high-stakes applications requires, however, careful quantification of the associated uncertainty. Towards that end, in this work, we establish a high-dimensional Central Limit Theorem (CLT) for linear functionals of online SGD iterates for overparametrized least-squares regression with non-isotropic Gaussian inputs. We first show that a bias-corrected CLT holds when the number of iterations of the online SGD, $t$, grows sub-linearly in the dimensionality, $d$. In order to use the developed result in practice, we further develop an online approach for estimating the variance term appearing in the CLT, and establish high-probability bounds for the developed online estimator. Together with the CLT result, this provides a fully online and data-driven way to numerically construct confidence intervals. This enables practical high-dimensional algorithmic inference with SGD and to the best of our knowledge, is the first such result.
Submitted 11 March, 2025; v1 submitted 19 February, 2023;
originally announced February 2023.
-
The André-Quillen cohomology of commutative monoids
Authors:
Bhavya Agrawalla,
Nasief Khlaif,
Haynes Miller
Abstract:
We observe that Beck modules for a commutative monoid are exactly modules over a graded commutative ring associated to the monoid. Under this identification, the Quillen cohomology of commutative monoids is a special case of André-Quillen cohomology for graded commutative rings, generalizing a result of Kurdiani and Pirashvili. To verify this we develop the necessary grading formalism. The partial cochain complex developed by Pierre Grillet for computing Quillen cohomology appears as the start of a modification of the Harrison cochain complex suggested by Michael Barr.
Submitted 3 June, 2024; v1 submitted 2 November, 2022;
originally announced November 2022.