A Methodology Establishing Linear Convergence of Adaptive Gradient Methods under PL Inequality

Chakrabarti, Kushal; Baranwal, Mayank

Computer Science > Machine Learning

arXiv:2407.12629 (cs)

[Submitted on 17 Jul 2024]

Abstract: Adaptive gradient-descent optimizers are the standard choice for training neural network models. Despite their faster convergence than gradient descent and their remarkable performance in practice, adaptive optimizers are not as well understood as vanilla gradient descent. One reason is that the dynamic update of the learning rate, which helps these methods converge faster, also makes their analysis intricate. In particular, simple gradient descent converges at a linear rate for a class of optimization problems, whereas the practically faster adaptive gradient methods lack such a theoretical guarantee. The Polyak-Łojasiewicz (PL) inequality is the weakest known condition under which linear convergence of gradient descent and its momentum variants has been proved. Therefore, in this paper, we prove that AdaGrad and Adam, two well-known adaptive gradient methods, converge linearly when the cost function is smooth and satisfies the PL inequality. Our theoretical framework follows a simple and unified approach, applicable to both batch and stochastic gradients, which can potentially be used to analyze linear convergence of other variants of Adam.
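
To make the baseline claim in the abstract concrete, the following is a minimal numerical sketch of plain gradient descent on a smooth, nonconvex function that is commonly used as an example satisfying the PL inequality, i.e. (1/2)|f'(x)|^2 >= mu (f(x) - f*) for some mu > 0. It is not the paper's analysis and does not implement AdaGrad or Adam. The test function f(x) = x^2 + 3 sin^2(x), the starting point, the step count, and all names in the code are assumptions chosen for illustration.

```python
import math

# Illustrative test function (an assumption, not taken from the paper):
# f(x) = x^2 + 3 sin^2(x) is nonconvex, its only stationary point is the
# global minimizer x = 0, and its minimum value is f* = 0.
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

# Smoothness constant: |f''(x)| = |2 + 6 cos(2x)| <= 8 for all x.
L_SMOOTH = 8.0

def gradient_descent(x0, steps):
    """Run gradient descent with step size 1/L and record f(x_k) - f*."""
    x = x0
    gaps = [f(x)]  # f* = 0, so the optimality gap equals f(x)
    for _ in range(steps):
        x -= grad_f(x) / L_SMOOTH
        gaps.append(f(x))
    return gaps

gaps = gradient_descent(x0=3.0, steps=50)

if __name__ == "__main__":
    # Print the gap every 5 iterations to observe the decay.
    for k in range(0, len(gaps), 5):
        print(k, gaps[k])
```

With step size 1/L, smoothness guarantees that the optimality gap never increases, and the PL inequality then bounds the gap by a geometric sequence in the iteration count, which is the linear rate the abstract refers to.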

Comments:	Accepted for publication in the main track of the 27th European Conference on Artificial Intelligence (ECAI-2024)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2407.12629 [cs.LG]
	(or arXiv:2407.12629v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.12629

Submission history

From: Kushal Chakrabarti
[v1] Wed, 17 Jul 2024 14:56:21 UTC (28 KB)
