AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

Yue, Yun; Ye, Zhiling; Jiang, Jiadi; Liu, Yongchao; Zhang, Ke

arXiv:2312.01658v1 (cs)

[Submitted on 4 Dec 2023 (this version), latest version 9 Dec 2024 (v2)]

Abstract: Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the preconditioning matrix, which provides enhanced gradient information and regulates the step size along each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix that uses the gradient difference between two successive steps as its diagonal elements. These diagonal elements are closely related to the Hessian and can be viewed as an approximation of the inner product between the Hessian row vectors and the difference of adjacent parameter vectors. Additionally, we introduce an auto-switching function that enables the preconditioning matrix to switch dynamically between Stochastic Gradient Descent (SGD) and the adaptive optimizer. Based on these two techniques, we develop a new optimizer named AGD that enhances generalization performance. We evaluate AGD on public datasets in Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD achieves highly competitive or significantly better predictive performance than state-of-the-art (SOTA) optimizers. Furthermore, we analyze how AGD switches automatically between SGD and the adaptive optimizer, and the effects of this switching in various scenarios. The code is available at this https URL.
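
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no update rule, so the exponential moving averages, the bias correction, the threshold `delta`, and the names `init_state` and `agd_like_step` are all assumptions borrowed from Adam-style optimizers. The link to the Hessian follows from a first-order Taylor expansion: the difference of two successive gradients is approximately the Hessian applied to the difference of the two parameter vectors, so each coordinate of the gradient difference approximates the inner product of a Hessian row with the parameter change.

```python
import math


def init_state(n):
    """Per-parameter buffers for the sketch (all names are illustrative)."""
    return {"t": 0, "m": [0.0] * n, "b": [0.0] * n, "g_prev": [0.0] * n}


def agd_like_step(w, g, state, lr=0.1, beta1=0.9, beta2=0.999, delta=1e-5):
    """One hypothetical AGD-style update on plain Python lists.

    Returns the new parameters and, per coordinate, whether the step was
    SGD-like (constant denominator) or adaptive.
    """
    state["t"] += 1
    t = state["t"]
    new_w, sgd_like = [], []
    for i in range(len(w)):
        # Assumed Adam-style first moment with bias correction.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g[i]
        m_hat = state["m"][i] / (1 - beta1 ** t)
        # Stepwise gradient difference; at the first step there is no
        # previous gradient, so the gradient itself is used (assumption).
        s = g[i] if t == 1 else g[i] - state["g_prev"][i]
        state["g_prev"][i] = g[i]
        # Diagonal preconditioner: moving average of the squared difference.
        state["b"][i] = beta2 * state["b"][i] + (1 - beta2) * s * s
        scale = math.sqrt(state["b"][i] / (1 - beta2 ** t))
        # Auto-switch (assumed form): below delta the denominator is the
        # constant delta, so the step is momentum SGD scaled by 1 / delta.
        sgd_like.append(scale < delta)
        new_w.append(w[i] - lr * m_hat / max(scale, delta))
    return new_w, sgd_like
```

In this sketch a single `max` implements the switch: coordinates whose gradient barely changes between steps fall back to a constant step scale, while coordinates with large changes receive an adaptive, curvature-aware scale. Because `delta` also sets the effective SGD step size (`lr / delta`), it would need to be tuned together with the learning rate.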

Comments:	21 pages. Accepted as a conference paper at NeurIPS '23
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as:	arXiv:2312.01658 [cs.LG]
	(or arXiv:2312.01658v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.01658

Submission history

From: Yun Yue
[v1] Mon, 4 Dec 2023 06:20:14 UTC (12,115 KB)
[v2] Mon, 9 Dec 2024 12:23:59 UTC (12,115 KB)
