Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees

Nguyen, Thien Hang; Nguyen, Huy Le

Computer Science > Machine Learning

arXiv:2411.07120 (cs)

[Submitted on 11 Nov 2024 (v1), last revised 24 May 2025 (this version, v2)]

Title:Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees

Authors:Thien Hang Nguyen, Huy Le Nguyen

View PDF

Abstract:We introduce two complementary techniques for efficient optimization that reduce memory requirements while accelerating training of large-scale neural networks. The first technique, Subset-Norm step size, generalizes AdaGrad-Norm and AdaGrad(-Coordinate) through step-size sharing. Subset-Norm (SN) reduces AdaGrad's memory footprint from $O(d)$ to $O(\sqrt{d})$, where $d$ is the model size. For non-convex smooth objectives under coordinate-wise sub-gaussian noise, we show a noise-adapted high-probability convergence guarantee with improved dimensional dependence of SN over existing methods. Our second technique, Subspace-Momentum, reduces the momentum state's memory footprint by restricting momentum to a low-dimensional subspace while performing SGD in the orthogonal complement. We prove a high-probability convergence result for Subspace-Momentum under standard assumptions. Empirical evaluation on pre-training and fine-tuning LLMs demonstrates the effectiveness of our methods. For instance, combining Subset-Norm with Subspace-Momentum achieves Adam's validation perplexity for LLaMA 1B in approximately half the training tokens (6.8B vs 13.1B) while reducing Adam's optimizer-states memory footprint by more than 80\% with minimal additional hyperparameter tuning.

Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Cite as:	arXiv:2411.07120 [cs.LG]
	(or arXiv:2411.07120v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.07120

Submission history

From: Thien Hang Nguyen [view email]
[v1] Mon, 11 Nov 2024 16:48:07 UTC (1,358 KB)
[v2] Sat, 24 May 2025 15:52:50 UTC (1,971 KB)

Computer Science > Machine Learning

Title:Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Lean and Mean Adaptive Optimization via Subset-Norm and Subspace-Momentum with Convergence Guarantees

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators