Understanding Adam Requires Better Rotation Dependent Assumptions

Maes, Lucas; Zhang, Tianyue H.; Jolicoeur-Martineau, Alexia; Mitliagkas, Ioannis; Scieur, Damien; Lacoste-Julien, Simon; Guille-Escuret, Charles


arXiv:2410.19964 (cs)

[Submitted on 25 Oct 2024]




Abstract: Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We demonstrate that Adam's performance in training transformers degrades under random rotations of the parameter space, indicating a crucial sensitivity to the choice of basis. This reveals that conventional rotation-invariant assumptions are insufficient to capture Adam's advantages theoretically. To better understand the rotation-dependent properties that benefit Adam, we also identify structured rotations that preserve or even enhance its empirical performance. We then examine the rotation-dependent assumptions in the literature, evaluating their adequacy in explaining Adam's behavior across various rotation types. This work highlights the need for new, rotation-dependent theoretical frameworks to fully understand Adam's empirical success in modern machine learning tasks.
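The basis sensitivity described above can be illustrated with a small sketch. It does not reproduce the paper's experiments, which train transformers. It uses a hypothetical toy quadratic, and the objective, step sizes, and helper names (`sgd`, `adam`, `in_basis`) are assumptions chosen for illustration. The sketch runs each optimizer once in the original coordinates. It then runs the same optimizer on g(w) = f(Rw) for an orthogonal matrix R, starting from R^T x0, and maps the result back with R. Because R R^T = I, SGD produces the same trajectory in both cases. Adam normalizes each coordinate separately, so its trajectory generally changes. A signed permutation is included as one simple basis change to which Adam is exactly equivariant. It is not claimed to be one of the structured rotations studied in the paper.

```python
import numpy as np


def sgd(grad, x0, lr=0.05, steps=200):
    """Plain gradient descent; returns the final iterate."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Standard Adam with bias correction; returns the final iterate."""
    x = x0.copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g  # per-coordinate: tied to the basis
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


rng = np.random.default_rng(0)
d = 10
h = np.logspace(-2, 1, d)  # hypothetical toy: f(x) = 0.5 * sum(h_i * x_i**2)


def grad_f(x):
    return h * x


def in_basis(R):
    """Gradient of g(w) = f(R w), i.e. R^T grad f(R w)."""
    return lambda w: R.T @ grad_f(R @ w)


Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
P = np.eye(d)[rng.permutation(d)] * rng.choice([-1.0, 1.0], size=d)  # signed permutation

x0 = rng.standard_normal(d)
gaps = {}
for name, R in [("random rotation", Q), ("signed permutation", P)]:
    for opt in (sgd, adam):
        x_plain = opt(grad_f, x0)
        x_rotated = R @ opt(in_basis(R), R.T @ x0)  # run in new basis, map back
        gaps[(name, opt.__name__)] = np.linalg.norm(x_plain - x_rotated)
        print(f"{name:18s} {opt.__name__:4s} gap = {gaps[(name, opt.__name__)]:.1e}")
```

For SGD, the gap should stay at floating-point round-off under both basis changes. For Adam, it should be zero under the signed permutation, which only relabels and flips coordinates, and clearly nonzero under the random rotation. The difference comes from the elementwise second-moment estimate `v`.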

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.19964 [cs.LG]
	(or arXiv:2410.19964v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.19964

Submission history

From: Lucas Maes
[v1] Fri, 25 Oct 2024 20:53:03 UTC (1,508 KB)
