Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Authors:
Thomas Pethick,
Wanyun Xie,
Mete Erdogan,
Kimon Antonakopoulos,
Tony Silveti-Falls,
Volkan Cevher
Abstract:
This work introduces a hybrid non-Euclidean optimization method which generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0,L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal $O(n^{-1/4})$ convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
Submitted 2 June, 2025;
originally announced June 2025.