MGDA Converges under Generalized Smoothness, Provably

Zhang, Qi; Xiao, Peiyao; Zou, Shaofeng; Ji, Kaiyi

arXiv:2405.19440 (cs)

[Submitted on 29 May 2024 (v1), last revised 8 Mar 2025 (this version, v5)]

Abstract: Multi-objective optimization (MOO) is receiving growing attention in fields such as multi-task learning. Recent works provide effective algorithms with theoretical analysis, but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which typically do not hold for neural networks such as long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving generalized $\ell$-smooth MOO problems; both algorithms approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where a total of $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed in the deterministic and stochastic settings, respectively. We prove that they can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we analyze an efficient variant of MGDA named MGDA-FA using only $\mathcal{O}(1)$ time and space, while achieving the same performance guarantee as MGDA.
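For intuition about the update that the abstract analyzes, the following is a minimal sketch of deterministic MGDA for two objectives. It relies on the standard fact that, for two gradients $g_1$ and $g_2$, the minimum-norm convex combination $w g_1 + (1-w) g_2$ has a closed-form weight. The function names, the toy quadratic objectives, and the fixed step size are illustrative assumptions, not taken from the paper; the sketch does not cover the stochastic double-sampling version, MGDA-FA, or the step-size choices used in the paper's analysis.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Weight w in [0, 1] minimizing ||w * g1 + (1 - w) * g2||^2.

    Closed form for two gradients: w = ((g2 - g1) . g2) / ||g1 - g2||^2,
    clipped to [0, 1]. If the gradients coincide, any w works; 0.5 is used.
    """
    diff = [b - a for a, b in zip(g1, g2)]  # g2 - g1
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5
    return min(1.0, max(0.0, dot(diff, g2) / denom))


def mgda_direction(g1: Sequence[float], g2: Sequence[float]) -> Vector:
    """Minimum-norm element of the convex hull of {g1, g2}."""
    w = min_norm_weight(g1, g2)
    return [w * a + (1.0 - w) * b for a, b in zip(g1, g2)]


def mgda(
    grad1: Callable[[Vector], Vector],
    grad2: Callable[[Vector], Vector],
    x0: Sequence[float],
    step_size: float = 0.1,   # illustrative constant, not the paper's choice
    num_steps: int = 200,     # illustrative iteration budget
) -> Vector:
    """Run x <- x - step_size * d, where d is the min-norm direction."""
    x = list(x0)
    for _ in range(num_steps):
        d = mgda_direction(grad1(x), grad2(x))
        x = [xi - step_size * di for xi, di in zip(x, d)]
    return x


# Toy objectives (hypothetical): f1(x) = ||x - a||^2 / 2, f2(x) = ||x - b||^2 / 2.
# Their Pareto set is the line segment between a and b.
A = [0.0, 0.0]
B = [2.0, 0.0]


def grad_f1(x: Vector) -> Vector:
    return [xi - ai for xi, ai in zip(x, A)]


def grad_f2(x: Vector) -> Vector:
    return [xi - bi for xi, bi in zip(x, B)]


x_final = mgda(grad_f1, grad_f2, x0=[1.0, 3.0])
```

On this toy problem the iterate moves toward the segment between `A` and `B`, where the minimum-norm direction vanishes, which is the Pareto stationarity condition in the two-objective case.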

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2405.19440 [cs.LG] (or arXiv:2405.19440v5 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2405.19440

Submission history

From: Qi Zhang
[v1] Wed, 29 May 2024 18:36:59 UTC (170 KB)
[v2] Wed, 12 Jun 2024 18:34:36 UTC (171 KB)
[v3] Mon, 1 Jul 2024 14:43:51 UTC (169 KB)
[v4] Wed, 2 Oct 2024 18:51:01 UTC (174 KB)
[v5] Sat, 8 Mar 2025 20:40:28 UTC (214 KB)
