Decoupled Relative Learning Rate Schedules

Ludziejewski, Jan; Małaśnicki, Jan; Pióro, Maciej; Krutul, Michał; Ciebiera, Kamil; Stefaniak, Maciej; Krajewski, Jakub; Sankowski, Piotr; Cygan, Marek; Adamczewski, Kamil; Jaszczur, Sebastian

arXiv:2507.03526 (cs)

[Submitted on 4 Jul 2025]

Abstract: In this work, we introduce a novel approach for optimizing LLM training by adjusting learning rates across the weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the distinct training dynamics of each part. Our method, relative learning rates (RLRS), accelerates training by up to $23\%$, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be tuned efficiently on smaller models and then reused on models up to $27\times$ larger. This simple method substantially reduces training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.
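To make the idea of component-wise learning rates concrete, the following is a minimal sketch, not the authors' implementation. It assumes a shared warmup-plus-cosine schedule that is scaled by one relative multiplier per component group. The group names (`embedding`, `attention`, `expert`, `router`), the multiplier values, and the helper names `base_lr` and `component_lrs` are all hypothetical and chosen only for illustration; the abstract does not specify them.

```python
import math


def base_lr(step, total_steps, peak_lr, warmup_steps, final_fraction=0.1):
    """Shared schedule (assumed): linear warmup, then cosine decay
    down to final_fraction * peak_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return peak_lr * (final_fraction + (1.0 - final_fraction) * cosine)


# Hypothetical relative multipliers, one per component group.
# These numbers are placeholders, not values reported in the paper.
RELATIVE_LR = {
    "embedding": 2.0,
    "attention": 1.0,
    "expert": 0.5,
    "router": 0.25,
}


def component_lrs(step, total_steps, peak_lr, warmup_steps, relative=RELATIVE_LR):
    """Return the learning rate for each component group at a given step:
    the shared schedule value times that group's relative multiplier."""
    shared = base_lr(step, total_steps, peak_lr, warmup_steps)
    return {name: shared * scale for name, scale in relative.items()}


if __name__ == "__main__":
    for step in (0, 100, 550, 1000):
        print(step, component_lrs(step, total_steps=1000, peak_lr=1e-3, warmup_steps=100))
```

In a training loop, each returned value would be assigned to the optimizer parameter group that holds the corresponding weights. Under this sketch the multipliers are independent of model size, which is one way the tuned values could be carried over from a small model to a larger one.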

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2507.03526 [cs.LG]
	(or arXiv:2507.03526v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.03526

Submission history

From: Jan Małaśnicki
[v1] Fri, 4 Jul 2025 12:23:45 UTC (168 KB)
