The Art of Scaling Reinforcement Learning Compute for LLMs

Khatri, Devvrit; Madaan, Lovish; Tiwari, Rishabh; Bansal, Rachit; Duvvuri, Sai Surya; Zaheer, Manzil; Dhillon, Inderjit S.; Brandfonbrener, David; Agarwal, Rishabh

arXiv:2510.13786 (cs)

[Submitted on 15 Oct 2025]

Abstract: Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000 GPU-hours, that defines a principled framework for analyzing and predicting RL scaling in LLMs. We fit sigmoidal compute-performance curves for RL training and ablate a wide range of common design choices to analyze their effects on asymptotic performance and compute efficiency. We observe: (1) Not all recipes yield similar asymptotic performance, (2) Details such as loss aggregation, normalization, curriculum, and off-policy algorithm primarily modulate compute efficiency without materially shifting the asymptote, and (3) Stable, scalable recipes follow predictable scaling trajectories, enabling extrapolation from smaller-scale runs. Combining these insights, we propose a best-practice recipe, ScaleRL, and demonstrate its effectiveness by successfully scaling and predicting validation performance on a single RL run scaled up to 100,000 GPU-hours. Our work provides both a scientific framework for analyzing scaling in RL and a practical recipe that brings RL training closer to the predictability long achieved in pre-training.
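To make the idea of fitting a sigmoidal compute-performance curve and extrapolating from it concrete, here is a minimal sketch in Python. It rests on several assumptions that do not come from the abstract: the particular saturating form R(C) = R0 + (A - R0) / (1 + (C_mid / C)^B), the treatment of the starting performance R0 as known, the grid-search fitting routine, and the noise-free synthetic data. The names `sigmoid_curve` and `fit_sigmoid` are hypothetical, and this should not be read as the authors' fitting procedure.

```python
def sigmoid_curve(compute, r0, asymptote, exponent, c_mid):
    """Assumed saturating form: near r0 at low compute, approaching asymptote."""
    return r0 + (asymptote - r0) / (1.0 + (c_mid / compute) ** exponent)


def fit_sigmoid(computes, scores, r0):
    """Grid-search exponent and c_mid; solve the asymptote by least squares."""
    best = None
    for i in range(60):
        # Log-spaced candidate midpoints, starting at the smallest compute value.
        c_mid = min(computes) * 2 ** (i / 4)
        for j in range(1, 61):
            exponent = j / 20  # candidates 0.05, 0.10, ..., 3.0
            shape = [1.0 / (1.0 + (c_mid / c) ** exponent) for c in computes]
            # For fixed exponent and c_mid the model is linear in (asymptote - r0).
            gain = sum(s * (y - r0) for s, y in zip(shape, scores)) / sum(
                s * s for s in shape
            )
            err = sum((r0 + gain * s - y) ** 2 for s, y in zip(shape, scores))
            if best is None or err < best[0]:
                best = (err, r0 + gain, exponent, c_mid)
    _, asymptote, exponent, c_mid = best
    return asymptote, exponent, c_mid


# Synthetic, noise-free observations from a known curve (illustrative values only).
true_params = dict(r0=0.2, asymptote=0.6, exponent=1.0, c_mid=1600.0)
computes = [100.0 * 2 ** k for k in range(9)]  # 100 to 25,600 GPU-hours
scores = [sigmoid_curve(c, **true_params) for c in computes]

# Fit on the smaller-scale points, then extrapolate to a larger budget.
asymptote, exponent, c_mid = fit_sigmoid(computes, scores, r0=0.2)
predicted = sigmoid_curve(100_000.0, 0.2, asymptote, exponent, c_mid)
print(f"asymptote={asymptote:.3f} exponent={exponent:.2f} c_mid={c_mid:.0f}")
print(f"predicted score at 100,000 GPU-hours: {predicted:.3f}")
```

In this parametrization the asymptote separates from compute efficiency: `asymptote` sets the performance ceiling, while `exponent` and `c_mid` control how quickly the curve approaches it. Real training curves are noisy, so a practical fit would likely use a continuous optimizer and report uncertainty on the extrapolated value.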

Comments:	28 pages, 20 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2510.13786 [cs.LG]
	(or arXiv:2510.13786v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.13786

Submission history

From: Lovish Madaan
[v1] Wed, 15 Oct 2025 17:43:03 UTC (4,240 KB)
