Establishing Task Scaling Laws via Compute-Efficient Model Ladders

Bhagia, Akshita; Liu, Jiacheng; Wettig, Alexander; Heineman, David; Tafjord, Oyvind; Jha, Ananya Harsh; Soldaini, Luca; Smith, Noah A.; Groeneveld, Dirk; Koh, Pang Wei; Dodge, Jesse; Hajishirzi, Hannaneh

Computer Science > Computation and Language

arXiv:2412.04403 (cs)

[Submitted on 5 Dec 2024]

Abstract: We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately model task performance. Therefore, we leverage a two-step prediction approach: first use model and data size to predict a task-specific loss, and then use this task loss to predict task performance. We train a set of small-scale "ladder" models, collect data points to fit the parameterized functions of the two prediction steps, and make predictions for two target models: a 7B model trained to 4T tokens and a 13B model trained to 5T tokens. Training the ladder models costs only 1% of the compute used for the target models. On four multiple-choice tasks written in ranked classification format, we can predict the accuracy of both target models within 2 points of absolute error. We have higher prediction error on four other tasks (average absolute error of 6.9 points) and find that these are often tasks with higher variance in task metrics. We also find that using less compute to train fewer ladder models tends to degrade predictions. Finally, we empirically show that our design choices and the two-step approach lead to superior performance in establishing scaling laws.
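The abstract describes the two-step method only in words and does not state the fitted functional forms. The sketch below is a minimal, standard-library-only illustration of the idea, not the authors' implementation. It assumes a Chinchilla-style form L(N, D) = E + A/N^alpha + B/D^beta for step 1 (model size N and data size D to task loss) and a sigmoid Acc(L) = b + a/(1 + exp(k(L - L0))) for step 2 (task loss to accuracy). The function names (`fit_task_loss`, `fit_accuracy`, `two_step_predict`), the grid-search fitting strategy, and every number in the demo are hypothetical.

```python
import math
from itertools import product


def solve_linear(m, v):
    """Solve the small dense system m @ x = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]  # raises ZeroDivisionError if singular
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares(feats, ys):
    """Ordinary least squares via the normal equations; returns (weights, SSE)."""
    k = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    w = solve_linear(xtx, xty)
    sse = sum((sum(wi * fi for wi, fi in zip(w, f)) - y) ** 2 for f, y in zip(feats, ys))
    return w, sse


def fit_task_loss(points):
    """Step 1 (assumed form): L(N, D) = E + A * N**-alpha + B * D**-beta.
    Grid-search the exponents; solve E, A, B by linear least squares."""
    grid = [i / 100 for i in range(5, 80, 5)]  # 0.05 ... 0.75
    losses = [loss for _, _, loss in points]
    best = None
    for alpha, beta in product(grid, grid):
        feats = [[1.0, n ** -alpha, d ** -beta] for n, d, _ in points]
        try:
            w, sse = least_squares(feats, losses)
        except ZeroDivisionError:
            continue  # degenerate design matrix; skip this grid point
        if best is None or sse < best[0]:
            best = (sse, w, alpha, beta)
    _, (e, a, b), alpha_hat, beta_hat = best
    return lambda n, d: e + a * n ** -alpha_hat + b * d ** -beta_hat


def fit_accuracy(pairs):
    """Step 2 (assumed form): Acc(L) = b + a / (1 + exp(k * (L - L0))).
    With k > 0, accuracy falls as task loss rises. Grid-search k and L0;
    solve a, b by linear least squares."""
    ks = [i / 2 for i in range(1, 17)]    # 0.5 ... 8.0
    l0s = [i / 10 for i in range(5, 26)]  # 0.5 ... 2.5
    accs = [acc for _, acc in pairs]
    best = None
    for k, l0 in product(ks, l0s):
        feats = [[1.0 / (1.0 + math.exp(k * (loss - l0))), 1.0] for loss, _ in pairs]
        try:
            w, sse = least_squares(feats, accs)
        except ZeroDivisionError:
            continue
        if best is None or sse < best[0]:
            best = (sse, w, k, l0)
    _, (a, b), k_hat, l0_hat = best
    return lambda loss: b + a / (1.0 + math.exp(k_hat * (loss - l0_hat)))


def two_step_predict(ladder, targets):
    """ladder: (N, D, task_loss, accuracy) tuples from small models; targets: (N, D)."""
    loss_fn = fit_task_loss([(n, d, loss) for n, d, loss, _ in ladder])
    acc_fn = fit_accuracy([(loss, acc) for _, _, loss, acc in ladder])
    return [acc_fn(loss_fn(n, d)) for n, d in targets]  # chain step 1 into step 2


# Hypothetical noise-free "ground truth", used only to exercise the code.
def true_loss(n, d):
    return 0.5 + 0.6 * n ** -0.3 + 1.2 * d ** -0.25


def true_acc(loss):
    return 0.25 + 0.7 / (1.0 + math.exp(4.0 * (loss - 1.5)))


ladder = []
for n in [0.2, 0.4, 0.8, 1.3]:   # hypothetical ladder sizes (billions of params)
    for mult in [1, 2, 5, 10]:   # hypothetical multiples of 20 tokens per param
        d = 20 * mult * n        # billions of training tokens
        loss = true_loss(n, d)
        ladder.append((n, d, loss, true_acc(loss)))

targets = [(7.0, 4000.0), (13.0, 5000.0)]  # 7B/4T and 13B/5T, in billions
preds = two_step_predict(ladder, targets)
```

Because the demo data are noise-free and the generating exponents lie on the search grid, the fit recovers the generating curves, so this only checks the plumbing of the two steps. Grid search over the nonlinear parameters with closed-form least squares for the linear ones is chosen here to avoid third-party dependencies; with real, noisy ladder measurements a finer grid or a general nonlinear optimizer may be preferable.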

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.04403 [cs.CL]
	(or arXiv:2412.04403v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.04403

Submission history

From: Jiacheng Liu
[v1] Thu, 5 Dec 2024 18:21:49 UTC (13,264 KB)
