Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

Ablin, Pierre; Katharopoulos, Angelos; Seto, Skyler; Grangier, David

arXiv:2502.01804 (cs)

[Submitted on 3 Feb 2025]

Abstract: Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performances. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model. We learn the linear combination coefficients as a function of the input domain weights. To train this architecture, we sample random domain weights, instantiate the corresponding model, and backpropagate through one batch of data sampled with these domain weights. We demonstrate that our approach quickly obtains small specialized models on several language modeling tasks. Soup-of-Experts are particularly appealing when one needs to ship many different specialist models quickly under a model size constraint.
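
The training procedure described in the abstract (sample domain weights, map them to mixing coefficients, average a bank of expert parameters into one model, take one gradient step on a batch drawn with those weights) can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not from the paper: the "model" is a toy linear regressor, the domains are synthetic regression tasks, the coefficient function is a hypothetical learned linear map `coef_map` (the abstract only says the coefficients are a learned function of the domain weights), domain weights are drawn from a uniform Dirichlet, and gradients are derived by hand rather than with autodiff. All names (`instantiate`, `coef_map`, `W_TRUE`, `sample_batch`) are illustrative.

```python
import numpy as np

# Hypothetical toy setup: K domains, each a linear-regression task y = x @ w_d + noise.
K, DIM, N_EXPERTS, BATCH = 3, 5, 4, 32
rng = np.random.default_rng(0)
W_TRUE = rng.normal(size=(K, DIM))  # one ground-truth weight vector per synthetic domain


def sample_batch(h, size):
    """Draw a batch whose examples come from domain d with probability h[d]."""
    domains = rng.choice(K, size=size, p=h)
    x = rng.normal(size=(size, DIM))
    y = np.einsum("bd,bd->b", x, W_TRUE[domains]) + 0.1 * rng.normal(size=size)
    return x, y


def instantiate(h, coef_map, experts):
    """Map domain weights to coefficients (assumed linear), then linearly combine the experts."""
    alpha = h @ coef_map           # mixing coefficients, shape (N_EXPERTS,)
    return alpha, alpha @ experts  # instantiated parameters theta, shape (DIM,)


def loss_and_grads(h, coef_map, experts, x, y):
    """Squared loss of the instantiated model and hand-derived gradients."""
    alpha, theta = instantiate(h, coef_map, experts)
    resid = x @ theta - y
    loss = 0.5 * np.mean(resid ** 2)
    g_theta = x.T @ resid / len(y)           # dL/dtheta
    g_experts = np.outer(alpha, g_theta)     # dL/dE_j = alpha_j * dL/dtheta
    g_coef = np.outer(h, experts @ g_theta)  # chain rule through alpha = h @ coef_map
    return loss, g_coef, g_experts


def eval_loss(h, coef_map, experts, n=2000):
    """Loss of the model instantiated for weights h, on fresh data drawn with h."""
    x, y = sample_batch(h, n)
    return loss_and_grads(h, coef_map, experts, x, y)[0]


def train(steps=3000, lr=0.05):
    coef_map = rng.normal(scale=0.3, size=(K, N_EXPERTS))
    experts = rng.normal(scale=0.3, size=(N_EXPERTS, DIM))  # the bank of expert parameters
    init = (coef_map.copy(), experts.copy())
    for _ in range(steps):
        h = rng.dirichlet(np.ones(K))  # random domain weights
        x, y = sample_batch(h, BATCH)  # one batch sampled with these weights
        _, g_coef, g_experts = loss_and_grads(h, coef_map, experts, x, y)
        coef_map -= lr * g_coef        # plain SGD on both learned tensors
        experts -= lr * g_experts
    return init, (coef_map, experts)


if __name__ == "__main__":
    init, trained = train()
    for d in range(K):
        h = np.eye(K)[d]  # one-hot weights: a single-domain specialist, no retraining
        print(f"domain {d}: before {eval_loss(h, *init):.3f}, after {eval_loss(h, *trained):.3f}")
```

After training, instantiating a specialist for new domain weights costs only one weighted sum over the expert bank. In this toy, the loss-minimizing parameters are linear in the domain weights, so four experts can represent every mixture exactly; a real language model offers no such guarantee.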

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2502.01804 [cs.LG]
	(or arXiv:2502.01804v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.01804

Submission history

From: Pierre Ablin
[v1] Mon, 3 Feb 2025 20:33:20 UTC (555 KB)
