Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging

Morrison, Jacob; Smith, Noah A.; Hajishirzi, Hannaneh; Koh, Pang Wei; Dodge, Jesse; Dasigi, Pradeep

arXiv:2410.12937 (cs)

[Submitted on 16 Oct 2024]

Abstract: Adapting general-purpose language models to new skills is currently an expensive process that must be repeated as new instruction datasets targeting new skills are created, or can cause the models to forget older skills. In this work, we investigate the effectiveness of adding new skills to preexisting models by training on the new skills in isolation and later merging with the general model (e.g., using task vectors). In experiments focusing on scientific literature understanding, safety, and coding, we find that the parallel-train-then-merge procedure, which is significantly cheaper than retraining the models on updated data mixtures, is often comparably effective. Our experiments also show that parallel training is especially well-suited for enabling safety features in LMs relative to continued finetuning and retraining, as it dramatically improves model compliance with safe prompts while preserving its ability to refuse dangerous or harmful prompts.
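
The merging step mentioned in the abstract can be illustrated with a minimal sketch of task-vector arithmetic. This is not the authors' implementation: it assumes the common formulation in which a task vector is the element-wise difference between a skill-finetuned model and the shared base model, and the merged model is the general model plus a scaled task vector. The parameter dictionaries, the function names `task_vector` and `merge_with_task_vector`, and the scaling factor `weight` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference: finetuned weights minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_with_task_vector(
    general: Params, skill_vector: Params, weight: float = 1.0
) -> Params:
    """Add a scaled task vector onto the general model's weights.

    `weight` is an assumed scalar hyperparameter, not a value from the paper.
    """
    return {
        name: [g + weight * t for g, t in zip(general[name], skill_vector[name])]
        for name in general
    }


# Hypothetical weights: a shared base model, a general model finetuned from it,
# and a model finetuned from the same base on the new skill in isolation.
base = {"layer.weight": [0.0, 1.0, 2.0]}
general = {"layer.weight": [0.5, 1.0, 2.5]}
skill_only = {"layer.weight": [0.0, 2.0, 2.0]}

skill_vec = task_vector(skill_only, base)
merged = merge_with_task_vector(general, skill_vec, weight=0.5)
print(merged)  # {'layer.weight': [0.5, 1.5, 2.5]}
```

Under these assumptions, the skill model is trained once, independently of the general model, and merging costs only a pass over the parameters, which is where the savings over retraining on an updated data mixture would come from.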

Comments:	Findings of EMNLP 2024
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2410.12937 [cs.CL]
	(or arXiv:2410.12937v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.12937

Submission history

From: Jacob Morrison
[v1] Wed, 16 Oct 2024 18:23:50 UTC (4,933 KB)
