Need a Small Specialized Language Model? Plan Early!

Grangier, David; Katharopoulos, Angelos; Ablin, Pierre; Hannun, Awni

arXiv:2402.01093 (cs)

[Submitted on 2 Feb 2024 (v1), last revised 31 Oct 2024 (this version, v2)]

Abstract: Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference, but their lower capacity means that their performance can be good only if one limits their scope to a specialized domain. This paper explores how to get good specialized small language models using a large, generic, pretraining set and a limited amount of specialized data. We consider two scenarios, depending on whether (i) one can afford pretraining a model for each specialization task, or (ii) one wants to cheaply adapt a single pretrained model for each task. In the first scenario, we propose an effective solution based on importance sampling: we resample the pretraining set to imitate the specialization data and train a small model on it. In the second scenario, we propose a novel architecture, projected networks (PN). PN is a large network whose parameters can be linearly projected into a small network for specialization. For both scenarios, we demonstrate the empirical effectiveness of our solutions across various domains, training set sizes, and training budgets.
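The importance-sampling idea in the first scenario can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every document has already been assigned a discrete cluster id by some upstream step (not shown), and the function names `importance_weights` and `resample_generic` are hypothetical. Each generic document is weighted by the ratio of its cluster's frequency in the specialist data to its frequency in the generic data, and the pretraining set is then resampled with those weights.

```python
import random
from collections import Counter


def importance_weights(generic_clusters, specialist_clusters):
    """Per-cluster weight w(c) = p_specialist(c) / p_generic(c).

    Both arguments are lists of cluster ids, one per document.
    Cluster ids are an assumption of this sketch.
    """
    generic_counts = Counter(generic_clusters)
    specialist_counts = Counter(specialist_clusters)
    n_generic = len(generic_clusters)
    n_specialist = len(specialist_clusters)
    return {
        c: (specialist_counts.get(c, 0) / n_specialist)
        / (generic_counts[c] / n_generic)
        for c in generic_counts
    }


def resample_generic(generic_docs, generic_clusters, specialist_clusters, k, seed=0):
    """Draw k generic documents (with replacement) so that their cluster
    distribution imitates that of the specialist data.

    Raises ValueError if no specialist cluster appears in the generic set,
    because all sampling weights would then be zero.
    """
    weights = importance_weights(generic_clusters, specialist_clusters)
    per_doc_weights = [weights[c] for c in generic_clusters]
    rng = random.Random(seed)
    return rng.choices(generic_docs, weights=per_doc_weights, k=k)


if __name__ == "__main__":
    docs = ["news-1", "news-2", "news-3", "medical-1"]
    doc_clusters = [0, 0, 0, 1]   # cluster 1 is rare in the generic set
    target_clusters = [1, 1]      # the specialist data lives in cluster 1
    print(importance_weights(doc_clusters, target_clusters))
    print(resample_generic(docs, doc_clusters, target_clusters, k=3))
```

In this toy setup the resampled set contains only documents from cluster 1, since cluster 0 never occurs in the specialist data. A small model would then be pretrained on the resampled set, as the abstract describes.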

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2402.01093 [cs.LG]
	(or arXiv:2402.01093v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.01093

Submission history

From: Pierre Ablin
[v1] Fri, 2 Feb 2024 01:45:18 UTC (137 KB)
[v2] Thu, 31 Oct 2024 15:56:08 UTC (324 KB)
