Transfer Learning for Finetuning Large Language Models

Strangmann, Tobias; Purucker, Lennart; Franke, Jörg K. H.; Rapant, Ivo; Ferreira, Fabio; Hutter, Frank

arXiv:2411.01195 (cs)

[Submitted on 2 Nov 2024]

Abstract: As the landscape of large language models expands, efficiently finetuning for specific tasks becomes increasingly crucial. At the same time, the landscape of parameter-efficient finetuning methods rapidly expands. Consequently, practitioners face a multitude of complex choices when searching for an optimal finetuning pipeline for large language models. To reduce the complexity for practitioners, we investigate transfer learning for finetuning large language models and aim to transfer knowledge about configurations from related finetuning tasks to a new task. In this work, we transfer learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset. Counter-intuitively, we propose to rely only on transfer learning for new datasets. Thus, we do not use task-specific Bayesian optimization but prioritize knowledge transferred from related tasks over task-specific feedback. We evaluate our method on eight synthetic question-answer datasets and a meta-dataset consisting of 1,800 runs of finetuning Microsoft's Phi-3. Our transfer learning is superior to zero-shot, default finetuning, and meta-optimization baselines. Our results demonstrate the transferability of finetuning to adapt large language models more effectively.
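The core idea in the abstract, choosing a finetuning configuration for a new task purely from what was observed on related tasks, can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity-weighted average below is a toy stand-in for the meta-learned performance and cost surrogates, and it does not model the grey-box (partial training feedback) aspect. The meta-feature vectors, configuration names, scores, and costs are all hypothetical values invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy meta-dataset: (task meta-features, config name, score, cost).
# Every value here is made up for illustration; none comes from the paper.
META_DATASET = [
    ((0.9, 0.1), "lora_r8_lr1e-4", 0.71, 1.0),
    ((0.9, 0.1), "lora_r64_lr1e-4", 0.74, 2.5),
    ((0.2, 0.8), "lora_r8_lr1e-4", 0.55, 1.1),
    ((0.2, 0.8), "lora_r64_lr1e-4", 0.68, 2.4),
    ((0.8, 0.3), "lora_r8_lr1e-4", 0.70, 0.9),
    ((0.8, 0.3), "lora_r64_lr1e-4", 0.72, 2.6),
]


def similarity(a, b, bandwidth=0.5):
    """Gaussian kernel on the Euclidean distance between task meta-features."""
    return math.exp(-(math.dist(a, b) ** 2) / (2 * bandwidth ** 2))


def predict(new_task, meta_dataset):
    """Return {config: (predicted score, predicted cost)} for the new task.

    Each prediction is a similarity-weighted mean over related tasks,
    a deliberately simple stand-in for learned surrogate models.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # weight, weighted score, weighted cost
    for features, config, score, cost in meta_dataset:
        w = similarity(new_task, features)
        acc = sums[config]
        acc[0] += w
        acc[1] += w * score
        acc[2] += w * cost
    return {c: (s / w, k / w) for c, (w, s, k) in sums.items()}


def select_config(new_task, meta_dataset, cost_budget):
    """Pick the best predicted config within the cost budget.

    No run on the new task is used: the choice relies only on
    knowledge transferred from the meta-dataset.
    """
    predictions = predict(new_task, meta_dataset)
    feasible = {c: p for c, p in predictions.items() if p[1] <= cost_budget}
    if not feasible:
        return None
    return max(feasible, key=lambda c: feasible[c][0])


if __name__ == "__main__":
    print(select_config((0.85, 0.2), META_DATASET, cost_budget=3.0))
    print(select_config((0.85, 0.2), META_DATASET, cost_budget=1.5))
```

Under these assumptions, tightening `cost_budget` shifts the selection toward the cheaper configuration, which is the role a cost surrogate plays alongside a performance surrogate.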

Comments:	Accepted at NeurIPS 2024 Workshop on Adaptive Foundation Models
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2411.01195 [cs.CL]
	(or arXiv:2411.01195v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.01195

Submission history

From: Lennart Purucker
[v1] Sat, 2 Nov 2024 09:43:12 UTC (3,012 KB)
