Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models

Doering, Nigel; Gorlla, Cyril; Tuttle, Trevor; Vijay, Adhvaith

arXiv:2401.04051 (cs)

[Submitted on 8 Jan 2024]

Abstract: Fine-tuning large pre-trained language models for downstream tasks remains a critical challenge in natural language processing. This paper presents an empirical analysis comparing two efficient fine-tuning methods, BitFit and adapter modules, to standard full model fine-tuning. Experiments conducted on GLUE benchmark datasets (MRPC, CoLA, STS-B) reveal several key insights. The BitFit approach, which trains only bias terms and task heads, matches full fine-tuning performance across varying amounts of training data and time constraints. It demonstrates remarkable stability even with only 30% of data, outperforming full fine-tuning at intermediate data levels. Adapter modules exhibit high variability, with inconsistent gains over default models. The findings indicate BitFit offers an attractive balance between performance and parameter efficiency. Our work provides valuable perspectives on model tuning, emphasizing robustness and highlighting BitFit as a promising alternative for resource-constrained or streaming task settings. The analysis offers actionable guidelines for efficient adaptation of large pre-trained models, while illustrating open challenges in stabilizing techniques like adapter modules.
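
The abstract describes BitFit as training only bias terms and task heads. A minimal sketch of that selection rule is given below, assuming a toy parameter inventory: the parameter names, sizes, and the `classifier.` head prefix are hypothetical illustrations and are not taken from the paper or from any particular model.

```python
# Toy parameter inventory: name -> number of scalar parameters.
# Names and sizes are hypothetical, chosen only for illustration.
TOY_PARAMS = {
    "encoder.layer0.attention.weight": 768 * 768,
    "encoder.layer0.attention.bias": 768,
    "encoder.layer0.ffn.weight": 768 * 3072,
    "encoder.layer0.ffn.bias": 3072,
    "classifier.weight": 768 * 2,
    "classifier.bias": 2,
}


def bitfit_trainable(names, head_prefix="classifier."):
    """Return the names left trainable under a BitFit-style rule:
    every bias term, plus all parameters of the task head."""
    return [n for n in names if n.endswith(".bias") or n.startswith(head_prefix)]


def trainable_fraction(params, trainable):
    """Fraction of scalar parameters that would receive gradient updates."""
    total = sum(params.values())
    return sum(params[n] for n in trainable) / total


selected = bitfit_trainable(TOY_PARAMS)
fraction = trainable_fraction(TOY_PARAMS, selected)
print(selected)
print(f"{fraction:.4%} of parameters trainable")
```

In a PyTorch setting, the same rule could plausibly be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `False` for every parameter not selected; whether the authors implemented it this way is not stated in the abstract.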

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2401.04051 [cs.LG]
	(or arXiv:2401.04051v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.04051

Submission history

From: Cyril Gorlla
[v1] Mon, 8 Jan 2024 17:44:43 UTC (457 KB)
