Better Fine-Tuning by Reducing Representational Collapse

Aghajanyan, Armen; Shrivastava, Akshat; Gupta, Anchit; Goyal, Naman; Zettlemoyer, Luke; Gupta, Sonal

arXiv:2008.03156 (cs)

[Submitted on 6 Aug 2020]

Abstract: Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampled from either a normal or a uniform distribution), thereby discouraging representation change during fine-tuning, where possible, without hurting performance. We also introduce a new analysis that motivates the use of trust region methods more generally by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations each time they are fine-tuned.
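
The noise-based objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, beyond what the abstract states, that the noise is added to the input embedding and that the consistency term is a symmetric KL divergence between the predictions on the clean and the perturbed input. The function names, the one-layer classifier, and the values of `lam` and `sigma` are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def symmetric_kl(p, q, eps=1e-12):
    # KL(p||q) + KL(q||p), written as one sum so every term is non-negative.
    return sum((pi - qi) * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def classify(embedding, weights):
    # Hypothetical stand-in for a model head: one linear layer plus softmax.
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(logits)


def sample_noise(dim, sigma, kind, rng):
    # Parametric noise: either normal N(0, sigma^2) or uniform U(-sigma, sigma).
    if kind == "normal":
        return [rng.gauss(0.0, sigma) for _ in range(dim)]
    if kind == "uniform":
        return [rng.uniform(-sigma, sigma) for _ in range(dim)]
    raise ValueError("kind must be 'normal' or 'uniform'")


def noise_regularized_loss(embedding, label, weights, lam=1.0, sigma=1e-5,
                           kind="normal", rng=None):
    # Assumed objective: task loss + lam * symmetric KL(clean, noisy).
    rng = rng or random.Random(0)
    clean = classify(embedding, weights)
    noise = sample_noise(len(embedding), sigma, kind, rng)
    noisy = classify([x + z for x, z in zip(embedding, noise)], weights)
    task_loss = -math.log(clean[label] + 1e-12)  # cross-entropy on clean input
    penalty = symmetric_kl(clean, noisy)         # discourages prediction change
    return task_loss + lam * penalty, task_loss, penalty


if __name__ == "__main__":
    weights = [[1.0, -1.0], [-0.5, 0.5]]  # illustrative 2-class head
    total, task, penalty = noise_regularized_loss(
        [0.3, 0.7], 1, weights, lam=1.0, sigma=0.1, kind="uniform")
    print(total, task, penalty)
```

In this sketch the penalty vanishes when `sigma` is zero and grows as the predictions become more sensitive to the perturbation. This is one way to read "discouraging representation change" without computing an adversarial direction.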

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as: arXiv:2008.03156 [cs.LG]
(or arXiv:2008.03156v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2008.03156

Submission history

From: Armen Aghajanyan
[v1] Thu, 6 Aug 2020 02:13:16 UTC (426 KB)
