Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

Wei, Colin; Xie, Sang Michael; Ma, Tengyu

arXiv:2106.09226 (cs)

[Submitted on 17 Jun 2021 (v1), last revised 20 Apr 2022 (this version, v2)]

Abstract: Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text: the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that (1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, (2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and (3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
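To make "a function of the posterior distribution over the latent variables" concrete, here is a minimal sketch for a small discrete HMM. It is an illustration only and not the paper's construction: the posterior is computed directly with the standard forward (filtering) recursion instead of being read off a pretrained model, and the names `forward_posteriors` and `linear_head`, the toy parameters, and the choice to apply a linear head to the last position are all assumptions made for this example.

```python
def forward_posteriors(pi, A, B, obs):
    """Filtering posteriors P(h_t | x_1..x_t) for a discrete HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from hidden state i to hidden state j
    B[i][x] : probability of emitting token x from hidden state i
    obs     : list of observed token ids (each must have nonzero likelihood)
    """
    n = len(pi)
    posteriors = []
    prior = list(pi)
    for x in obs:
        # Weight the predicted state distribution by the emission likelihood.
        unnorm = [prior[i] * B[i][x] for i in range(n)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        posteriors.append(post)
        # Propagate through the transition matrix to predict the next state.
        prior = [sum(post[i] * A[i][j] for i in range(n)) for j in range(n)]
    return posteriors


def linear_head(posterior, w, b=0.0):
    """Hypothetical downstream head: a linear function of the posterior."""
    return sum(p * wi for p, wi in zip(posterior, w)) + b


# Toy parameters (assumed for illustration, not taken from the paper).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1]

posts = forward_posteriors(pi, A, B, obs)
score = linear_head(posts[-1], w=[1.0, -1.0])
```

In this toy setting the sign of `score` indicates which hidden state is more probable at the last position. The abstract's question is when a simple head on top of a frozen pretrained model can recover this kind of quantity without access to the HMM parameters.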

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2106.09226 [cs.LG]
	(or arXiv:2106.09226v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.09226

Submission history

From: Colin Wei
[v1] Thu, 17 Jun 2021 03:31:47 UTC (477 KB)
[v2] Wed, 20 Apr 2022 22:17:18 UTC (473 KB)
