Linearity of Relation Decoding in Transformer Language Models

Hernandez, Evan; Sharma, Arnab Sen; Haklay, Tal; Meng, Kevin; Wattenberg, Martin; Andreas, Jacob; Belinkov, Yonatan; Bau, David

arXiv:2308.09124 (cs)

[Submitted on 17 Aug 2023 (v1), last revised 15 Feb 2024 (this version, v2)]

Abstract: Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations. However, we also identify many cases in which LM predictions capture relational knowledge accurately, but this knowledge is not linearly encoded in their representations. Our results thus reveal a simple, interpretable, but heterogeneously deployed knowledge representation strategy in transformer LMs.
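
Illustration (not part of the abstract): the idea of a "first-order approximation" can be sketched on a toy function. The snippet below is a minimal sketch, assuming a hypothetical stand-in `toy_decoder` for the LM computation that maps a subject representation s to an object representation o; it is not the authors' code and involves no real model. It linearizes the function at one point s0 by taking W as the Jacobian at s0 and b = F(s0) - W s0, so that F(s) is approximately W s + b near s0. The Jacobian is estimated here with finite differences purely to keep the example dependency-free.

```python
import math

def toy_decoder(s):
    """Hypothetical stand-in for the map from subject representation s
    to object representation o. Not a real language model."""
    A = [[0.5, -0.2, 0.1],
         [0.3, 0.8, -0.4]]
    c = [0.1, -0.1]
    return [math.tanh(sum(a * x for a, x in zip(row, s)) + ci)
            for row, ci in zip(A, c)]

def jacobian(f, s0, eps=1e-6):
    """Central finite-difference Jacobian of f at s0 (rows: outputs, cols: inputs)."""
    n_out = len(f(s0))
    J = [[0.0] * len(s0) for _ in range(n_out)]
    for j in range(len(s0)):
        plus = list(s0)
        plus[j] += eps
        minus = list(s0)
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n_out):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def linear_approximation(f, s0):
    """Return (W, b) with f(s) ~= W s + b near s0 (first-order Taylor expansion)."""
    W = jacobian(f, s0)
    f0 = f(s0)
    b = [f0[i] - sum(W[i][j] * s0[j] for j in range(len(s0)))
         for i in range(len(f0))]
    return W, b

def apply_linear(W, b, s):
    """Evaluate the affine map W s + b."""
    return [sum(w * x for w, x in zip(row, s)) + bi for row, bi in zip(W, b)]

s0 = [0.2, -0.1, 0.4]                      # point at which we linearize
W, b = linear_approximation(toy_decoder, s0)

s_near = [0.25, -0.05, 0.35]               # a nearby input
approx = apply_linear(W, b, s_near)
exact = toy_decoder(s_near)
max_error = max(abs(a - e) for a, e in zip(approx, exact))
```

Comparing `approx` with `exact` shows how well the affine map tracks the nonlinear function near s0; the abstract's claim is that, for a subset of relations, an approximation of this kind holds well for the LM itself, while for other relations it does not.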

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2308.09124 [cs.CL]
	(or arXiv:2308.09124v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.09124

Submission history

From: Arnab Sen Sharma
[v1] Thu, 17 Aug 2023 17:59:19 UTC (992 KB)
[v2] Thu, 15 Feb 2024 19:12:10 UTC (966 KB)
