Interpretation of the Transformer and Improvement of the Extractor

Chen, Zhe

arXiv:2311.12678 (cs)

[Submitted on 21 Nov 2023]

Abstract: It has been over six years since the Transformer architecture was put forward. Surprisingly, the vanilla Transformer architecture is still widely used today. One reason is that the lack of deep understanding and comprehensive interpretation of the Transformer architecture makes it more challenging to improve the Transformer architecture. In this paper, we first interpret the Transformer architecture comprehensively in plain words based on our understanding and experiences. The interpretations are further proved and verified. These interpretations also cover the Extractor, a family of drop-in replacements for the multi-head self-attention in the Transformer architecture. Then, we propose an improvement on a type of the Extractor that outperforms the self-attention, without introducing additional trainable parameters. Experimental results demonstrate that the improved Extractor performs even better, showing a way to improve the Transformer architecture.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2311.12678 [cs.LG]
	(or arXiv:2311.12678v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.12678

Submission history

From: Zhe Chen
[v1] Tue, 21 Nov 2023 15:36:20 UTC (502 KB)