Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement

Ren, Yuxin; Collins, Maxwell D; Hu, Miao; Yang, Huanrui

Computer Science > Computer Vision and Pattern Recognition

arXiv:2505.21535 (cs)

[Submitted on 24 May 2025 (v1), last revised 29 May 2025 (this version, v2)]

Abstract: While transformers excel across vision and language pretraining tasks, their reliance on attention mechanisms poses challenges for inference efficiency, especially on edge and embedded accelerators with limited parallelism and memory bandwidth. Motivated by the observed redundancy of attention at inference time, we hypothesize that, although the model learns complicated token dependencies through pretraining, the inference-time sequence-to-sequence mapping in each attention layer is actually "simple" enough to be represented by a much cheaper function. In this work, we explore FAR, a Function-preserving Attention Replacement framework that replaces all attention blocks in pretrained transformers with learnable sequence-to-sequence modules, exemplified by an LSTM. FAR optimizes a multi-head LSTM architecture with a block-wise distillation objective and a global structural pruning framework to obtain a family of efficient LSTM-based models from pretrained transformers. We validate FAR on the DeiT vision transformer family and demonstrate that it matches the accuracy of the original models on ImageNet and multiple downstream tasks with fewer parameters and lower latency. Further analysis shows that FAR preserves the semantic token relationships and the token-to-token correlation learned in the transformer's attention module.
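The abstract does not specify the internals of the multi-head LSTM or the exact distillation loss, so the following NumPy sketch is only an illustration under stated assumptions. It assumes that each "head" is a single-layer unidirectional LSTM run over the token sequence on its own slice of the channel dimension, that head outputs are concatenated and passed through an output projection, and that the block-wise distillation objective is a mean-squared error between the frozen attention block's output and the replacement module's output on the same input tokens. All names (`self_attention`, `lstm_head`, `MultiHeadLSTM`, `blockwise_distill_loss`) are hypothetical, the weights are random, and no training or pruning step is shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def self_attention(x, Wq, Wk, Wv):
    """Teacher stand-in: single-head softmax self-attention, (T, d) -> (T, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=-1, keepdims=True)
    return a @ v


def lstm_head(x, W, U, b):
    """Single-layer LSTM over tokens: (T, d_in) -> (T, d_h). Gate order: i, f, g, o."""
    d_h = U.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    outs = np.empty((x.shape[0], d_h))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b
        i = sigmoid(z[:d_h])
        f = sigmoid(z[d_h:2 * d_h])
        g = np.tanh(z[2 * d_h:3 * d_h])
        o = sigmoid(z[3 * d_h:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outs[t] = h
    return outs


class MultiHeadLSTM:
    """Hypothetical attention replacement with the same (T, d) -> (T, d) interface."""

    def __init__(self, d_model, n_heads, rng):
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        d_head = d_model // n_heads
        s = 1.0 / np.sqrt(d_head)
        # One (W, U, b) parameter triple per head.
        self.params = [
            (rng.normal(0, s, (d_head, 4 * d_head)),
             rng.normal(0, s, (d_head, 4 * d_head)),
             np.zeros(4 * d_head))
            for _ in range(n_heads)
        ]
        self.W_out = rng.normal(0, 1.0 / np.sqrt(d_model), (d_model, d_model))

    def __call__(self, x):
        # Split channels into heads, run each head's LSTM over the tokens, then mix.
        heads = np.split(x, self.n_heads, axis=-1)
        outs = [lstm_head(h, *p) for h, p in zip(heads, self.params)]
        return np.concatenate(outs, axis=-1) @ self.W_out


def blockwise_distill_loss(teacher_out, student_out):
    """Assumed objective: MSE between teacher and student block outputs."""
    return float(np.mean((teacher_out - student_out) ** 2))


# Usage: compare one frozen attention block with its replacement on the same tokens.
rng = np.random.default_rng(0)
T, d = 8, 16
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(0, 1.0 / np.sqrt(d), (d, d)) for _ in range(3))
teacher = self_attention(x, Wq, Wk, Wv)
student = MultiHeadLSTM(d, n_heads=4, rng=rng)
out = student(x)
loss = blockwise_distill_loss(teacher, out)
print(out.shape, loss)
```

Because the loss is computed per block on that block's own input, each replacement module could in principle be fitted independently of the others; in practice the student parameters would be optimized with an autodiff framework rather than the fixed random weights used here.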

Comments:	12 pages main paper + 6 pages appendix, 14 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2505.21535 [cs.CV]
	(or arXiv:2505.21535v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.21535

Submission history

From: Yuxin Ren
[v1] Sat, 24 May 2025 02:23:46 UTC (5,645 KB)
[v2] Thu, 29 May 2025 02:15:28 UTC (5,645 KB)