Enhancing Linear Attention with Residual Learning

Lai, Xunhao; Kang, Jialiang; Lu, Jianqiao; Lin, Tong; Zhao, Pengyu

arXiv:2509.25223 (cs)

[Submitted on 24 Sep 2025]

Abstract: Linear attention offers a linear-time alternative to self-attention but often struggles to capture long-range patterns. We revisit linear attention through a prediction-correction lens and show that prevalent variants can be written as a combination of a historical prediction and a single-token correction, which creates an expressivity bottleneck. To address this bottleneck, we introduce Residual Linear Attention (RLA), a framework that equips linear attention with an explicit residual-fitting mechanism. RLA maintains an auxiliary recurrent state that learns to accumulate residual errors over time and correct the base prediction. We further instantiate a delta-rule version, Residual Delta Net (RDN), incorporating adaptive gating and residual clipping for enhanced correction control and stability. Our implementation leverages highly optimized linear attention kernels and preserves linear time and memory. Across language modeling and recall-intensive evaluations, RLA and RDN consistently outperform their respective baselines and other modern linear-attention methods, narrowing the gap to standard Transformers while retaining linear scaling.
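
As a rough illustration of the prediction-correction view described in the abstract, the following Python sketch writes plain (unnormalized, ungated) causal linear attention as a historical prediction plus a single-token correction, then adds an auxiliary state that accumulates clipped residuals. The specific update rules, the function names, and the `clip` parameter are assumptions made for illustration only; the abstract does not state the paper's equations, and this is neither the authors' implementation nor the delta-rule RDN variant.

```python
# Illustrative sketch; update rules and names are assumptions, not the paper's exact method.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def add_outer(M, u, w):
    """In-place rank-1 update: M += outer(u, w)."""
    for i, ui in enumerate(u):
        for j, wj in enumerate(w):
            M[i][j] += ui * wj

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def linear_attention(qs, ks, vs):
    """Unnormalized causal linear attention, o_t = S_t q_t, in prediction-correction form."""
    S = zeros(len(vs[0]), len(ks[0]))
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        prediction = matvec(S, q)             # S_{t-1} q_t: summary of past tokens
        gain = dot(k, q)
        correction = [gain * vi for vi in v]  # (k_t . q_t) v_t: current token only
        outputs.append([p + c for p, c in zip(prediction, correction)])
        add_outer(S, v, k)                    # S_t = S_{t-1} + v_t k_t^T
    return outputs

def residual_linear_attention(qs, ks, vs, clip=1.0):
    """Assumed residual variant: a second state R accumulates clipped residuals."""
    S = zeros(len(vs[0]), len(ks[0]))
    R = zeros(len(vs[0]), len(ks[0]))
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        base = matvec(S, k)                   # what the base state already recalls for k_t
        residual = [max(-clip, min(clip, vi - bi)) for vi, bi in zip(v, base)]
        add_outer(R, residual, k)             # R_t = R_{t-1} + r_t k_t^T
        prediction = matvec(S, q)             # historical prediction from S_{t-1}
        correction = matvec(R, q)             # correction from accumulated residuals
        outputs.append([p + c for p, c in zip(prediction, correction)])
        add_outer(S, v, k)                    # base state updated after the output
    return outputs
```

Both functions take lists of per-token query, key, and value vectors and keep only fixed-size states, so time and memory grow linearly with sequence length. In the first function the correction depends on the current token alone, which is the bottleneck the abstract refers to; in the second, the correction draws on a state built from every earlier residual.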

Comments:	15 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.25223 [cs.LG]
	(or arXiv:2509.25223v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2509.25223

Submission history

From: Xunhao Lai
[v1] Wed, 24 Sep 2025 07:36:08 UTC (206 KB)