More Expressive Attention with Negative Weights

Lv, Ang; Xie, Ruobing; Li, Shuaipeng; Liao, Jiayi; Sun, Xingwu; Kang, Zhanhui; Yan, Rui

arXiv:2411.07176v1 (cs)

[Submitted on 11 Nov 2024 (this version), latest version 30 Jan 2025 (v3)]

Abstract: We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness, which stems from two key factors: (1) Cog Attention can shift the token deletion and copying function from a static OV matrix to dynamic QK inner products, with the OV matrix now focusing more on refinement or modification. The attention head can simultaneously delete, copy, or retain tokens by assigning them negative, positive, or minimal attention weights, respectively. As a result, a single attention head becomes more flexible and expressive. (2) Cog Attention improves the model's robustness against representational collapse, which can occur when earlier tokens are over-squashed into later positions, leading to homogeneous representations. Negative weights reduce effective information paths from earlier to later tokens, helping to mitigate this issue. We develop Transformer-like models which use Cog Attention as attention modules, including decoder-only models for language modeling and U-ViT diffusion models for image generation. Experiments show that models using Cog Attention exhibit superior performance compared to those employing traditional softmax attention modules. Our approach suggests a promising research direction for rethinking and breaking the entrenched constraints of traditional softmax attention, such as the requirement for non-negative weights.
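
The abstract does not state how the signed weights are computed, so the following is only a minimal sketch of the idea under an assumed normalization: a softmax taken over the absolute values of the scores, with each score's sign restored afterwards. The function names signed_softmax and attend, the example scores, and the value vectors are all hypothetical and are not taken from the paper. The sketch shows how a single head could copy one token (positive weight), delete another (negative weight), and largely ignore a third (weight near zero).

```python
import math


def signed_softmax(scores):
    """Assumed normalization (illustrative, not confirmed by the abstract):
    softmax over |score|, then restore each score's sign."""
    max_abs = max(abs(s) for s in scores)  # subtract the max for numerical stability
    exps = [math.exp(abs(s) - max_abs) for s in scores]
    total = sum(exps)
    return [math.copysign(e / total, s) for s, e in zip(scores, exps)]


def attend(weights, values):
    """Weighted sum of value vectors, one output entry per dimension."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Hypothetical query-key scores for three tokens: copy, delete, mostly ignore.
scores = [2.0, -2.0, 0.0]
# Hypothetical value vectors for the same three tokens.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = signed_softmax(scores)
output = attend(weights, values)

print("weights:", [round(w, 4) for w in weights])
print("output: ", [round(o, 4) for o in output])
```

Under this assumed scheme the absolute values of the weights sum to one, while the weights themselves need not. The second token's value vector is subtracted from the output rather than added, which a standard softmax head cannot do without relying on the OV matrix.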

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2411.07176 [cs.CL]
	(or arXiv:2411.07176v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.07176

Submission history

From: Ang Lv
[v1] Mon, 11 Nov 2024 17:56:28 UTC (3,137 KB)
[v2] Thu, 14 Nov 2024 08:20:22 UTC (3,137 KB)
[v3] Thu, 30 Jan 2025 18:17:13 UTC (4,385 KB)
