Representation Shift: Unifying Token Compression with FlashAttention

Choi, Joonmyung; Lee, Sanghyeok; Ko, Byungoh; Kim, Eunseo; Kil, Jihyung; Kim, Hyunwoo J.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2508.00367 (cs)

[Submitted on 1 Aug 2025]

Abstract: Transformers have demonstrated remarkable success across vision, language, and video. Yet increasing task complexity has led to larger models and more tokens, which raises both the quadratic cost of self-attention and the overhead of GPU memory access. To reduce the computational cost of self-attention, prior work has proposed token compression techniques that drop redundant or less informative tokens. Meanwhile, fused attention kernels such as FlashAttention have been developed to reduce memory overhead by never constructing the attention map and thus avoiding its associated I/O to HBM. This design, however, makes such kernels incompatible with most training-free token compression methods, which rely on attention maps to determine token importance. Here, we propose Representation Shift, a training-free, model-agnostic metric that measures the degree of change in each token's representation. The metric integrates token compression with FlashAttention seamlessly, requiring neither attention maps nor retraining. Our method further generalizes beyond Transformers to CNNs and state space models. Extensive experiments show that Representation Shift enables effective token compression compatible with FlashAttention, yielding significant speedups of up to 5.5% and 4.4% in video-text retrieval and video QA, respectively. Code is available at this https URL.
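The abstract defines Representation Shift only as the degree of change in each token's representation. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the shift is the L2 distance between a token's vector before and after one per-token layer (for example, an MLP), and that compression keeps a fixed fraction of the highest-shift tokens. The names representation_shift, prune_by_shift, and keep_ratio are hypothetical, and the paper's actual distance, layer choice, and selection rule may differ. Because the score uses only a layer's inputs and outputs, it never reads an attention map, so attention itself could still run inside a fused kernel such as FlashAttention.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def representation_shift(before: Sequence[Vector], after: Sequence[Vector]) -> List[float]:
    """Per-token L2 distance between representations before and after a layer.

    Assumption: L2 distance is used as the measure of change; the paper may use
    a different distance.
    """
    if len(before) != len(after):
        raise ValueError("before and after must contain the same number of tokens")
    return [math.dist(b, a) for b, a in zip(before, after)]


def prune_by_shift(
    tokens: Sequence[Vector],
    layer: Callable[[Vector], Vector],
    keep_ratio: float,
) -> Tuple[List[int], List[Vector]]:
    """Hypothetical pruning step: keep the tokens whose representation changed most.

    `layer` is assumed to act on each token independently (e.g. an MLP), so no
    attention map is ever needed to score the tokens.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    outputs = [layer(t) for t in tokens]
    scores = representation_shift(tokens, outputs)
    # Number of tokens to keep; at least one so the sequence never becomes empty.
    k = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices by shift (largest first) and take the top k.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore the original token order for the surviving tokens.
    keep = sorted(top)
    return keep, [outputs[i] for i in keep]


if __name__ == "__main__":
    toks = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
    double = lambda t: [2 * x for x in t]  # toy stand-in for an MLP
    print(prune_by_shift(toks, double, keep_ratio=0.67))
```

In this toy run, the first token is unchanged by the layer (shift 0), so it is the one dropped. The remaining tokens would then be passed to the next attention layer, which is free to use a fused kernel because nothing downstream needs the attention weights.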

Comments:	International Conference on Computer Vision (ICCV), 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.00367 [cs.CV]
	(or arXiv:2508.00367v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.00367

Submission history

From: Sanghyeok Lee
[v1] Fri, 1 Aug 2025 06:53:55 UTC (2,642 KB)