Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers

Song, Woomin; Jayanthi, Sai Muralidhar; Ronanki, Srikanth; Sathyendra, Kanthashree Mysore; Shin, Jinwoo; Galstyan, Aram; Katiyar, Shubham; Bodapati, Sravan Babu

arXiv:2506.01215 (cs)

[Submitted on 1 Jun 2025]

Abstract: As large language models gain popularity in real-world applications, processing extremely long contexts, often exceeding the model's pre-trained context limits, has emerged as a critical challenge. Existing approaches to efficient long-context processing show promise, but recurrent compression-based methods struggle to preserve information, whereas random-access approaches require substantial memory. We introduce REFORM, a novel inference framework that handles long contexts efficiently through a two-phase approach. First, it incrementally processes input chunks while maintaining a compressed KV cache, constructs cross-layer context embeddings, and uses an early-exit strategy for improved efficiency. Second, it identifies and gathers essential tokens via similarity matching and selectively recomputes the KV cache. Compared to baselines, REFORM achieves performance gains of over 50% on RULER and 27% on BABILong at 1M context length. It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains. Additionally, REFORM reduces inference time by 30% and peak memory usage by 5%, achieving both efficiency and superior performance.
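The "gather" step in the second phase (selecting essential tokens via similarity matching) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `cosine` and `gather_tokens`, the `budget` parameter, the use of cosine similarity, and the max-over-query scoring rule are all assumptions made for illustration, and plain Python lists stand in for the cross-layer context embeddings the abstract mentions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def gather_tokens(context_emb, query_emb, budget):
    """Return indices of the `budget` context tokens most similar to the query.

    Hypothetical scoring rule: each context token is scored by its maximum
    cosine similarity over all query tokens. The selected indices are returned
    in their original order, so a later recompute pass would see the gathered
    tokens in sequence.
    """
    scores = [max(cosine(c, q) for q in query_emb) for c in context_emb]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Toy embeddings: four context tokens, one query token (assumed data).
context_emb = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query_emb = [[1.0, 0.0]]
selected = gather_tokens(context_emb, query_emb, budget=2)
print(selected)
```

In a full pipeline, the tokens at the selected positions would then be passed through the model again to rebuild a KV cache over only those tokens, which corresponds to the "recompute" step named in the title.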

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2506.01215 [cs.CL]
	(or arXiv:2506.01215v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2506.01215

Submission history

From: Woomin Song
[v1] Sun, 1 Jun 2025 23:49:14 UTC (126 KB)
