Text Compression for Efficient Language Generation

Gu, David; Belcak, Peter; Wattenhofer, Roger

arXiv:2503.11426 (cs)

[Submitted on 14 Mar 2025]

Abstract: We challenge the prevailing assumption that LLMs must rely fully on sub-word tokens for high-quality text generation. To this end, we propose the "Generative Pretrained Thoughtformer" (GPTHF), a hierarchical transformer language model capable of text generation by compressing text into sentence embeddings and employing a sentence attention mechanism. GPTHF retains GPT's architecture, modifying only token interactions via dynamic sparse attention masks.
Our experiments show that GPTHF achieves up to an order-of-magnitude improvement in FLOPs efficiency and a threefold increase in runtime speed compared to equally sized GPT models in the low-size regime. This is achieved through a unique generation method that caches and reuses sentence embeddings, allowing significant portions of the input to bypass large parts of the network.
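
The abstract names two mechanisms without showing them: token interactions restricted by dynamic sparse attention masks, and a generation method that caches and reuses sentence embeddings. The Python sketch below illustrates both ideas under stated assumptions and is not the authors' implementation. It assumes that each token attends causally only to earlier tokens of its own sentence (so a completed sentence's embedding depends only on its own tokens), that a sentence ends at a "." token, and that a toy encoder stands in for the network. The names `sentence_ids`, `intra_sentence_causal_mask` and `SentenceEmbeddingCache` are hypothetical.

```python
from typing import Callable, List, Sequence

Sentence = Sequence[str]
Embedding = List[float]


def sentence_ids(tokens: Sequence[str], boundary: str = ".") -> List[int]:
    """Give each token the index of the sentence it belongs to.

    Assumption (illustrative only): a sentence ends with a token equal to `boundary`.
    """
    ids: List[int] = []
    current = 0
    for tok in tokens:
        ids.append(current)
        if tok == boundary:
            current += 1
    return ids


def intra_sentence_causal_mask(ids: Sequence[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Assumed rule: j must not lie in the future (j <= i) and must belong to the
    same sentence as i. The mask is rebuilt from wherever boundaries fall.
    """
    n = len(ids)
    return [[j <= i and ids[i] == ids[j] for j in range(n)] for i in range(n)]


class SentenceEmbeddingCache:
    """Hypothetical generation-time cache.

    Completed sentences are encoded once and reused on later steps; only the
    unfinished last sentence is re-encoded at each step.
    """

    def __init__(self, encode: Callable[[Sentence], Embedding]) -> None:
        self.encode = encode
        self.completed: List[Embedding] = []
        self.encoder_calls = 0

    def step(self, sentences: Sequence[Sentence]) -> List[Embedding]:
        if not sentences:
            return []
        # Every sentence except the last is complete: encode each new one once.
        while len(self.completed) < len(sentences) - 1:
            self.completed.append(self.encode(sentences[len(self.completed)]))
            self.encoder_calls += 1
        # The in-progress sentence grows every step, so it is re-encoded.
        self.encoder_calls += 1
        return self.completed + [self.encode(sentences[-1])]


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", ".", "It", "slept"]
    ids = sentence_ids(tokens)  # [0, 0, 0, 0, 1, 1]
    mask = intra_sentence_causal_mask(ids)
    print(mask[4][3], mask[5][4])  # False True

    # Toy encoder: a 1-D "embedding" equal to the sentence length.
    cache = SentenceEmbeddingCache(lambda s: [float(len(s))])
    print(cache.step([["The", "cat", "sat", "."], ["It"]]))  # [[4.0], [1.0]]
    print(cache.step([["The", "cat", "sat", "."], ["It", "slept"]]))  # [[4.0], [2.0]]
    print(cache.encoder_calls)  # 3
```

Under these assumptions, the mask is "dynamic" in the sense that it is recomputed from the sentence boundaries present in the input. The cache shows why completed sentences can bypass the encoder: in the second step only the unfinished sentence is encoded again.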

Comments:	accepted to NAACL SRW 2025
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2503.11426 [cs.CL]
	(or arXiv:2503.11426v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.11426

Submission history

From: David Gu
[v1] Fri, 14 Mar 2025 14:14:05 UTC (479 KB)
