FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

Tian, Jiayi; Solgi, Ryan; Lu, Jinming; Yang, Yifan; Li, Hai; Zhang, Zheng

arXiv:2505.23966 (cs)

[Submitted on 29 May 2025]

Abstract: Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments. Although recent low-rank decomposition methods offer a promising path for structural compression, they often suffer from accuracy degradation and expensive calibration procedures, and they produce inefficient model architectures that hinder real-world inference speedups. In this paper, we propose FLAT-LLM, a fast, accurate, training-free structural compression method based on fine-grained low-rank transformations in the activation space. Specifically, we reduce the hidden dimension by transforming the weights using truncated eigenvectors computed via head-wise Principal Component Analysis (PCA), and we employ an importance-based metric to adaptively allocate ranks across decoders. FLAT-LLM achieves efficient and effective weight compression without recovery fine-tuning, and its calibration can complete within a few minutes. Evaluated across 4 models and 11 datasets, FLAT-LLM outperforms structural pruning baselines in generalization and downstream performance, while delivering inference speedups over decomposition-based methods.
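
The head-wise PCA step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head with a value projection `W_v` and an output projection slice `W_o`, uses the uncentered second moment of the head activations as the PCA matrix, and folds the truncated eigenvectors into both weights. The function name `compress_head`, the shapes, and the random calibration data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def compress_head(X, W_v, W_o, rank):
    """Hypothetical sketch: shrink one head's hidden dimension to `rank`.

    X   : (n, d_model) calibration inputs
    W_v : (d_model, d_head) value projection for one head
    W_o : (d_head, d_model) output projection slice for the same head
    """
    A = X @ W_v                            # head activations, (n, d_head)
    C = A.T @ A / A.shape[0]               # uncentered second moment (assumption)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    Q = eigvecs[:, ::-1][:, :rank]         # top-`rank` eigenvectors, (d_head, rank)
    # Fold the truncated basis into both weights: A @ W_o ~ (X @ W_v @ Q) @ (Q.T @ W_o)
    return W_v @ Q, Q.T @ W_o

rng = np.random.default_rng(0)
n, d_model, d_head = 256, 32, 8
X = rng.standard_normal((n, d_model))
W_v = rng.standard_normal((d_model, d_head))
W_o = rng.standard_normal((d_head, d_model))
reference = X @ W_v @ W_o

for rank in (2, 4, 8):
    W_v_c, W_o_c = compress_head(X, W_v, W_o, rank)
    approx = X @ W_v_c @ W_o_c
    rel_err = np.linalg.norm(reference - approx) / np.linalg.norm(reference)
    print(rank, W_v_c.shape, W_o_c.shape, rel_err)
```

Because the projection acts only on the per-head feature dimension, it commutes with the token mixing done by the attention weights, which is why the two truncated matrices can replace the originals without any extra adapter layer in this sketch. Keeping all `d_head` eigenvectors reproduces the original output up to floating-point error; how many to keep per decoder is what the paper's importance-based rank allocation decides, and that metric is not reproduced here.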

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2505.23966 [cs.CL]
	(or arXiv:2505.23966v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.23966

Submission history

From: Jiayi Tian
[v1] Thu, 29 May 2025 19:42:35 UTC (441 KB)
