Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization

Deiseroth, Björn; Meuer, Max; Gritsch, Nikolas; Eichenberg, Constantin; Schramowski, Patrick; Aßenmacher, Matthias; Kersting, Kristian


arXiv:2311.01544v1 (cs)

[Submitted on 2 Nov 2023 (this version), latest version 3 Apr 2024 (v3)]

Abstract: Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. Their ever-increasing size, however, has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach for assessing compressed LLMs that addresses the limitations of traditional measures like perplexity, which fail to accurately reflect text generation quality. DTMs focus on token divergence, providing deeper insights into the subtleties of model compression. Our results indicate that significant levels of precision and sparsity can be achieved without compromising text generation quality. Moreover, DTMs offer a more precise evaluation of each component's individual impact. Using the First Divergent Token Metric (FDTM) in model sparsification reveals that nearly 20% of all components can be pruned by over 90%. In terms of quantization, the FDTM suggests that over 80% of parameters can be straightforwardly transformed to int8 without special outlier management.
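The abstract names the First Divergent Token Metric but does not define it. As a rough illustration only, the sketch below assumes the metric is the index of the first position at which a compressed model's greedily decoded tokens differ from those of the original model for the same prompt. The function names `first_divergent_token` and `mean_first_divergent_token` and the token ids are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def first_divergent_token(reference: Sequence[int], candidate: Sequence[int]) -> int:
    """Index of the first position where two token sequences differ.

    Assumption (not from the abstract): if no differing position is found
    within the shared length, the shorter sequence's length is returned.
    """
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    return min(len(reference), len(candidate))


def mean_first_divergent_token(
    references: Sequence[Sequence[int]],
    candidates: Sequence[Sequence[int]],
) -> float:
    """Average first-divergence index over paired generations."""
    if len(references) != len(candidates):
        raise ValueError("references and candidates must be paired")
    if not references:
        raise ValueError("at least one pair is required")
    total = sum(first_divergent_token(r, c) for r, c in zip(references, candidates))
    return total / len(references)


# Hypothetical token ids from an original and a compressed model.
original_output = [12, 7, 93, 4, 55]
compressed_output = [12, 7, 18, 4, 55]
fdt = first_divergent_token(original_output, compressed_output)
```

Under this assumed definition, a larger value means the compressed model follows the original generation for longer before diverging, which is one way to compare how much individual components degrade when pruned or quantized.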

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2311.01544 [cs.CL]
	(or arXiv:2311.01544v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.01544

Submission history

From: Björn Deiseroth
[v1] Thu, 2 Nov 2023 18:55:53 UTC (3,757 KB)
[v2] Mon, 13 Nov 2023 15:33:35 UTC (2,146 KB)
[v3] Wed, 3 Apr 2024 11:49:53 UTC (2,776 KB)
