ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

Zhao, Weibo; Shi, Yubin; Lyu, Xinyu; Sui, Wanchen; Li, Shen; Li, Yong

arXiv:2411.07762 (cs)

[Submitted on 12 Nov 2024 (v1), last revised 12 Dec 2024 (this version, v2)]

Abstract: Quantization is a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping causes the quantized model to produce non-trivial error, leading to intolerable performance degradation. This paper starts from the basic objective of model compression and examines the layer-wise error distribution of LLMs during post-training quantization. We then introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; and (2) Activation Smoothing: outlier extraction to obtain smooth activations and better error compensation. ASER can quantize typical LLMs to low-bit versions, preserving accuracy even in the W4A8 per-channel setup. Experimental results show that ASER is competitive among state-of-the-art quantization algorithms and shows particular potential for activation quantization, with minor overhead.
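To make the Error Reconstruction idea concrete, here is a minimal NumPy sketch of low-rank compensation for quantization error using a whitened SVD. It is an illustration under stated assumptions, not the authors' implementation: the function names (quantize_per_channel, whitened_low_rank), the symmetric round-to-nearest quantizer, the Cholesky-based whitening, and the eps regularizer are all choices made for this example, and the Activation Smoothing (outlier extraction) step is not shown.

```python
import numpy as np


def quantize_per_channel(w, bits=4):
    """Symmetric round-to-nearest quantization with one scale per output row.

    Assumed quantizer for illustration; returns the dequantized weights.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def whitened_low_rank(err, x, rank, eps=1e-6):
    """Return (l_a, l_b) such that l_a @ l_b approximates err.

    The approximation is weighted by calibration activations x
    (shape: in_features x n_samples), so that the output error
    ||(err - l_a @ l_b) @ x||_F is what the truncation targets.
    """
    n = x.shape[0]
    gram = x @ x.T
    # Small ridge term (an assumption) keeps the Cholesky factor well defined.
    gram += eps * np.trace(gram) / n * np.eye(n)
    s = np.linalg.cholesky(gram)  # s @ s.T == gram
    u, sigma, vt = np.linalg.svd(err @ s, full_matrices=False)
    l_a = u[:, :rank] * sigma[:rank]           # out_features x rank
    l_b = np.linalg.solve(s.T, vt[:rank].T).T  # equals vt[:rank] @ inv(s)
    return l_a, l_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(32, 64))    # toy weight matrix
    x = rng.normal(size=(64, 256))   # toy calibration activations
    w_q = quantize_per_channel(w, bits=4)
    err = w - w_q
    l_a, l_b = whitened_low_rank(err, x, rank=8)
    print("output error, quantized only:   ", np.linalg.norm(err @ x))
    print("output error, with compensation:", np.linalg.norm((err - l_a @ l_b) @ x))
```

In this sketch the layer would be evaluated as (w_q + l_a @ l_b) @ x, so the two small factors play the role of the LoRA-style matrices mentioned in the abstract. The rank trades extra compute and memory against how much of the quantization error is recovered.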

Comments:	Accepted at AAAI 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.07762 [cs.LG]
	(or arXiv:2411.07762v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.07762

Submission history

From: Yubin Shi
[v1] Tue, 12 Nov 2024 12:52:04 UTC (901 KB)
[v2] Thu, 12 Dec 2024 02:41:45 UTC (777 KB)
