Model-Preserving Adaptive Rounding

Tseng, Albert; Sun, Zhaofeng; De Sa, Christopher


arXiv:2505.22988 (cs)

[Submitted on 29 May 2025]

Abstract: The main goal of post-training quantization (PTQ) is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, almost all LLM PTQ algorithms quantize linear layers by independently minimizing the immediate activation error. However, this localized objective ignores the effect of subsequent layers, so reducing it does not necessarily give a closer model. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that uses Kronecker-factored approximations of each linear layer's Hessian with respect to the \textit{full model} KL divergence. YAQA consists of two components: Kronecker-factored sketches of the full layerwise Hessian that can be tractably computed for hundred-billion-parameter LLMs, and a quantizer-independent rounding algorithm that uses these sketches and comes with theoretical guarantees. Across a wide range of models and quantizers, YAQA empirically reduces the KL divergence to the original model by $\approx 30\%$ while achieving state-of-the-art performance on downstream tasks.
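
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of a Kronecker-factored quadratic proxy for a single linear layer. It is an illustration under stated assumptions, not the YAQA algorithm: the abstract does not specify how the Hessian sketches are computed or how the rounding is done. The function names (`kron_quadratic`, `local_activation_error`), the random stand-ins for the two Hessian factors, and the round-to-nearest grid are all hypothetical. The sketch only shows that a Hessian of the form H_out ⊗ H_in can be evaluated on a weight perturbation without forming the full matrix, and that it reduces to the local activation-error objective when the output-side factor is the identity.

```python
import numpy as np


def kron_quadratic(delta_w, h_out, h_in):
    """Evaluate vec(dW)^T (H_out kron H_in) vec(dW) with row-major vec,
    using the identity tr(dW^T H_out dW H_in), so the (m*n) x (m*n)
    Kronecker product is never materialized."""
    return float(np.trace(delta_w.T @ h_out @ delta_w @ h_in))


def local_activation_error(delta_w, h_in):
    """Layer-local proxy E||dW x||^2 = tr(dW H_in dW^T), H_in = E[x x^T]."""
    return float(np.trace(delta_w @ h_in @ delta_w.T))


rng = np.random.default_rng(0)
m, n, samples = 4, 6, 32  # output dim, input dim, calibration samples (all assumed)

# Hypothetical stand-ins for the two Kronecker factors: second moments of
# random "inputs" x and random "output-side gradients" g.
x = rng.standard_normal((n, samples))
g = rng.standard_normal((m, samples))
h_in = x @ x.T / samples   # n x n, symmetric positive semidefinite
h_out = g @ g.T / samples  # m x m, symmetric positive semidefinite

# Perturbation from plain round-to-nearest on a grid of step 0.25
# (a placeholder quantizer, not the paper's rounding algorithm).
w = rng.standard_normal((m, n))
delta_w = np.round(w * 4) / 4 - w

proxy = kron_quadratic(delta_w, h_out, h_in)

# Dense reference: build the full Kronecker product explicitly.
v = delta_w.reshape(-1)
dense = float(v @ np.kron(h_out, h_in) @ v)

# With H_out = I the proxy collapses to the local activation error.
local = local_activation_error(delta_w, h_in)
local_via_kron = kron_quadratic(delta_w, np.eye(m), h_in)
```

In this toy setting the factored evaluation costs a few small matrix products, whereas the dense reference needs an (m·n) × (m·n) matrix; that gap is why Kronecker factorizations are attractive at the layer sizes the abstract mentions.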

Comments:	Preprint
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.22988 [cs.LG]
	(or arXiv:2505.22988v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2505.22988

Submission history

From: Albert Tseng
[v1] Thu, 29 May 2025 01:53:00 UTC (92 KB)
