Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Hubara, Itay; Nahshan, Yury; Hanani, Yair; Banner, Ron; Soudry, Daniel

arXiv:2006.10518 (cs)

[Submitted on 14 Jun 2020 (v1), last revised 14 Dec 2020 (this version, v2)]

Abstract: Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than the standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. Furthermore, by proposing a novel integer programming formulation, we demonstrate how to optimally allocate the bit-widths for each layer while constraining accuracy degradation or model compression. Finally, we suggest model global statistics tuning to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50, we obtain less than 1% accuracy degradation with 4-bit weights and activations in all layers but the smallest two. We open-sourced our code.
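
The bit-width allocation idea in the abstract can be illustrated with a toy sketch. The abstract does not give the paper's actual integer programming formulation, so the following only assumes one plausible shape of the problem: each layer picks exactly one bit-width, per-layer loss increases are treated as additive, and total model size is minimized under a cap on total degradation. The names `LAYER_OPTIONS` and `allocate_bits`, and every number in the table, are hypothetical and invented for illustration. Exhaustive search stands in for a real integer programming solver.

```python
from itertools import product

# Hypothetical per-layer options: (bit_width, size_in_kb, estimated_loss_increase).
# All numbers are invented for illustration; they are not results from the paper.
LAYER_OPTIONS = [
    [(4, 10.0, 0.30), (8, 20.0, 0.05)],  # layer 0 (smallest layer)
    [(4, 40.0, 0.10), (8, 80.0, 0.02)],  # layer 1
    [(4, 25.0, 0.20), (8, 50.0, 0.03)],  # layer 2
]


def allocate_bits(layer_options, max_loss_increase):
    """Pick one option per layer, minimizing total size under a loss budget.

    Assumes per-layer loss increases simply add up, which is a simplification.
    Returns (bit_widths, total_size, total_loss), or None if nothing is feasible.
    """
    best = None
    # Enumerate every combination of one option per layer (exponential cost,
    # acceptable only for a toy instance like this one).
    for choice in product(*layer_options):
        total_size = sum(size for _, size, _ in choice)
        total_loss = sum(loss for _, _, loss in choice)
        if total_loss > max_loss_increase:
            continue  # violates the degradation constraint
        if best is None or total_size < best[1]:
            best = ([bits for bits, _, _ in choice], total_size, total_loss)
    return best


result = allocate_bits(LAYER_OPTIONS, max_loss_increase=0.40)
print(result[0], result[1])  # [8, 4, 4] 85.0
```

In this toy instance the smallest layer keeps 8 bits, because raising its precision costs little in size while removing most of its estimated loss increase. For realistic layer counts, the enumeration would need to be replaced by a proper integer programming solver.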

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.10518 [cs.LG]
	(or arXiv:2006.10518v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.10518

Submission history

From: Itay Hubara
[v1] Sun, 14 Jun 2020 16:07:55 UTC (151 KB)
[v2] Mon, 14 Dec 2020 15:55:05 UTC (343 KB)
