Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
Authors:
Shivam Aggarwal,
Hans Jakob Damsgaard,
Alessandro Pappalardo,
Giuseppe Franco,
Thomas B. Preußer,
Michaela Blott,
Tulika Mitra
Abstract:
Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats(FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits and their relative comparison in terms of accuracy-hardware c…
▽ More
Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats(FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits and their relative comparison in terms of accuracy-hardware cost with integers remains unexplored on FPGAs. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. We implement a custom FPGA-based multiply-accumulate operator library and explore the vast design space, comparing minifloat and integer representations across 3 to 8 bits for both weights and activations. We also examine the applicability of various integerbased quantization techniques to minifloats. Our experiments show that minifloats offer a promising alternative for emerging workloads such as vision transformers.
△ Less
Submitted 5 July, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
Open-Source Verification with Chisel and Scala
Authors:
Andrew Dobis,
Tjark Petersen,
Kasper Juul Hesse Rasmussen,
Enrico Tolotto,
Hans Jakob Damsgaard,
Simon Thye Andersen,
Richard Lin,
Martin Schoeberl
Abstract:
Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacl…
▽ More
Performance increase with general-purpose processors has come to a halt. We can no longer depend on Moore's Law to increase computing performance. The only way to achieve higher performance or lower energy consumption is by building domain-specific hardware accelerators. To efficiently design and verify those domain-specific accelerators, we need agile hardware development. One of the main obstacles when proposing such a modern method is the lack of modern tools to attack it. To be able to verify a design in such a time-constrained development method, one needs to have efficient tools both for design and verification.
This paper thus proposes ChiselVerify, an open-source tool for verifying circuits described in any Hardware Description Language. It builds on top of the Chisel hardware construction language and uses Scala to drive the verification using a testing strategy inspired by the Universal Verification Methodology (UVM) and adapted for designs described in Chisel. ChiselVerify is created based on three key ideas. First, our solution highly increases the productivity of the verification engineer, by allowing hardware testing to be done in a modern high-level programming environment. Second, the framework functions with any hardware description language thanks to the flexibility of Chisel blackboxes. Finally, the solution is well integrated into the existing Chisel universe, making it an extension of currently existing testing libraries.
We implement ChiselVerify in a way inspired by the functionalities found in SystemVerilog. This allows one to use functional coverage, constrained-random verification, bus functional models, transaction-level modeling and much more during the verification process of a design in a contemporary high-level programming ecosystem.
△ Less
Submitted 26 February, 2021;
originally announced February 2021.