-
FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning
Authors:
Javier Duarte,
Nhan Tran,
Ben Hawks,
Christian Herwig,
Jules Muhizi,
Shvetank Prakash,
Vijay Janapa Reddi
Abstract:
Applications of machine learning (ML) are growing by the day for many unique and challenging scientific applications. However, a crucial challenge facing these applications is their need for ultra-low-latency, on-detector ML capabilities. Given the slowdown in Moore's law and Dennard scaling, coupled with rapid advances in scientific instrumentation that are driving up data rates, there is a need for ultra-fast ML at the extreme edge. Fast ML at the edge is essential for reducing and filtering scientific data in real time to accelerate scientific experimentation and enable more profound insights. To accelerate real-time scientific edge ML hardware and software solutions, we need well-constrained benchmark tasks with enough specifications to be generically applicable and accessible. These benchmarks can guide the design of future edge ML hardware for scientific applications capable of meeting nanosecond- and microsecond-level latency requirements. To this end, we present an initial set of scientific ML benchmarks, covering a variety of ML and embedded system techniques.
Submitted 16 July, 2022;
originally announced July 2022.
-
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark
Authors:
Hendrik Borras,
Giuseppe Di Guglielmo,
Javier Duarte,
Nicolò Ghielmetti,
Ben Hawks,
Scott Hauck,
Shih-Chieh Hsu,
Ryan Kastner,
Jason Liang,
Andres Meza,
Jules Muhizi,
Tai Nguyen,
Rushil Roy,
Nhan Tran,
Yaman Umuroglu,
Olivia Weng,
Aidan Yokuda,
Michaela Blott
Abstract:
We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 μs and energy consumption as low as 30 μJ per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.
Submitted 23 June, 2022;
originally announced June 2022.
-
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
Authors:
Alessandro Pappalardo,
Yaman Umuroglu,
Michaela Blott,
Jovan Mitrevski,
Ben Hawks,
Nhan Tran,
Vladimir Loncar,
Sioni Summers,
Hendrik Borras,
Jules Muhizi,
Matthew Trahms,
Shih-Chieh Hsu,
Scott Hauck,
Javier Duarte
Abstract:
We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low-precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and the quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that adds three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
Submitted 24 June, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
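Illustrative note on the QONNX entry above: the quantize-clip-dequantize idea named in the abstract can be sketched in a few lines of plain Python. This is only a scalar sketch under stated assumptions, not the authors' implementation or the ONNX API: the function name `quantize_clip_dequantize` and its parameters are hypothetical, and it assumes the underlying integer container is 8 bits wide, with a sub-8-bit precision emulated by clipping the integer to a narrower range before dequantizing.

```python
def quantize_clip_dequantize(x, scale, zero_point, bits, signed=True):
    """Scalar sketch of a quantize -> clip -> dequantize chain.

    Hypothetical helper for illustration only. Assumes an 8-bit integer
    container and emulates a `bits`-wide quantizer by integer clipping.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8 for an 8-bit container")

    # Quantize: scale, round to nearest (Python rounds ties to even), shift.
    q = round(x / scale) + zero_point

    # Saturate to the 8-bit container range.
    q = max(-128, min(127, q)) if signed else max(0, min(255, q))

    # Clip: restrict the integer to the narrower `bits`-wide range.
    if signed:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        lo, hi = 0, 2 ** bits - 1
    q = max(lo, min(hi, q))

    # Dequantize: map the clipped integer back to a real value.
    return (q - zero_point) * scale


# Usage: a signed 4-bit quantizer with step 0.25 covers [-2.0, 1.75].
in_range = quantize_clip_dequantize(0.9, scale=0.25, zero_point=0, bits=4)
too_high = quantize_clip_dequantize(5.0, scale=0.25, zero_point=0, bits=4)
too_low = quantize_clip_dequantize(-5.0, scale=0.25, zero_point=0, bits=4)
```

With these inputs, the in-range value snaps to the nearest multiple of the step, while out-of-range values saturate at the ends of the 4-bit range instead of the 8-bit one.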