-
SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication
Authors:
Mohammadreza Soltaniyeh,
Richard P. Martin,
Santosh Nagarakatte
Abstract:
This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We propose a novel design for the IM2COL unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. We propose a tall systolic array for the GEMM unit while also providing the ability to organize it as multiple small GEMM units, which enables our design to handle a wide range of CNNs and their parameters. Further, our design improves performance by effectively mapping the sparse data to the hardware units by utilizing sparsity in both input feature maps and weights. Our prototype, SPOTS, is on average 1.74X faster than Eyeriss. It is also 78X and 12X more energy-efficient when compared to CPU and GPU implementations, respectively.
Submitted 24 November, 2021; v1 submitted 28 July, 2021;
originally announced July 2021.
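For readers unfamiliar with the IM2COL-plus-GEMM formulation named in the abstract above, the following is a minimal software sketch of the general idea only. It is not the SPOTS hardware design: it assumes a single-channel 2D input, stride 1, no padding, and dense data, and the function names (`im2col`, `gemm`, `conv2d_via_gemm`) are illustrative choices rather than anything taken from the paper.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (list of rows) into one column.

    Assumes stride 1 and no padding (illustrative simplification).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    # One row per kernel position, one column per output pixel.
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kh):
                for kx in range(kw):
                    cols[ky * kw + kx][oy * out_w + ox] = image[oy + ky][ox + kx]
    return cols


def gemm(a, b):
    """Plain dense matrix-matrix multiplication on lists of rows."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conv2d_via_gemm(image, kernel):
    """Compute a CNN-style convolution (cross-correlation) as IM2COL + GEMM."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    # Flatten the kernel into a 1 x (kh*kw) matrix so one GEMM does all the work.
    flat_kernel = [[v for row in kernel for v in row]]
    flat = gemm(flat_kernel, im2col(image, kh, kw))[0]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_via_gemm(image, kernel))
```

Because overlapping patches are copied into separate columns, the unrolled matrix repeats input values, which is one reason a software IM2COL costs extra memory traffic.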
-
Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra
Authors:
Mohammadreza Soltaniyeh,
Richard P. Martin,
Santosh Nagarakatte
Abstract:
This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new intermediate representation that allows the CPU to communicate the sparse data and the scheduling decisions to the FPGA. The computation is optimized on the FPGA for effective resource utilization with pipelining. REAP improves the performance of Sparse General Matrix Multiplication (SpGEMM) and Sparse Cholesky Factorization by 3.2X and 1.85X compared to widely used sparse libraries for them on the CPU, respectively.
Submitted 28 April, 2020;
originally announced April 2020.
-
Internet Protocol Version 6: Dead or Alive?
Authors:
Sumit Maheshwari,
Richard P. Martin
Abstract:
Internet Protocol (IP) is the narrow waist of the multilayered Internet protocol stack, defining the rules for data sent across networks. IPv4, the fourth version of IP and the first to be commercially deployed (on ARPANET in 1983), uses 32-bit addresses and can support up to 2^32 devices. In April 2017, all Regional Internet Registries (RIRs) confirmed that IPv4 addresses are exhausted and cannot be allocated anymore, implying that any new organization requesting a block of Internet addresses will be allocated IPv6. This creates interoperability, migration, and deployment problems, and organizations have therefore hesitated to use IPv6, borrowing IPv4 addresses from other large organizations instead. Now that IPv4 is unavailable and IPv6 has gone unadopted for around 20 years, the question arises whether IPv6 will still be accepted by the computing community or will soon reach its end of life, with a better alternative protocol such as ID-based networking taking its place. This paper claims that IPv6 has lost its deployment window and can be safely skipped when new ID-based protocols are available, which not only have simple interoperability, deployment, and migration guidelines but also provide advanced features compared to IPv6. The paper answers these questions with a comprehensive comparison of IPv6 with its available alternatives and the reasons for IPv6's failure to be adopted. Finally, the paper declares IPv6 a dead protocol and suggests using newer available protocols in the future.
Submitted 17 August, 2018;
originally announced September 2018.