Search | arXiv e-print repository

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

Authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng

Abstract: Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines backend power, performance, and area (PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average 7% or less prediction error for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.

Submitted 23 August, 2023; originally announced August 2023.

Comments: This is an extended version of our work titled "Physically Accurate Learning-based Performance Prediction of Hardware-accelerated ML Algorithms" published in MLCAD 2022

arXiv:2306.16767 [pdf, other]

doi 10.1145/3696665

Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations

Authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Sean Kinzer, Susmita Dey Manasi, Sachin S. Sapatnekar, Zhiang Wang

Abstract: Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64X64 processing array, non-convolution operations constitute 59.5% of total runtime for the ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18X performance improvement over a generic static resource allocation for ResNet-50 inference.

Submitted 29 June, 2023; originally announced June 2023.

Journal ref: ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 30, Issue 1, Article No.: 3, Pages 1 - 34, Oct. 2024

arXiv:2105.10554 [pdf, other]

GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching

Authors: Sudipta Mondal, Susmita Dey Manasi, Kishor Kunal, S. Ramprasath, Sachin S. Sapatnekar

Abstract: Graph neural network (GNN) analysis engines are vital for real-world problems that use large graph models. Challenges for a GNN hardware platform include the ability to (a) host a variety of GNNs, (b) handle high sparsity in input vertex feature vectors and the graph adjacency matrix and the accompanying random memory access patterns, and (c) maintain load-balanced computation in the face of uneven workloads, induced by high sparsity and power-law vertex degree distributions. This paper proposes GNNIE, an accelerator designed to run a broad range of GNNs. It tackles workload imbalance by (i) splitting vertex feature operands into blocks, (ii) reordering and redistributing computations, and (iii) using a novel flexible MAC architecture. It adopts a graph-specific, degree-aware caching policy that is well suited to real-world graph characteristics. The policy enhances on-chip data reuse and avoids random memory access to DRAM. GNNIE achieves average speedups of 21233x over a CPU and 699x over a GPU over multiple datasets on graph attention networks (GATs), graph convolutional networks (GCNs), GraphSAGE, GINConv, and DiffPool. Compared to prior approaches, GNNIE achieves an average speedup of 35x over HyGCN (which cannot implement GATs) for GCN, GraphSAGE, and GINConv, and, using 3.4x fewer processing units, an average speedup of 2.1x over AWB-GCN (which runs only GCNs).

Submitted 7 August, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:1905.05011 [pdf, other]

doi 10.1109/TVLSI.2020.2995135

NeuPart: Using Analytical Models to Drive Energy-Efficient Partitioning of CNN Computations on Cloud-Connected Mobile Clients

Authors: Susmita Dey Manasi, Farhana Sharmin Snigdha, Sachin S. Sapatnekar

Abstract: Data processing on convolutional neural networks (CNNs) places a heavy burden on energy-constrained mobile platforms. This work optimizes energy on a mobile client by partitioning CNN computations between in situ processing on the client and offloaded computations in the cloud. A new analytical CNN energy model is formulated, capturing all major components of the in situ computation, for ASIC-based deep learning accelerators. The model is benchmarked against measured silicon data. The analytical framework is used to determine the optimal energy partition point between the client and the cloud at runtime. On standard CNN topologies, partitioned computation is demonstrated to provide significant energy savings on the client over fully cloud-based or fully in situ computation. For example, at 80 Mbps effective bit rate and 0.78 W transmission power, the optimal partition for AlexNet [SqueezeNet] saves up to 52.4% [73.4%] energy over a fully cloud-based computation, and 27.3% [28.8%] energy over a fully in situ computation.

Submitted 25 June, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, April 2020

Journal ref: IEEE Transactions on Very Large Scale Integration Systems (TVLSI), vol. 28, no. 8, pp. 1844-1857, Aug. 2020
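
The partitioning idea summarized in the NeuPart abstract above can be illustrated with a minimal sketch. This is not the paper's analytical energy model: the per-layer energies, data sizes, and the function names `transmission_energy_j` and `best_partition` are hypothetical, and the only inputs taken from the abstract are the 80 Mbps effective bit rate and 0.78 W transmission power. The sketch assumes client energy is the sum of in situ compute energy for the first k layers plus the radio energy (power times transmit time) needed to send the k-th layer's output to the cloud.

```python
def transmission_energy_j(bits, bit_rate_bps=80e6, tx_power_w=0.78):
    """Radio energy in joules: E = P * t, with t = bits / bit rate."""
    return tx_power_w * bits / bit_rate_bps


def best_partition(layer_energy_j, output_bits, input_bits):
    """Pick how many layers to run on the client to minimize client energy.

    layer_energy_j[i]: hypothetical in situ energy of layer i (joules).
    output_bits[i]:    hypothetical size of layer i's output (bits).
    input_bits:        size of the raw input (bits).
    Returns (k, energy): k = 0 is fully cloud-based (send the raw input),
    k = len(layer_energy_j) is fully in situ (assumed to send nothing).
    """
    n = len(layer_energy_j)
    best_k, best_energy = 0, float("inf")
    compute = 0.0
    for k in range(n + 1):
        if k > 0:
            compute += layer_energy_j[k - 1]  # cumulative client compute
        if k == 0:
            tx = transmission_energy_j(input_bits)
        elif k == n:
            tx = 0.0
        else:
            tx = transmission_energy_j(output_bits[k - 1])
        total = compute + tx
        if total < best_energy:
            best_k, best_energy = k, total
    return best_k, best_energy


# Made-up three-layer network, for illustration only.
k, energy = best_partition(
    layer_energy_j=[0.002, 0.003, 0.010],
    output_bits=[8e5, 1e5, 1e3],
    input_bits=1.2e6,
)
print(k, energy)
```

With these made-up numbers the minimum falls at an intermediate layer, because early layers shrink the data to transmit while later layers cost more to compute than they save in radio energy.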

arXiv:1610.03902 [pdf, other]

Straintronic magneto-tunneling-junction based ternary content addressable memory

Authors: S. Dey Manasi, M. M. Al Rashid, J. Atulasimha, S. Bandyopadhyay, A. R. Trivedi

Abstract: Straintronic magneto-tunneling junction (s-MTJ) switches, whose resistances are controlled with voltage-generated strain in the magnetostrictive free layer of the MTJ, are extremely energy-efficient switches that would dissipate a few aJ of energy during switching. Unfortunately, they are also relatively error-prone and have low resistance on/off ratio. This suggests that as computing elements, they are best suited for non-Boolean architectures. Here, we propose and analyze a ternary content addressable memory implemented with s-MTJs and some transistors. It overcomes challenges encountered by traditional all-transistor implementations, resulting in exceptionally high cell density.

Submitted 21 October, 2016; v1 submitted 12 October, 2016; originally announced October 2016.

Comments: 8 pages, 11 figures

Journal ref: Part I: IEEE Transactions on Electron Devices (Volume: 64, Issue: 7, Page(s): 2835-2841, July 2017), Part II: IEEE Transactions on Electron Devices (Volume: 64, Issue: 7, Page(s): 2842-2848, July 2017)

Showing 1–5 of 5 results for author: Manasi, S D