-
AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing
Authors:
Hadjer Benmeziane,
Corey Lammie,
Irem Boybat,
Malte Rasch,
Manuel Le Gallo,
Hsinyu Tsai,
Ramachandran Muralidhar,
Smail Niar,
Ouarnoughi Hamza,
Vijay Narayanan,
Abu Sebastian,
Kaoutar El Maghraoui
Abstract:
The advancement of Deep Learning (DL) is driven by efficient Deep Neural Network (DNN) design and new hardware accelerators. Current DNN design is primarily tailored for general-purpose use and deployment on commercially viable platforms. Inference at the edge requires low latency, compact and power-efficient models, and must be cost-effective. Digital processors based on typical von Neumann architectures are not conducive to edge AI given the large amounts of required data movement in and out of memory. Conversely, analog/mixed signal in-memory computing hardware accelerators can easily transcend the memory wall of von Neumann architectures when accelerating inference workloads. They offer increased area and power efficiency, which are paramount in edge resource-constrained environments. In this paper, we propose AnalogNAS, a framework for automated DNN design targeting deployment on analog In-Memory Computing (IMC) inference accelerators. We conduct extensive hardware simulations to demonstrate the performance of AnalogNAS on State-Of-The-Art (SOTA) models in terms of accuracy and deployment efficiency on various Tiny Machine Learning (TinyML) tasks. We also present experimental results that show AnalogNAS models achieving higher accuracy than SOTA models when implemented on a 64-core IMC chip based on Phase Change Memory (PCM). The AnalogNAS search code is released: https://github.com/IBM/analog-nas
Submitted 17 May, 2023;
originally announced May 2023.
-
Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards
Authors:
Rajeev Muralidhar,
Renata Borovica-Gajic,
Rajkumar Buyya
Abstract:
Computing systems have undergone several inflexion points - while Moore's law guided the semiconductor industry to cram more and more transistors and logic into the same volume, the limits of instruction-level parallelism (ILP) and the end of Dennard's scaling drove the industry towards multi-core chips. We have now entered the era of domain-specific architectures for new workloads like AI and ML. These trends continue, arguably with other limits, along with challenges imposed by tighter integration, extreme form factors and diverse workloads, making systems more complex from an energy efficiency perspective. Many research surveys have covered different aspects of techniques in hardware and microarchitecture across devices, servers, HPC, and data center systems, along with software, algorithms, and frameworks for energy efficiency and thermal management. Somewhat in parallel, the semiconductor industry has developed techniques and standards around specification, modeling and verification of complex chips; these areas have not been addressed in detail by previous research surveys. This survey aims to bring these domains together and is composed of a systematic categorization of key aspects of building energy efficient systems - (a) specification - the ability to precisely specify the power intent or properties at different layers, (b) modeling and simulation of the entire system or subsystem (hardware or software or both) so as to be able to perform what-if analysis, (c) techniques used for implementing energy efficiency at different levels of the stack, (d) verification techniques used to provide guarantees that the functionality of complex designs is preserved, and (e) energy efficiency standards and consortiums that aim to standardize different aspects of energy efficiency, including cross-layer optimizations.
Submitted 23 March, 2022; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Artificial Intelligence (AI)-Centric Management of Resources in Modern Distributed Computing Systems
Authors:
Shashikant Ilager,
Rajeev Muralidhar,
Rajkumar Buyya
Abstract:
Contemporary Distributed Computing Systems (DCS) such as Cloud Data Centres are large scale, complex, heterogeneous, and distributed across multiple networks and geographical boundaries. On the other hand, the Internet of Things (IoT)-driven applications are producing a huge amount of data that requires real-time processing and fast response. Managing these resources efficiently to provide reliable services to end-users or applications is a challenging task. The existing Resource Management Systems (RMS) rely on either static or heuristic solutions inadequate for such composite and dynamic systems. The advent of Artificial Intelligence (AI) due to data availability and processing capabilities manifested into possibilities of exploring data-driven solutions in RMS tasks that are adaptive, accurate, and efficient. In this regard, this paper aims to draw the motivations and necessities for data-driven solutions in resource management. It identifies the challenges associated with it and outlines the potential future research directions detailing where and how to apply the data-driven techniques in the different RMS tasks. Finally, it provides a conceptual data-driven RMS model for DCS and presents the two real-time use cases (GPU frequency scaling and data centre resource management from Google Cloud and Microsoft Azure) demonstrating AI-centric approaches' feasibility.
Submitted 6 November, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
-
A Data-Driven Frequency Scaling Approach for Deadline-aware Energy Efficient Scheduling on Graphics Processing Units (GPUs)
Authors:
Shashikant Ilager,
Rajeev Muralidhar,
Kotagiri Rammohanrao,
Rajkumar Buyya
Abstract:
Modern computing paradigms, such as cloud computing, are increasingly adopting GPUs to boost their computing capabilities primarily due to the heterogeneous nature of AI/ML/deep learning workloads. However, the energy consumption of GPUs is a critical problem. Dynamic Voltage Frequency Scaling (DVFS) is a widely used technique to reduce the dynamic power of GPUs. Yet, configuring the optimal clock frequency for essential performance requirements is a non-trivial task due to the complex nonlinear relationship between the application's runtime performance characteristics, energy, and execution time. It becomes more challenging when different applications behave distinctively with similar clock settings. Simple analytical solutions and standard GPU frequency scaling heuristics fail to capture these intricacies and scale the frequencies appropriately. In this regard, we propose a data-driven frequency scaling technique by predicting the power and execution time of a given application over different clock settings. We collect the data from application profiling and train the models to predict the outcome accurately. The proposed solution is generic and can be easily extended to different kinds of workloads and GPU architectures. Furthermore, using these prediction-based frequency scaling models, we present a deadline-aware application scheduling algorithm to reduce energy consumption while simultaneously meeting application deadlines. We conduct extensive real experiments on NVIDIA GPUs using several benchmark applications. The experimental results show that our prediction models have high accuracy, with average RMSE values of 0.38 and 0.05 for energy and time prediction, respectively. Also, the scheduling algorithm consumes 15.07% less energy compared to the baseline policies.
Submitted 27 April, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.