-
Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory (rCiM)
Authors:
Dhandeep Challagundla,
Ignatius Bezzam,
Riadul Islam
Abstract:
General-purpose computing follows the von Neumann architecture, in which data movement between memory and processing elements dictates processor performance. The evolving compute-in-memory (CiM) paradigm tackles this issue by facilitating simultaneous processing and storage within static random-access memory (SRAM) elements. Numerous design decisions taken at different levels of the hierarchy affect the figures of merit (FoMs) of SRAM, such as power, performance, area, and yield. The absence of a rapid mechanism for assessing how changes at different hierarchy levels affect global FoMs makes it difficult to accurately evaluate innovative SRAM designs. This paper presents an automation tool designed to optimize the energy and latency of SRAM designs incorporating diverse implementation strategies for executing logic operations within the SRAM. The tool structure allows easy comparison across different array topologies and design strategies to arrive at energy-efficient implementations. Our study involves a comprehensive comparison of more than 6,900 distinct design implementation strategies for EPFL combinational benchmark circuits on the energy-recycling resonant compute-in-memory (rCiM) architecture designed using TSMC 28 nm technology. When provided with a combinational circuit, the tool aims to generate an energy-efficient implementation strategy tailored to the specified input memory and latency constraints. By exploiting the parallel processing capability of rCiM caches ranging from 4 KB to 192 KB, the tool reduces energy consumption by 80.9% on average across all benchmarks when using the six-topology implementation, compared to the baseline single-macro topology implementation.
Submitted 14 November, 2024;
originally announced November 2024.
-
Descriptor: Face Detection Dataset for Programmable Threshold-Based Sparse-Vision
Authors:
Riadul Islam,
Sri Ranga Sai Krishna Tummala,
Joey Mulé,
Rohith Kankipati,
Suraj Jalapally,
Dhandeep Challagundla,
Chad Howard,
Ryan Robucci
Abstract:
Smart focal-plane and in-chip image processing has emerged as a crucial technology for vision-enabled embedded systems that require energy efficiency and privacy. However, the lack of specialized datasets providing examples of the data that these neuromorphic sensors compute to convey visual information has hindered the adoption of these promising technologies. Neuromorphic imager variants, including event-based sensors, produce various representations such as streams of pixel addresses representing the times and locations of intensity changes in the focal plane, temporal-difference data, data sifted/thresholded by temporal differences, image data after applying spatial transformations, optical flow data, and/or statistical representations. To address this critical barrier to entry, we provide an annotated, temporal-threshold-based vision dataset specifically designed for face detection tasks, derived from the same videos used for Aff-Wild2. By offering multiple threshold levels (e.g., 4, 8, 12, and 16), this dataset allows for comprehensive evaluation and optimization of state-of-the-art neural architectures under varying conditions and settings compared to traditional methods. The accompanying tool flow for generating event data from raw videos further enhances accessibility and usability. We anticipate that this resource will significantly support the development of robust vision systems based on smart sensors that process data based on temporal-difference thresholds, enabling more accurate and efficient object detection and localization and ultimately promoting the broader adoption of low-power, neuromorphic imaging technologies. To support further research, we have publicly released the dataset at https://dx.doi.org/10.21227/bw2e-dj78.
Submitted 30 September, 2024;
originally announced October 2024.
-
Power and Skew Reduction Using Resonant Energy Recycling in 14-nm FinFET Clocks
Authors:
Dhandeep Challagundla,
Mehedi Galib,
Ignatius Bezzam,
Riadul Islam
Abstract:
As the demand for high-performance microprocessors increases, circuit complexity and the rate of data transfer increase, resulting in higher power consumption. We propose a clocking architecture that uses series LC resonance and an inductor matching technique to address this bottleneck. By employing pulsed resonance, the dissipated switching power is recycled. The inductor matching technique helps reduce skew, increasing the robustness of the clock network. Compared to a conventional primary-secondary flip-flop-based CMOS architecture, this new resonant architecture reduces power by over 43% and skew by 91% when clocking over a range of 1 to 5 GHz.
Submitted 16 May, 2022;
originally announced May 2022.