-
Characterization and Mitigation of ADC Noise by Reference Tuning in RRAM-Based Compute-In-Memory
Authors:
Ying-Hao Wei,
Zishen Wan,
Brian Crafton,
Samuel Spetalnick,
Arijit Raychowdhury
Abstract:
With the escalating demand for power-efficient neural network architectures, non-volatile compute-in-memory designs have garnered significant attention. However, owing to the nature of analog computation, susceptibility to noise remains a critical concern. This study confronts this challenge by introducing a detailed model that incorporates noise factors arising from both ADCs and RRAM devices. Th…
▽ More
With the escalating demand for power-efficient neural network architectures, non-volatile compute-in-memory designs have garnered significant attention. However, owing to the nature of analog computation, susceptibility to noise remains a critical concern. This study confronts this challenge by introducing a detailed model that incorporates noise factors arising from both ADCs and RRAM devices. The experimental data is derived from a 40nm foundry RRAM test-chip, wherein different reference voltage configurations are applied, each tailored to its respective module. The mean and standard deviation values of HRS and LRS cells are derived through a randomized vector, forming the foundation for noise simulation within our analytical framework. Additionally, the study examines the read-disturb effects, shedding light on the potential for accuracy deterioration in neural networks due to extended exposure to high-voltage stress. This phenomenon is mitigated through the proposed low-voltage read mode. Leveraging our derived comprehensive fault model from the RRAM test-chip, we evaluate CIM noise impact on both supervised learning (time-independent) and reinforcement learning (time-dependent) tasks, and demonstrate the effectiveness of reference tuning to mitigate noise impacts.
△ Less
Submitted 9 February, 2025;
originally announced February 2025.
-
Breaking Barriers: Maximizing Array Utilization for Compute In-Memory Fabrics
Authors:
Brian Crafton,
Samuel Spetalnick,
Gauthaman Murali,
Tushar Krishna,
Sung-Kyu Lim,
Arijit Raychowdhury
Abstract:
Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM)…
▽ More
Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM) or phase change random access memory (PCRAM), various forms of neural networks can be implemented to greatly reduce power and increase on chip memory capacity. However, compute in-memory faces its own limitations at both the circuit and the device levels. Although compute in-memory using the crossbar architecture can greatly reduce data transport, the rigid nature of these large fixed weight matrices forfeits the flexibility of traditional CMOS and SRAM based designs. In this work, we explore the different synchronization barriers that occur from the CIM constraints. Furthermore, we propose a new allocation algorithm and data flow based on input data distributions to maximize utilization and performance for compute-in memory based designs. We demonstrate a 7.47$\times$ performance improvement over a naive allocation method for CIM accelerators on ResNet18.
△ Less
Submitted 15 August, 2020;
originally announced August 2020.
-
Counting Cards: Exploiting Variance and Data Distributions for Robust Compute In-Memory
Authors:
Brian Crafton,
Samuel Spetalnick,
Arijit Raychowdhury
Abstract:
Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM)…
▽ More
Compute in-memory (CIM) is a promising technique that minimizes data transport, the primary performance bottleneck and energy cost of most data intensive applications. This has found wide-spread adoption in accelerating neural networks for machine learning applications. Utilizing a crossbar architecture with emerging non-volatile memories (eNVM) such as dense resistive random access memory (RRAM) or phase change random access memory (PCRAM), various forms of neural networks can be implemented to greatly reduce power and increase on chip memory capacity. However, compute in-memory faces its own limitations at both the circuit and the device levels. In this work, we explore the impact of device variation and peripheral circuit design constraints. Furthermore, we propose a new algorithm based on device variance and neural network weight distributions to increase both performance and accuracy for compute-in memory based designs. We demonstrate a 27% power improvement and 23% performance improvement for low and high variance eNVM, while satisfying a programmable threshold for a target error tolerance, which depends on the application.
△ Less
Submitted 13 February, 2021; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Hardware-aware Pruning of DNNs using LFSR-Generated Pseudo-Random Indices
Authors:
Foroozan Karimzadeh,
Ningyuan Cao,
Brian Crafton,
Justin Romberg,
Arijit Raychowdhury
Abstract:
Deep neural networks (DNNs) have been emerged as the state-of-the-art algorithms in broad range of applications. To reduce the memory foot-print of DNNs, in particular for embedded applications, sparsification techniques have been proposed. Unfortunately, these techniques come with a large hardware overhead. In this paper, we present a hardware-aware pruning method where the locations of non-zero…
▽ More
Deep neural networks (DNNs) have been emerged as the state-of-the-art algorithms in broad range of applications. To reduce the memory foot-print of DNNs, in particular for embedded applications, sparsification techniques have been proposed. Unfortunately, these techniques come with a large hardware overhead. In this paper, we present a hardware-aware pruning method where the locations of non-zero weights are derived in real-time from a Linear Feedback Shift Registers (LFSRs). Using the proposed method, we demonstrate a total saving of energy and area up to 63.96% and 64.23% for VGG-16 network on down-sampled ImageNet, respectively for iso-compression-rate and iso-accuracy.
△ Less
Submitted 9 November, 2019;
originally announced November 2019.
-
Direct Feedback Alignment with Sparse Connections for Local Learning
Authors:
Brian Crafton,
Abhinav Parihar,
Evan Gebhardt,
Arijit Raychowdhury
Abstract:
Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient-descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Commonly referred to as the weight transport problem, each neuron's dependence on the weights and errors located deeper in the network require…
▽ More
Recent advances in deep neural networks (DNNs) owe their success to training algorithms that use backpropagation and gradient-descent. Backpropagation, while highly effective on von Neumann architectures, becomes inefficient when scaling to large networks. Commonly referred to as the weight transport problem, each neuron's dependence on the weights and errors located deeper in the network require exhaustive data movement which presents a key problem in enhancing the performance and energy-efficiency of machine-learning hardware. In this work, we propose a bio-plausible alternative to backpropagation drawing from advances in feedback alignment algorithms in which the error computation at a single synapse reduces to the product of three scalar values. Using a sparse feedback matrix, we show that a neuron needs only a fraction of the information previously used by the feedback alignment algorithms. Consequently, memory and compute can be partitioned and distributed whichever way produces the most efficient forward pass so long as a single error can be delivered to each neuron. Our results show orders of magnitude improvement in data movement and $2\times$ improvement in multiply-and-accumulate operations over backpropagation. Like previous work, we observe that any variant of feedback alignment suffers significant losses in classification accuracy on deep convolutional neural networks. By transferring trained convolutional layers and training the fully connected layers using direct feedback alignment, we demonstrate that direct feedback alignment can obtain results competitive with backpropagation. Furthermore, we observe that using an extremely sparse feedback matrix, rather than a dense one, results in a small accuracy drop while yielding hardware advantages. All the code and results are available under https://github.com/bcrafton/ssdfa.
△ Less
Submitted 9 May, 2019; v1 submitted 30 January, 2019;
originally announced March 2019.