-
Quantum message-passing algorithm for optimal and efficient decoding
Authors:
Christophe Piveteau,
Joseph M. Renes
Abstract:
Recently, Renes proposed a quantum algorithm called belief propagation with quantum messages (BPQM) for decoding classical data encoded using a binary linear code with tree Tanner graph that is transmitted over a pure-state CQ channel [Renes, NJP 19 072001 (2017)]. The algorithm presents a genuine quantum counterpart to decoding based on the classical belief propagation algorithm, which has found…
▽ More
Recently, Renes proposed a quantum algorithm called belief propagation with quantum messages (BPQM) for decoding classical data encoded using a binary linear code with tree Tanner graph that is transmitted over a pure-state CQ channel [Renes, NJP 19 072001 (2017)]. The algorithm presents a genuine quantum counterpart to decoding based on the classical belief propagation algorithm, which has found wide success in classical coding theory when used in conjunction with LDPC or Turbo codes. More recently Rengaswamy et al. [npj Quantum Information 7 97 (2021)] observed that BPQM implements the optimal decoder on a small example code. Here we significantly expand the understanding, formalism, and applicability of the BPQM algorithm with the following contributions. First, we prove analytically that BPQM realizes optimal decoding for any binary linear code with tree Tanner graph. We also provide the first formal description of the BPQM algorithm in full detail and without any ambiguity. In so doing, we identify a key flaw overlooked in the original algorithm and subsequent works which implies quantum circuit realizations will be exponentially large in the code dimension. Although BPQM passes quantum messages, other information required by the algorithm is processed globally. We remedy this problem by formulating a truly message-passing algorithm which approximates BPQM and has quantum circuit complexity $\mathcal{O}(\text{poly } n, \text{polylog } \frac{1}ε)$, where $n$ is the code length and $ε$ is the approximation error. Finally, we also propose a novel method for extending BPQM to factor graphs containing cycles by making use of approximate cloning. We show some promising numerical results that indicate that BPQM on factor graphs with cycles can significantly outperform the best possible classical decoder.
△ Less
Submitted 10 April, 2024; v1 submitted 16 September, 2021;
originally announced September 2021.
-
ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning
Authors:
Vinay Joshi,
Geethan Karunaratne,
Manuel Le Gallo,
Irem Boybat,
Christophe Piveteau,
Abu Sebastian,
Bipin Rajendran,
Evangelos Eleftheriou
Abstract:
Deep neural networks (DNNs) have surpassed human-level accuracy in a variety of cognitive tasks but at the cost of significant memory/time requirements in DNN training. This limits their deployment in energy and memory limited applications that require real-time learning. Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated wit…
▽ More
Deep neural networks (DNNs) have surpassed human-level accuracy in a variety of cognitive tasks but at the cost of significant memory/time requirements in DNN training. This limits their deployment in energy and memory limited applications that require real-time learning. Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with the training of DNNs. Strategies to improve the efficiency of MVM computation in hardware have been demonstrated with minimal impact on training accuracy. However, the VVOP computation remains a relatively less explored bottleneck even with the aforementioned strategies. Stochastic computing (SC) has been proposed to improve the efficiency of VVOP computation but on relatively shallow networks with bounded activation functions and floating-point (FP) scaling of activation gradients. In this paper, we propose ESSOP, an efficient and scalable stochastic outer product architecture based on the SC paradigm. We introduce efficient techniques to generalize SC for weight update computation in DNNs with the unbounded activation functions (e.g., ReLU), required by many state-of-the-art networks. Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling. We show that the ResNet-32 network with 33 convolution layers and a fully-connected layer can be trained with ESSOP on the CIFAR-10 dataset to achieve baseline comparable accuracy. Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier design, ESSOP is 82.2% and 93.7% better in energy and area efficiency respectively for outer product computation.
△ Less
Submitted 25 March, 2020;
originally announced March 2020.
-
Mixed-precision deep learning based on computational memory
Authors:
S. R. Nandakumar,
Manuel Le Gallo,
Christophe Piveteau,
Vinay Joshi,
Giovanni Mariani,
Irem Boybat,
Geethan Karunaratne,
Riduan Khaddam-Aljameh,
Urs Egger,
Anastasios Petropoulos,
Theodore Antonakopoulos,
Bipin Rajendran,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory…
▽ More
Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long-short-term-memory networks, and generative-adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates 173x improvement in energy efficiency of the architecture when used for training a multilayer perceptron compared with a dedicated fully digital 32-bit implementation.
△ Less
Submitted 31 January, 2020;
originally announced January 2020.
-
Accurate deep neural network inference using computational phase-change memory
Authors:
Vinay Joshi,
Manuel Le Gallo,
Simon Haefeli,
Irem Boybat,
S. R. Nandakumar,
Christophe Piveteau,
Martino Dazzi,
Bipin Rajendran,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific w…
▽ More
In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration.
△ Less
Submitted 11 April, 2020; v1 submitted 7 June, 2019;
originally announced June 2019.