-
Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments
Authors:
Maciej Besta,
Robert Gerstenberger,
Patrick Iff,
Pournima Sonawane,
Juan Gómez Luna,
Raghavendra Kanakagiri,
Rui Min,
Grzegorz Kwaśniewski,
Onur Mutlu,
Torsten Hoefler,
Raja Appuswamy,
Aidan O Mahony
Abstract:
Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the Semantic Web, and are gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. For this, we present a classification of the primary areas in knowledge graph technology that harness different hardware units for accelerating certain knowledge graph functionalities. We then extensively describe the respective works, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value for both academics and industry practitioners.
Submitted 19 November, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
CMOSS: A Reliable, Motif-based Columnar Molecular Storage System
Authors:
Eugenio Marinelli,
Yiqing Yan,
Virginie Magnone,
Pascal Barbry,
Raja Appuswamy
Abstract:
The surge in demand for cost-effective, durable long-term archival media, coupled with density limitations of contemporary magnetic media, has resulted in synthetic DNA emerging as a promising new alternative. Despite its benefits, storing data on DNA poses several challenges, as the technologies used for reading and writing data and for achieving random access on DNA are highly error-prone. In order to deal with such errors, it is important to design efficient pipelines that can carefully use redundancy to mask errors without amplifying overall cost. In this work, we present Columnar MOlecular Storage System (CMOSS), a novel, end-to-end DNA storage pipeline that can provide error-tolerant data storage at low read/write costs. CMOSS differs from SOTA on three fronts: (i) a motif-based, vertical layout in contrast to the nucleotide-based horizontal layout used by SOTA, (ii) merged consensus calling and decoding enabled by the vertical layout, and (iii) a flexible, fixed-size, block-based data organization for random access over DNA storage in contrast to the variable-sized, object-based access used by SOTA. Using an in-depth evaluation via simulation studies and real wet-lab experiments, we demonstrate the benefits of various CMOSS design choices. We make the entire pipeline together with the read datasets openly available to the community for faithful reproduction and furthering research.
Submitted 27 May, 2024;
originally announced May 2024.
-
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Authors:
Deepika Bablani,
Jeffrey L. Mckinstry,
Steven K. Esser,
Rathinakumar Appuswamy,
Dharmendra S. Modha
Abstract:
For efficient neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for simplifying networks. As each layer of a network may have different sensitivity to quantization, mixed precision quantization methods selectively tune the precision of individual layers to achieve a minimum drop in task performance (e.g., accuracy). To estimate the impact of layer precision choice on task performance, two methods are introduced: i) Entropy Approximation Guided Layer selection (EAGL) is fast and uses the entropy of the weight distribution, and ii) Accuracy-aware Layer Precision Selection (ALPS) is straightforward and relies on single epoch fine-tuning after layer precision reduction. Using EAGL and ALPS for layer precision selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit layers for ResNet-50, ResNet-101 and BERT-base transformer networks, demonstrating enhanced performance across the entire accuracy-throughput frontier. The techniques demonstrate better performance than existing techniques in several commensurate comparisons. Notably, this is accomplished with significantly less computational time required to reach a solution.
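The abstract states only that EAGL uses the entropy of the weight distribution; it does not give the formula. As a rough illustration of what such a quantity could look like, the sketch below bins a layer's weights into 2^b uniform bins and computes the Shannon entropy of the resulting histogram. The uniform binning, the function names, and the use of the bit width as the bin count are all assumptions made for this example, not the paper's method.

```python
import math
from collections import Counter


def quantize_to_bins(weights, num_bits):
    """Assign each weight to one of 2**num_bits uniform bins over [min, max].

    Hypothetical helper: the uniform binning scheme is an assumption.
    """
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits
    if hi == lo:
        # All weights identical: everything lands in a single bin.
        return [0] * len(weights)
    width = (hi - lo) / levels
    # The maximum weight would index bin `levels`, so clamp it to the last bin.
    return [min(int((w - lo) / width), levels - 1) for w in weights]


def weight_entropy(weights, num_bits):
    """Shannon entropy, in bits, of the empirical distribution over bins."""
    bins = quantize_to_bins(weights, num_bits)
    counts = Counter(bins)
    n = len(bins)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    spread_out = [0.0, 1.0, 2.0, 3.0]   # occupies all four 2-bit bins evenly
    collapsed = [0.5, 0.5, 0.5, 0.5]    # occupies a single bin
    print(weight_entropy(spread_out, 2))
    print(weight_entropy(collapsed, 2))
```

Under this assumed definition, a layer whose weights fill the available bins evenly has entropy close to the bit width, while a layer whose weights collapse into few bins has low entropy. How such a score is turned into a per-layer precision choice is not specified in the abstract and is not modeled here.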
Submitted 10 January, 2024; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Universal Layout Emulation for Long-Term Database Archival
Authors:
Raja Appuswamy,
Vincent Joguin
Abstract:
Research on alternate media technologies, like film, synthetic DNA, and glass, for long-term data archival has received a lot of attention recently due to the media obsolescence issues faced by contemporary storage media like tape, Hard Disk Drives (HDD), and Solid State Disks (SSD). While researchers have developed novel layout and encoding techniques for archiving databases on these new media types, one key question remains unaddressed: How do we ensure that the decoders developed today will be available to, and executable by, a user who is restoring an archived database several decades in the future, on a computing platform that potentially does not even exist today?
In this paper, we make the case for Universal Layout Emulation (ULE), a new approach for future-proof, long-term database archival that advocates archiving decoders together with the data to ensure successful recovery. In order to do so, ULE brings together concepts from Data Management and Digital Preservation communities by using emulation for archiving decoders. In order to show that ULE can be implemented in practice, we present the design and evaluation of Micr'Olonys, an end-to-end long-term database archival system that can be used to archive databases using visual analog media like film, microform, and archival paper.
Submitted 8 September, 2020; v1 submitted 6 September, 2020;
originally announced September 2020.
-
Cold Storage Data Archives: More Than Just a Bunch of Tapes
Authors:
Bunjamin Memishi,
Raja Appuswamy,
Marcus Paradies
Abstract:
The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the often overlooked spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has received very little attention.
In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community.
Submitted 9 April, 2019;
originally announced April 2019.
-
A biologically constrained encoding solution for long-term storage of images onto synthetic DNA
Authors:
Melpomeni Dimopoulou,
Marc Antonini,
Pascal Barbry,
Raja Appuswamy
Abstract:
In the age of the digital media explosion, the amount of stored data is increasing dramatically. However, although existing storage systems offer efficient capacity, they lack durability. Hard disks, flash, tape, and even optical storage have limited lifespans in the range of 5 to 20 years. Interestingly, recent studies have shown that it is possible to use synthetic DNA for the storage of digital data, introducing a strong candidate for achieving data longevity. DNA's biological properties allow the storage of a great amount of information in an extraordinarily small volume while also promising efficient storage for centuries or even longer with no loss of information. However, encoding digital data onto DNA is not straightforward, because decoding must be robust to sequencing noise. Furthermore, synthesizing DNA is an expensive process, and thus controlling the compression ratio by optimizing the rate-distortion trade-off is an important challenge. This work proposes a coding solution for the storage of digital images onto synthetic DNA. We developed a new encoding algorithm that generates a DNA code robust to biological errors arising from the synthesis and sequencing processes. Furthermore, thanks to an optimized allocation process, the solution is able to control the compression ratio and thus the length of the synthesized DNA strand. Results show an improvement in terms of coding potential compared to previous state-of-the-art works.
Submitted 7 March, 2019;
originally announced April 2019.
-
Learned Step Size Quantization
Authors:
Steven K. Esser,
Jeffrey L. McKinstry,
Deepika Bablani,
Rathinakumar Appuswamy,
Dharmendra S. Modha
Abstract:
Deep networks run with low-precision operations at inference time offer power and space advantages over high-precision alternatives, but need to overcome the challenge of maintaining high accuracy as precision decreases. Here, we present a method for training such networks, Learned Step Size Quantization, that achieves the highest accuracy to date on the ImageNet dataset when using models, from a variety of architectures, with weights and activations quantized to 2, 3, or 4 bits of precision, and that can train 3-bit models that reach full-precision baseline accuracy. Our approach builds upon existing methods for learning weights in quantized networks by improving how the quantizer itself is configured. Specifically, we introduce a novel means to estimate and scale the task loss gradient at each weight and activation layer's quantizer step size, such that it can be learned in conjunction with other network parameters. This approach works using different levels of precision as needed for a given system and requires only a simple modification of existing training code.
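To make the role of the quantizer step size concrete, here is a minimal sketch of a uniform quantizer parameterized by a step size: the value is scaled by the step, clipped to the integer range of the bit width, rounded, and rescaled. The clip-and-round form and the signed/unsigned ranges are assumptions for this example, and the function name is hypothetical. The sketch covers only the forward pass; the gradient estimate and scaling that let the step size be learned, which is the contribution the abstract describes, are not shown.

```python
def quantize(value, step, num_bits, signed=True):
    """Uniformly quantize `value` with step size `step` to `num_bits` bits.

    Hypothetical helper: scale by the step, clip to the representable
    integer range, round to the nearest integer, then rescale.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if signed:
        # Signed integer range, e.g. 2 bits -> integers in [-2, 1].
        q_neg, q_pos = 2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1
    else:
        # Unsigned integer range, e.g. 2 bits -> integers in [0, 3].
        q_neg, q_pos = 0, 2 ** num_bits - 1
    scaled = value / step
    clipped = max(-q_neg, min(q_pos, scaled))
    # Python's round() rounds exact halves to the nearest even integer.
    return round(clipped) * step


if __name__ == "__main__":
    for v in (-1.0, -0.3, 0.1, 0.37, 2.0):
        print(v, "->", quantize(v, step=0.25, num_bits=2))
```

The step size sets both the resolution and the clipping range: with `step=0.25` and 2 signed bits, representable values are -0.5, -0.25, 0.0, and 0.25, so a larger step trades finer resolution for a wider range.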
Submitted 6 May, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference
Authors:
Jeffrey L. McKinstry,
Steven K. Esser,
Rathinakumar Appuswamy,
Deepika Bablani,
John V. Arthur,
Izzet B. Yildiz,
Dharmendra S. Modha
Abstract:
To realize the promise of ubiquitous embedded deep network inference, it is essential to seek limits of energy and area efficiency. To this end, low-precision networks offer tremendous promise because both energy and area scale down quadratically with the reduction in precision. Here we demonstrate ResNet-18, -34, -50, -152, Inception-v3, Densenet-161, and VGG-16bn networks on the ImageNet classification benchmark that, at 8-bit precision, exceed the accuracy of the full-precision baseline networks after one epoch of finetuning, thereby leveraging the availability of pretrained models. We also demonstrate ResNet-18, -34, -50, -152, Densenet-161, and VGG-16bn 4-bit models that match the accuracy of the full-precision baseline networks, the highest scores to date. Surprisingly, the weights of the low-precision networks are very close (in cosine similarity) to the weights of the corresponding baseline networks, making training from scratch unnecessary.
We find that gradient noise due to quantization during training increases with reduced precision, and seek ways to overcome this noise. The number of iterations required by SGD to achieve a given training error is related to the square of (a) the distance of the initial solution from the final plus (b) the maximum variance of the gradient estimates. Therefore, we (a) reduce solution distance by starting with pretrained fp32 precision baseline networks and fine-tuning, and (b) combat gradient noise introduced by quantization by training longer and reducing learning rates. Sensitivity analysis indicates that these simple techniques, coupled with proper activation function range calibration to take full advantage of the limited precision, are sufficient to discover low-precision networks, if they exist, close to fp32 precision baseline networks. The results herein provide evidence that 4 bits suffice for classification.
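The closeness measure mentioned above, cosine similarity between the weights of a low-precision network and its full-precision baseline, can be sketched as follows. The weight values and the rounding to a 0.25 grid are invented purely for illustration; the abstract does not report the vectors or the quantization grid used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Illustrative values only: a "baseline" weight vector and a copy
    # rounded to a coarse grid, standing in for a low-precision network.
    baseline = [0.12, -0.53, 0.98, -0.27, 0.41]
    low_precision = [round(w / 0.25) * 0.25 for w in baseline]
    print(low_precision)
    print(cosine_similarity(baseline, low_precision))
```

A value near 1 means the two weight vectors point in almost the same direction, which is the sense in which the abstract calls the low-precision weights "very close" to the baseline.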
Submitted 24 February, 2019; v1 submitted 11 September, 2018;
originally announced September 2018.
-
Structured Convolution Matrices for Energy-efficient Deep learning
Authors:
Rathinakumar Appuswamy,
Tapan Nayak,
John Arthur,
Steven Esser,
Paul Merolla,
Jeffrey Mckinstry,
Timothy Melano,
Myron Flickner,
Dharmendra Modha
Abstract:
We derive a relationship between network representation in energy-efficient neuromorphic architectures and block Toeplitz convolutional matrices. Inspired by this connection, we develop deep convolutional networks using a family of structured convolutional matrices and achieve state-of-the-art trade-off between energy efficiency and classification accuracy for well-known image recognition tasks. We also put forward a novel method to train binary convolutional networks by utilising an existing connection between noisy-rectified linear units and binary activations.
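The link between convolution and Toeplitz matrices can be illustrated in one dimension: sliding a kernel over a signal gives the same result as multiplying the signal by a matrix whose diagonals are constant. The sketch below shows only this simplified 1-D case, using the cross-correlation form common in deep learning with no padding; the block Toeplitz structure for 2-D convolutions and the paper's family of structured matrices are not reproduced. All function names are hypothetical.

```python
def toeplitz_matrix(kernel, input_len):
    """Build the (input_len - k + 1) x input_len matrix whose product with
    a signal equals sliding `kernel` over it (no padding, stride 1)."""
    k = len(kernel)
    out_len = input_len - k + 1
    if out_len < 1:
        raise ValueError("kernel is longer than the input")
    rows = []
    for i in range(out_len):
        row = [0.0] * input_len
        row[i:i + k] = kernel  # kernel shifted one column right per row
        rows.append(row)
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sliding_correlation(kernel, signal):
    """Direct sliding-window computation, for comparison."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


if __name__ == "__main__":
    kernel = [1.0, 2.0]
    signal = [1.0, 2.0, 3.0, 4.0]
    matrix = toeplitz_matrix(kernel, len(signal))
    for row in matrix:
        print(row)
    print(matvec(matrix, signal))
    print(sliding_correlation(kernel, signal))
```

Every diagonal of the printed matrix holds a single repeated value, which is the defining Toeplitz property, and the two printed result vectors agree.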
Submitted 8 June, 2016;
originally announced June 2016.
-
Deep neural networks are robust to weight binarization and other non-linear distortions
Authors:
Paul Merolla,
Rathinakumar Appuswamy,
John Arthur,
Steve K. Esser,
Dharmendra Modha
Abstract:
Recent results show that deep neural networks achieve excellent performance even when, during training, weights are quantized and projected to a binary representation. Here, we show that this is just the tip of the iceberg: these same networks, during testing, also exhibit a remarkable robustness to distortions beyond quantization, including additive and multiplicative noise, and a class of non-linear projections where binarization is just a special case. To quantify this robustness, we show that one such network achieves 11% test error on CIFAR-10 even with 0.68 effective bits per weight. Furthermore, we find that a common training heuristic (namely, projecting quantized weights during backpropagation) can be altered (or even removed) and networks still achieve a base level of robustness during testing. Specifically, training with weight projections other than quantization also works, as does simply clipping the weights, both of which have never been reported before. We confirm our results for CIFAR-10 and ImageNet datasets. Finally, drawing from these ideas, we propose a stochastic projection rule that leads to a new state-of-the-art network with 7.64% test error on CIFAR-10 using no data augmentation.
Submitted 6 June, 2016;
originally announced June 2016.
-
Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing
Authors:
Steven K. Esser,
Paul A. Merolla,
John V. Arthur,
Andrew S. Cassidy,
Rathinakumar Appuswamy,
Alexander Andreopoulos,
David J. Berg,
Jeffrey L. McKinstry,
Timothy Melano,
Davis R. Barch,
Carmelo di Nolfo,
Pallab Datta,
Arnon Amir,
Brian Taba,
Myron D. Flickner,
Dharmendra S. Modha
Abstract:
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that i) approach state-of-the-art classification accuracy across 8 standard datasets, encompassing vision and speech, ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1200 and 2600 frames per second and using between 25 and 275 mW (effectively > 6000 frames / sec / W) and iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. For the first time, the algorithmic power of deep learning can be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
Submitted 24 May, 2016; v1 submitted 27 March, 2016;
originally announced March 2016.
-
Computing linear functions by linear coding over networks
Authors:
Rathinakumar Appuswamy,
Massimo Franceschetti
Abstract:
We consider the scenario in which a set of sources generates messages in a network and a receiver node demands an arbitrary linear function of these messages. We formulate an algebraic test to determine whether an arbitrary network can compute linear functions using linear codes. We identify a class of linear functions that can be computed using linear codes in every network that satisfies a natural cut-based condition. Conversely, for another class of linear functions, we show that the cut-based condition does not guarantee the existence of a linear coding solution. For linear functions over the binary field, the two classes are complements of each other.
Submitted 23 February, 2011;
originally announced February 2011.
-
Linear Codes, Target Function Classes, and Network Computing Capacity
Authors:
Rathinakumar Appuswamy,
Massimo Franceschetti,
Nikhil Karamchandani,
Kenneth Zeger
Abstract:
We study the use of linear codes for network computing in single-receiver networks with various classes of target functions of the source messages. Such classes include reducible, injective, semi-injective, and linear target functions over finite fields. Computing capacity bounds and achievability are given with respect to these target function classes for network codes that use routing, linear coding, or nonlinear coding.
Submitted 7 May, 2011; v1 submitted 30 December, 2010;
originally announced January 2011.
-
Network Coding for Computing: Cut-Set Bounds
Authors:
Rathinakumar Appuswamy,
Massimo Franceschetti,
Nikhil Karamchandani,
Ken Zeger
Abstract:
The following network computing problem is considered. Source nodes in a directed acyclic network generate independent messages and a single receiver node computes a target function $f$ of the messages. The objective is to maximize the average number of times $f$ can be computed per network usage, i.e., the "computing capacity". The network coding problem for a single-receiver network is a special case of the network computing problem in which all of the source messages must be reproduced at the receiver. For network coding with a single receiver, routing is known to achieve the capacity by achieving the network min-cut upper bound. We extend the definition of min-cut to the network computing problem and show that the min-cut is still an upper bound on the maximum achievable rate and is tight for computing (using coding) any target function in multi-edge tree networks and for computing linear target functions in any network. We also study the bound's tightness for different classes of target functions. In particular, we give a lower bound on the computing capacity in terms of the Steiner tree packing number and a different bound for symmetric functions. We also show that for certain networks and target functions, the computing capacity can be less than an arbitrarily small fraction of the min-cut bound.
Submitted 11 August, 2010; v1 submitted 15 December, 2009;
originally announced December 2009.