-
The Case for ABI Interoperability in a Fault Tolerant MPI
Authors:
Yao Xu,
Grace Nansamba,
Anthony Skjellum,
Gene Cooperman
Abstract:
There is new momentum behind an interoperable ABI for MPI, which will be a major component of MPI-5. This capability brings true separation of concerns to a running MPI computation. The linking and compilation of an MPI application becomes completely independent of the choice of MPI library. The MPI application is compiled once, and runs everywhere.
This ABI allows users to independently choose: the compiler for the MPI application; the MPI runtime library; and, with this work, the transparent checkpointing package. Arbitrary combinations of the above are supported. The result is a "three-legged stool", which supports performance, portability, and resilience for long-running computations.
An experimental proof-of-concept is presented, using the MANA checkpointing package and the Mukautuva ABI library for MPI interoperability. The result demonstrates that the combination of an ABI-compliant MPI and transparent checkpointing can bring extra flexibility in portability and dynamic resource management at runtime without compromising performance. For example, an MPI application can execute and checkpoint under one MPI library, and later restart under another MPI library. The work is not specific to the MANA package, since the approach using Mukautuva can be adapted to other transparent checkpointing packages.
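The handle-translation idea that lets one compiled binary run over different MPI libraries can be modelled in a few lines. The application holds only standard ABI handle values, and a shim maps them to whatever the loaded implementation uses internally. This is a toy model under stated assumptions, not Mukautuva's code: every handle value and name below (Backend, AbiShim, application, impl-A, impl-B) is hypothetical. A real shim also has to translate other handle kinds (datatypes, reduction operations, requests) and status objects.

```python
from dataclasses import dataclass, field

# Application-visible ABI handle values (hypothetical numbering).
ABI_COMM_WORLD = 1
ABI_COMM_SELF = 2


@dataclass
class Backend:
    """Stand-in for one MPI implementation and its native handle values."""
    name: str
    native: dict                      # ABI handle -> backend-native handle
    abi: dict = field(init=False)

    def __post_init__(self):
        # Reverse table for translating handles the backend hands back.
        self.abi = {v: k for k, v in self.native.items()}


class AbiShim:
    """Translates handles at the boundary between application and backend."""

    def __init__(self, backend):
        self.backend = backend

    def to_native(self, abi_handle):
        return self.backend.native[abi_handle]

    def to_abi(self, native_handle):
        return self.backend.abi[native_handle]


def application(shim):
    # The "compiled once" application only ever holds ABI handles.
    native = shim.to_native(ABI_COMM_WORLD)
    return shim.backend.name, native


# Hypothetical native values; real libraries use different representations.
impl_a = Backend("impl-A", {ABI_COMM_WORLD: 0x44000000, ABI_COMM_SELF: 0x44000001})
impl_b = Backend("impl-B", {ABI_COMM_WORLD: "ptr-world", ABI_COMM_SELF: "ptr-self"})

for backend in (impl_a, impl_b):
    print(application(AbiShim(backend)))
```

Because application never sees native values, swapping the backend requires no change to it; only the shim's tables differ.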
Submitted 14 March, 2025;
originally announced March 2025.
-
ACiS: Complex Processing in the Switch Fabric
Authors:
Pouya Haghi,
Anqi Guo,
Tong Geng,
Anthony Skjellum,
Martin Herbordt
Abstract:
For the last three decades, a core use of FPGAs has been processing communication: FPGA-based SmartNICs are in widespread use from the datacenter to IoT. Augmenting switches with FPGAs, however, has been less studied, even though moving processing from the edge of the network to its center offers numerous advantages. Communication switches have previously been augmented to process collectives, e.g., IBM BlueGene and Mellanox SHArP, but that support has been limited to a small set of predefined scalar operations and datatypes. Here we present ACiS, a framework and taxonomy for Advanced Computing in the Switch that unifies and expands our previous work in this area. In addition to fixed scalar collectives (Type 1), we propose three more types of in-switch application processing: (Type 2) user-defined operations and types, including data structures; (Type 3) look-aside operations that have state within the operation and can have loops; and (Type 4) fused collectives built by fusing multiple existing collectives or collectives with map computations. ACiS is supported in hardware with modular switch extensions, including a CGRA architecture. Software support for ACiS includes evaluation and translation of relevant parts of user programs, compilation of user specifications into control flow graphs, and mapping of those graphs into switch hardware. The overall goal is the transparent acceleration of HPC applications encapsulated within an MPI implementation.
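To make Type 2 concrete: a user-defined, associative operation over a data structure (here, merging two top-k lists of (score, label) pairs) can be applied pairwise at each level of a reduction tree, much as switch hardware might combine inputs arriving on its ports. The sketch below is a pure-Python simulation under that assumption; topk_merge, tree_reduce, and K are hypothetical names, not part of ACiS.

```python
import heapq

K = 3  # hypothetical number of entries kept by the user-defined operation


def topk_merge(a, b, k=K):
    """User-defined associative operation on a data structure: merge two top-k lists."""
    return heapq.nlargest(k, a + b)


def tree_reduce(contributions, op):
    """Combine inputs pairwise, one tree level per round (ceil(log2 P) rounds)."""
    level = list(contributions)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd input passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


# One (score, label) list per simulated rank; values are made up.
contributions = [
    [(9, "r0a"), (4, "r0b")],
    [(7, "r1a"), (8, "r1b")],
    [(1, "r2a")],
    [(6, "r3a"), (10, "r3b")],
]
print(tree_reduce(contributions, topk_merge))
```

Since the tree combines inputs in a different order than a linear rank-by-rank reduction, the operation must be associative (and, for arbitrary port arrival order, commutative); top-k merging is both.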
Submitted 30 January, 2025;
originally announced January 2025.
-
Understanding GPU Triggering APIs for MPI+X Communication
Authors:
Patrick G. Bridges,
Anthony Skjellum,
Evan D. Suggs,
Derek Schafer,
Purushotham V. Bangalore
Abstract:
GPU-enhanced architectures are now dominant in HPC systems, but message-passing communication involving GPUs with MPI has proven to be both complex and expensive, motivating new approaches that lower such costs. We compare and contrast stream/graph- and kernel-triggered MPI communication abstractions, whose principal purpose is to enhance the performance of communication when GPU kernels create or consume data for transfer through MPI operations. Researchers and practitioners have proposed multiple potential APIs for stream and/or kernel triggering that span various GPU architectures and approaches, including MPI-4 partitioned point-to-point communication, stream communicators, and explicit MPI stream/queue objects. Designs breaking backward compatibility with MPI are duly noted. Some of these strengthen or weaken the semantics of MPI operations. A key contribution of this paper is to promote community convergence toward a stream- and/or kernel-triggering abstraction by highlighting the common and differing goals and contributions of existing abstractions. We describe the design space in which these abstractions reside, their implicit or explicit use of stream and other non-MPI abstractions, their relationship to partitioned and persistent operations, and discuss their potential for added performance, how usable these abstractions are, and where functional and/or semantic gaps exist. Finally, we provide a taxonomy for stream- and kernel-triggered abstractions, including disambiguation of similar semantic terms, and consider directions for future standardization in MPI-5.
Submitted 31 July, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
MPI Implementation Profiling for Better Application Performance
Authors:
Riley Shipley,
Garrett Hooten,
David Boehme,
Derek Schafer,
Anthony Skjellum,
Olga Pearce
Abstract:
While application profiling has been a mainstay in the HPC community for years, profiling of MPI and other communication middleware has not received the same degree of exploration. This paper adds to the discussion of MPI profiling, contributing two general-purpose profiling methods as well as practical applications of these methods to an existing implementation. The ability to detect performance defects in MPI codes using these methods increases the potential of further research and development in communication optimization.
Submitted 19 February, 2024;
originally announced February 2024.
-
Implementation-Oblivious Transparent Checkpoint-Restart for MPI
Authors:
Yao Xu,
Leonid Belyaev,
Twinkle Jain,
Derek Schafer,
Anthony Skjellum,
Gene Cooperman
Abstract:
This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations, and test each MPI implementation according to performance or other features.
Submitted 26 September, 2023;
originally announced September 2023.
-
MPI Advance: Open-Source Message Passing Optimizations
Authors:
Amanda Bienz,
Derek Schafer,
Anthony Skjellum
Abstract:
The many production implementations of the Message Passing Interface (MPI) each provide their own underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for the given architecture. Performance varies with MPI version, but application programmers are typically unable to achieve optimal performance with local MPI installations and therefore rely on whichever implementation is provided as a system install. This paper presents MPI Advance, a collection of libraries that sit on top of MPI and improve the performance of any existing MPI library. The libraries provide optimizations for collectives, neighborhood collectives, partitioned communication, and GPU-aware communication.
Submitted 13 September, 2023;
originally announced September 2023.
-
The Impact of Space-Filling Curves on Data Movement in Parallel Systems
Authors:
David Walker,
Anthony Skjellum
Abstract:
Modern computer systems are characterized by deep memory hierarchies, composed of main memory, multiple layers of cache, and other specialized types of memory. In parallel and distributed systems, additional memory layers are added to this hierarchy. Achieving good performance for computational science applications, in terms of execution time, depends on the efficient use of this diverse and hierarchical memory. This paper revisits the use of space-filling curves to specify the ordering in memory of data structures used in representative scientific applications executing on parallel machines containing clusters of multicore CPUs with attached GPUs. This work examines the hypothesis that space-filling curves, such as Hilbert and Morton ordering, can improve data locality and hence result in more efficient data movement than row- or column-based orderings. First, performance results are presented that show for which application parameterizations and machine characteristics this is the case, and these results are interpreted in terms of how an application interacts with the computer hardware and low-level software. This research particularly focuses on stencil-based applications, which form the basis of many scientific computations. Second, the paper considers how space-filling curves affect data sharing in nearest-neighbour and stencil-based codes.
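For readers new to these orderings, the sketch below computes a Morton (Z-order) index by interleaving coordinate bits, and a Hilbert index with the well-known iterative rotate-and-reflect algorithm, for an n x n grid with n a power of two. It is a generic illustration of the two curves, not code from the paper; the function names morton_index, hilbert_index, and traversal are ours.

```python
def morton_index(x, y, bits=16):
    """Z-order index: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def hilbert_index(n, x, y):
    """Distance of (x, y) along the Hilbert curve filling an n x n grid (n = 2**k)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                 # rotate/reflect so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def traversal(n, key):
    """Grid cells listed in the order the given curve visits them."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    return sorted(cells, key=lambda c: key(*c))


n = 4
print("morton :", traversal(n, morton_index))
print("hilbert:", traversal(n, lambda x, y: hilbert_index(n, x, y)))
```

On the Hilbert curve, consecutive cells are always grid neighbours, whereas Morton order occasionally jumps; both keep every aligned 2^k x 2^k tile in a contiguous index range, which is the source of the locality benefit hypothesized above.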
Submitted 15 July, 2023;
originally announced July 2023.
-
Collective-Optimized FFTs
Authors:
Evelyn Namugwanya,
Amanda Bienz,
Derek Schafer,
Anthony Skjellum
Abstract:
This paper measures the impact of the various alltoallv methods. Results are analyzed within Beatnik, a Z-model solver that is bottlenecked by HeFFTe and representative of applications that rely on FFTs.
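For readers unfamiliar with alltoallv: each rank supplies per-destination send counts plus displacements into a packed buffer, and receives a variable amount of data from every rank. The sketch below simulates those MPI_Alltoallv semantics in a single Python process (no MPI involved); exclusive_prefix_sum and simulate_alltoallv are hypothetical names. In distributed FFT libraries, the redistributions between data decompositions are typically all-to-all exchanges of this kind.

```python
def exclusive_prefix_sum(counts):
    """Displacements for a packed buffer: displs[i] = sum(counts[:i])."""
    displs, total = [], 0
    for c in counts:
        displs.append(total)
        total += c
    return displs


def simulate_alltoallv(sendbufs, sendcounts):
    """Single-process model of MPI_Alltoallv semantics.

    sendbufs[p]      -- rank p's packed send buffer
    sendcounts[p][q] -- number of elements rank p sends to rank q
    Returns recvbufs, where recvbufs[q] holds data from ranks 0..P-1 in rank order.
    """
    nranks = len(sendbufs)
    recvbufs = []
    for q in range(nranks):
        # Rank q's recvcounts are column q of the sendcounts matrix.
        recv = []
        for p in range(nranks):
            sdispls = exclusive_prefix_sum(sendcounts[p])
            start = sdispls[q]
            recv.extend(sendbufs[p][start:start + sendcounts[p][q]])
        recvbufs.append(recv)
    return recvbufs


# Example with 3 ranks and uneven counts (made-up data).
sendcounts = [[1, 2, 0],
              [0, 1, 1],
              [2, 0, 1]]
sendbufs = [[f"{p}->{q}" for q in range(3) for _ in range(sendcounts[p][q])]
            for p in range(3)]
recvbufs = simulate_alltoallv(sendbufs, sendcounts)
for q, buf in enumerate(recvbufs):
    print(q, buf)
```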
Submitted 4 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
A Survey of Potential MPI Complex Collectives: Large-Scale Mining and Analysis of HPC Applications
Authors:
Pouya Haghi,
Ryan Marshall,
Po Hao Chen,
Anthony Skjellum,
Martin Herbordt
Abstract:
Offload of MPI collectives to network devices, e.g., NICs and switches, is being implemented as an effective mechanism to improve application performance by reducing inter- and intra-node communication and bypassing MPI software layers. Given the rich deployment of accelerators and programmable NICs/switches in data centers, we posit that there is an opportunity to further improve performance by extending this idea (of in-network collective processing) to a new class of more complex collectives. The most basic type of complex collective is the fusion of existing collectives.
In previous work we have demonstrated the efficacy of this additional hardware and software support and shown that it can substantially improve the performance of certain applications. In this work we extend this approach. We seek to characterize a large number of MPI applications to determine overall applicability, both breadth and type, and so provide insight for hardware designers and MPI developers about future offload possibilities.
Besides increasing the scope of prior surveys to include finding (potential) new MPI constructs, we also tap into new methods to extend the survey process. Prior surveys on MPI usage considered lists of applications constructed from application developers' knowledge. The approach taken in this paper, however, is based on automated mining of a large collection of code sources. More specifically, the mining is performed with the GitHub REST API. We use a database management system to store the results and to answer queries. Another advantage is that this approach supports more complex analyses of MPI usage, expressed as user queries.
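The mine-then-query workflow can be sketched as follows, under several assumptions: the source files are hard-coded strings standing in for files that would be fetched through the GitHub REST API, MPI calls are detected with a simple regular expression rather than a real C parser, and adjacent pairs of calls in a file are treated as candidate fusions. The table layout and query are hypothetical, not the paper's schema.

```python
import re
import sqlite3

# Stand-ins for source files that would be fetched via the GitHub REST API.
SOURCES = {
    "repoA/solver.c": "MPI_Allreduce(a, b, 1, MPI_DOUBLE, MPI_SUM, comm);\n"
                      "MPI_Bcast(x, n, MPI_INT, 0, comm);",
    "repoB/main.c": "MPI_Allreduce(&r, &g, 1, MPI_DOUBLE, MPI_MAX, comm);\n"
                    "MPI_Bcast(buf, 4, MPI_CHAR, 0, comm);\n"
                    "MPI_Barrier(comm);",
    "repoC/io.c": "MPI_Gather(s, 1, MPI_INT, r, 1, MPI_INT, 0, comm);\n"
                  "MPI_Bcast(r, 8, MPI_INT, 0, comm);",
}

# MPI functions are written MPI_Xxx(...); all-caps constants such as MPI_INT do not match.
CALL = re.compile(r"\b(MPI_[A-Z][a-z_]+)\s*\(")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (file TEXT, seq INTEGER, name TEXT)")
for path, text in SOURCES.items():
    for seq, match in enumerate(CALL.finditer(text)):
        db.execute("INSERT INTO calls VALUES (?, ?, ?)", (path, seq, match.group(1)))

# Adjacent call pairs in the same file are candidate fused collectives.
pairs = db.execute(
    """
    SELECT a.name, b.name, COUNT(*) AS n
    FROM calls AS a
    JOIN calls AS b ON b.file = a.file AND b.seq = a.seq + 1
    GROUP BY a.name, b.name
    ORDER BY n DESC, a.name, b.name
    """
).fetchall()
for first, second, n in pairs:
    print(f"{first} -> {second}: {n}")
```

Keeping the raw call sequences in a table means new questions (for example, which collectives co-occur inside loops) become new queries rather than a new crawl.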
Submitted 31 May, 2023;
originally announced May 2023.
-
Checkpoint-Restart Libraries Must Become More Fault Tolerant
Authors:
Anthony Skjellum,
Derek Schafer
Abstract:
Production MPI codes need checkpoint-restart (CPR) support. Clearly, checkpoint-restart libraries must be fault tolerant lest they open up a window of vulnerability for failures with Byzantine outcomes. But certain popular libraries that leverage MPI are evidently not fault tolerant. Nowadays, fault detection with automatic recovery without batch requeueing is a strong requirement for production environments. Thus, allowing deadlock and setting long timeouts are suboptimal for fault detection, even when paired with conservative recovery from the penultimate checkpoint.
When MPI is used as a communication mechanism within a CPR library, such libraries must offer fault-tolerant extensions with minimal detection, isolation, mitigation, and potential recovery semantics to aid the CPR library's fail-backward. Communication between MPI and the checkpoint library regarding system health may be valuable. For fault-tolerant MPI programs (e.g., using APIs like FA-MPI, Stages/Reinit, or ULFM), the checkpoint library must cooperate with the extended model or else invalidate fault-tolerant operation.
Submitted 20 December, 2021;
originally announced December 2021.
-
Scrybe: A Secure Audit Trail for Clinical Trial Data Fusion
Authors:
Jon Oakley,
Carl Worley,
Lu Yu,
Richard Brooks,
Ilker Ozcelik,
Anthony Skjellum,
Jihad Obeid
Abstract:
Clinical trials are a multi-billion dollar industry. One of the biggest challenges facing the clinical trial research community is satisfying Part 11 of Title 21 of the Code of Federal Regulations and ISO 27789. These controls provide audit requirements that guarantee the reliability of the data contained in electronic records. Context-aware smart devices and wearable IoT devices have become increasingly common in clinical trials. Electronic Data Capture (EDC) and Clinical Data Management Systems (CDMS) do not currently address the new challenges introduced by these devices. The healthcare digital threat landscape is continually evolving, and the prevalence of sensor fusion and wearable devices compounds the growing attack surface. We propose Scrybe, a permissioned blockchain, to store proof of clinical trial data provenance. We illustrate how Scrybe addresses each control, as well as the limitations of Ethereum-based blockchains. Finally, we provide a proof-of-concept integration with REDCap to show tamper resistance.
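The tamper-evidence idea behind storing provenance proofs on a chain can be illustrated with a plain hash chain: each record commits to the hash of its predecessor, so modifying any earlier record invalidates the chain. This is a minimal sketch using Python's hashlib, not Scrybe's implementation; a permissioned blockchain additionally provides replication, consensus, and access control among participants, none of which is modeled here. ProvenanceLog, data_digest, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def link_hash(prev_hash, payload):
    """Hash a record together with its predecessor's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ProvenanceLog:
    def __init__(self):
        self.entries = []           # list of (payload, hash) pairs

    def append(self, payload):
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = link_hash(prev, payload)
        self.entries.append((payload, digest))
        return digest

    def verify(self):
        prev = GENESIS
        for payload, digest in self.entries:
            if link_hash(prev, payload) != digest:
                return False
            prev = digest
        return True


def data_digest(raw: bytes) -> str:
    """Record only a digest of the trial data, not the data itself."""
    return hashlib.sha256(raw).hexdigest()


log = ProvenanceLog()
log.append({"device": "wearable-07", "event": "upload", "data": data_digest(b"hr=72;t=0")})
log.append({"device": "wearable-07", "event": "upload", "data": data_digest(b"hr=75;t=60")})
print("chain valid:", log.verify())

# Tamper with the first record's payload while keeping its stored hash.
payload, digest = log.entries[0]
log.entries[0] = ({**payload, "data": data_digest(b"hr=60;t=0")}, digest)
print("after tampering:", log.verify())
```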
Submitted 12 September, 2021;
originally announced September 2021.
-
MPI's Language Bindings are Holding MPI Back
Authors:
Martin Ruefenacht,
Derek Schafer,
Anthony Skjellum,
Purushotham V. Bangalore
Abstract:
Over the past two decades, C++ has been adopted as a major HPC language (displacing C to a large extent, and Fortran to some degree as well). Idiomatic C++ is clearly how C++ is being used nowadays. But MPI's syntax and semantics are defined and extended with C and Fortran interfaces that align with the capabilities and limitations of C89 and Fortran-77. Unfortunately, the language-independent specification also clearly reflects the intersection of what these languages could syntactically and semantically manage at the outset in 1993, rather than being truly language neutral. In this paper, we propose a modern C++ language interface to replace the C language binding for C++ programmers, with an upward-compatible architecture that leverages all the benefits of C++11 through C++20 for performance, productivity, and interoperability with other popular C++ libraries and interfaces for HPC. Demand is demonstrably strong for this second attempt at C++ language support in MPI. The original C++ interface was added in MPI-2, was then found to lack specific benefits over the C binding, and so was subsequently removed in MPI-3. Since C++ and its idiomatic usage have evolved since the original C++ language binding was removed from the standard, this new effort is both timely and important for MPI applications. Also, many C++ application programmers create their own ad hoc shim libraries over MPI to provide some degree of abstraction unique to their particular project, which means many such abstraction libraries are being devised with no commonality other than the demand for them.
Submitted 22 July, 2021;
originally announced July 2021.
-
An Overview of Cryptographic Accumulators
Authors:
Ilker Ozcelik,
Sai Medury,
Justin Broaddus,
Anthony Skjellum
Abstract:
This paper is a primer on cryptographic accumulators and how to apply them practically. A cryptographic accumulator is a space- and time-efficient data structure used for set-membership tests. Since it is possible to represent any computational problem where the answer is yes or no as a set-membership problem, cryptographic accumulators are invaluable data structures in computer science and engineering. But, to the best of our knowledge, there is neither a concise survey comparing and contrasting various types of accumulators nor a guide for how to apply the most appropriate one for a given application. Therefore, we address that gap by describing cryptographic accumulators while presenting their fundamental and so-called optional properties. We discuss the effects of each property on the given accumulator's performance in terms of space and time complexity, as well as communication overhead.
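As a concrete instance, one classic construction is the RSA-style accumulator. Members are represented as primes, the accumulator is A = g^(x1*x2*...*xn) mod N, and a membership witness for x is the same power with x left out, so that w^x = A (mod N). The accumulator stays a single number no matter how many members it holds. The sketch below uses tiny, insecure parameters (P, Q, G) purely for illustration; a real deployment needs an RSA modulus of cryptographic size whose factorization nobody knows, plus hashing of arbitrary elements to primes, both omitted here.

```python
from math import prod

# Toy, INSECURE parameters: N = P * Q with tiny primes.
P, Q = 1009, 1013
N = P * Q
G = 3                              # hypothetical public base


def accumulate(members):
    """A = G^(product of members) mod N."""
    return pow(G, prod(members), N)


def witness(members, x):
    """Witness for x: the accumulator of every other member."""
    return pow(G, prod(m for m in members if m != x), N)


def verify(acc, x, w):
    """Accept x as a member iff w^x == A (mod N)."""
    return pow(w, x, N) == acc


members = [13, 17, 19, 23]         # set elements represented as primes
acc = accumulate(members)
w17 = witness(members, 17)
print(verify(acc, 17, w17))

# Adding member 29 updates the accumulator and existing witnesses
# with one modular exponentiation each.
acc2 = pow(acc, 29, N)
w17_new = pow(w17, 29, N)
print(verify(acc2, 17, w17_new))
```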
Submitted 7 March, 2021;
originally announced March 2021.
-
Pre-print: Radio Identity Verification-based IoT Security Using RF-DNA Fingerprints and SVM
Authors:
Donald Reising,
Joseph Cancelleri,
T. Daniel Loveless,
Farah Kandah,
Anthony Skjellum
Abstract:
It is estimated that the number of IoT devices will reach 75 billion in the next five years. Most of those currently deployed, and those yet to be deployed, lack sufficient security to protect themselves and their networks from attack by malicious IoT devices that masquerade as authorized devices to circumvent digital authentication approaches. This work presents a PHY layer IoT authentication approach capable of addressing this critical security need through the use of feature-reduced Radio Frequency-Distinct Native Attributes (RF-DNA) fingerprints and Support Vector Machines (SVM). This work demonstrates (i) 100% authorized ID verification across three trials of six randomly chosen radios at signal-to-noise ratios greater than or equal to 6 dB, and (ii) 100% rejection of rogue radio ID spoofing attacks at signal-to-noise ratios greater than or equal to 3 dB, using RF-DNA fingerprints whose features are selected with the Relief-F algorithm.
Submitted 19 May, 2020;
originally announced May 2020.
-
Extending the Message Passing Interface (MPI) with User-Level Schedules
Authors:
Derek Schafer,
Sheikh Ghafoor,
Daniel Holmes,
Martin Ruefenacht,
Anthony Skjellum
Abstract:
Composability is one of seven reasons for the long-standing and continuing success of MPI. Extending MPI by composing its operations with user-level operations provides useful integration with the progress engine and completion notification methods of MPI. However, the existing extensibility mechanism in MPI (generalized requests) is not widely utilized and has significant drawbacks.
MPI can be generalized via scheduled communication primitives, for example, by utilizing implementation techniques from existing MPI-3 nonblocking collectives and from forthcoming MPI-4 persistent and partitioned APIs. Non-trivial schedules are used internally in some MPI libraries, but they are not accessible to end-users.
Message-based communication patterns can be built as libraries on top of MPI. Such libraries can have comparable implementation maturity and potentially higher performance than MPI library code, but do not require intimate knowledge of the MPI implementation. Libraries can provide performance-portable interfaces that cross MPI implementation boundaries. The ability to compose additional user-defined operations using the same progress engine benefits all kinds of general purpose HPC libraries.
We propose a definition for MPI schedules: a user-level programming model suitable for creating persistent collective communication composed with new application-specific sequences of user-defined operations managed by MPI and fully integrated with MPI progress and completion notification. The API proposed offers a path to standardization for extensible communication schedules involving user-defined operations. Our approach has the potential to introduce event-driven programming into MPI (beyond the tools interface), although connecting schedules with events comprises future work.
Early performance results described here are promising and indicate strong overlap potential.
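A persistent schedule can be pictured as an ordered list of rounds, each holding operations (communication or user-defined) that must finish before the next round begins; the schedule is started repeatedly and advanced by the progress engine. The sketch below models that idea in a single Python process. Schedule, add_round, start, progress, test, and run are hypothetical names, not the API proposed in the paper, and progress stands in for one step of MPI's progress engine.

```python
class Schedule:
    """Hypothetical persistent schedule: rounds run in order, reusable after completion."""

    def __init__(self):
        self.rounds = []            # each round is a list of operations
        self.next_round = 0
        self.active = False

    def add_round(self, *ops):
        self.rounds.append(list(ops))

    def start(self):
        """Like starting a persistent request: rewind and mark active."""
        self.next_round = 0
        self.active = bool(self.rounds)

    def progress(self, state):
        """One progress step: run the next round, then report completion."""
        if self.active:
            for op in self.rounds[self.next_round]:
                op(state)
            self.next_round += 1
            self.active = self.next_round < len(self.rounds)
        return not self.active

    def test(self):
        return not self.active


# Operations act on the values of all simulated ranks at once.
def square(state):                  # user-defined local computation
    state["vals"] = [v * v for v in state["vals"]]


def allreduce_sum(state):           # stands in for an MPI collective
    total = sum(state["vals"])
    state["vals"] = [total] * len(state["vals"])


def halve(state):                   # user-defined post-processing
    state["vals"] = [v / 2 for v in state["vals"]]


def run(schedule, state):
    schedule.start()
    while not schedule.progress(state):
        pass                        # a real program could overlap other work here
    return state


sched = Schedule()
sched.add_round(square)
sched.add_round(allreduce_sum)
sched.add_round(halve)

print(run(sched, {"vals": [1, 2, 3]}))   # first use
print(run(sched, {"vals": [2, 2]}))      # same schedule, reused
```

Building the schedule once and restarting it mirrors persistent operations: setup cost is paid once, and the user-defined steps share the same completion path as the communication step.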
Submitted 25 September, 2019;
originally announced September 2019.
-
MPIgnite: An MPI-Like Language and Prototype Implementation for Apache Spark
Authors:
Brandon L. Morris,
Anthony Skjellum
Abstract:
Scale-out parallel processing based on MPI is a 25-year-old standard with at least another decade of preceding history of enabling technologies in the High Performance Computing community. Newer frameworks such as MapReduce, Hadoop, and Spark represent industrial scalable computing solutions that have received broad adoption because of their comparative simplicity of use, applicability to relevant problems, and ability to harness scalable, distributed resources. While MPI provides performance and portability, it lacks productivity and fault tolerance. Conversely, Spark, a specific example of a current-generation MapReduce and data-parallel computing infrastructure, addresses those goals but in turn lacks the peer communication support needed for featherweight, highly scalable peer-to-peer data-parallel code sections. The key contribution of this paper is to demonstrate how to introduce the collective and point-to-point peer communication concepts of MPI into a Spark environment. This is done in order to produce performance-portable, peer-oriented and group-oriented communication services while retaining the essential, desirable properties of Spark. Additional concepts of fault tolerance and productivity are considered. This approach is offered in contrast to adding a MapReduce framework as upper middleware on top of a traditional MPI implementation used as baseline infrastructure.
Submitted 15 July, 2017;
originally announced July 2017.
-
Provenance Threat Modeling
Authors:
Oluwakemi Hambolu,
Lu Yu,
Jon Oakley,
Richard R. Brooks,
Ujan Mukhopadhyay,
Anthony Skjellum
Abstract:
Provenance systems are used to capture history metadata; applications include ownership attribution and determining the quality of a particular data set. Provenance systems are also used for debugging, process improvement, understanding data, proof of ownership, certification of validity, etc. The provenance of data includes information about the processes and source data that lead to its current representation. In this paper we study the security risks provenance systems might be exposed to and recommend security solutions to better protect provenance information.
Submitted 10 March, 2017;
originally announced March 2017.
-
dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures
Authors:
Steven Eliuk,
Cameron Upright,
Anthony Skjellum
Abstract:
A new scalable parallel math library, dMath, is presented in this paper that demonstrates leading scaling when using intranode or internode hybrid parallelism for deep learning. dMath provides easy-to-use distributed base primitives and a variety of domain-specific algorithms. These include matrix multiplication, convolutions, and others, allowing for rapid development of highly scalable applications, including Deep Neural Networks (DNN), whereas previously one was restricted to libraries that provided effective primitives for only a single GPU, like NVIDIA cuBLAS and cuDNN or the DNN primitives from Nervana's neon framework. Development of HPC software is difficult, labor-intensive work, requiring a unique skill set. dMath allows a wide range of developers to utilize parallel and distributed hardware easily. One contribution of this approach is that data is stored persistently on the GPU hardware, avoiding costly transfers between host and device. Advanced memory management techniques are utilized, including caching of transferred data and memory reuse through pooling. A key contribution of dMath is that it delivers performance, portability, and productivity to its specific domain of support. It enables algorithm and application programmers to quickly solve problems without managing the significant complexity associated with multi-level parallelism.
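Memory reuse through pooling can be sketched with a pool that rounds each request up to a power-of-two size class and hands back previously released buffers instead of allocating new ones. In dMath the concern is GPU device memory; here a host-side bytearray stands in for a device allocation, and BufferPool with its methods are hypothetical names, not dMath's API.

```python
from collections import defaultdict


class BufferPool:
    """Reuses released buffers, bucketed by power-of-two size class."""

    def __init__(self):
        self.free = defaultdict(list)   # size class -> released buffers
        self.fresh_allocations = 0      # buffers actually created

    @staticmethod
    def size_class(nbytes):
        n = max(1, nbytes)
        return 1 << (n - 1).bit_length()

    def acquire(self, nbytes):
        cls = self.size_class(nbytes)
        if self.free[cls]:
            return self.free[cls].pop()  # reuse; contents are not cleared
        self.fresh_allocations += 1
        return bytearray(cls)            # stand-in for a device allocation

    def release(self, buf):
        self.free[len(buf)].append(buf)


pool = BufferPool()
a = pool.acquire(1000)      # rounded up to 1024 bytes
pool.release(a)
b = pool.acquire(900)       # same size class, so the buffer is reused
print(b is a, len(b), pool.fresh_allocations)
```

Rounding to size classes wastes some memory per buffer but lets differently sized requests share buffers, which is what avoids repeated (and, on a GPU, comparatively expensive) allocations.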
Submitted 5 April, 2016;
originally announced April 2016.
-
Accelerating Lossless Data Compression with GPUs
Authors:
R. L. Cloud,
M. L. Curry,
H. L. Ward,
A. Skjellum,
P. Bangalore
Abstract:
Huffman compression is a statistical, lossless data compression algorithm that compresses data by assigning variable-length codes to symbols, giving more frequently appearing symbols shorter codes than less frequent ones. This work is a modification of the Huffman algorithm that permits uncompressed data to be decomposed into independently compressible and decompressible blocks, allowing for concurrent compression and decompression on multiple processors. We implement this modified algorithm on a current NVIDIA GPU using the CUDA API and on a current Intel chip, and compare the performance results, which favor the GPU in nearly all tests. Lastly, we discuss the necessity for high-performance data compression in today's supercomputing ecosystem.
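One way to obtain independently decodable blocks is sketched below under our own assumptions: build a single Huffman code from the symbol frequencies of the whole input, then encode each fixed-size block into its own bitstream, so any block can be decoded without the others. This illustrates the idea and is not necessarily the paper's exact modification. Bits are kept as Python strings for readability rather than packed, a thread pool stands in for multiple processors, and BLOCK and the function names are hypothetical.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def build_codes(data):
    """Standard Huffman construction over the symbol frequencies of data."""
    freq = Counter(data)
    if len(freq) == 1:                      # single distinct symbol: give it "0"
        return {next(iter(freq)): "0"}
    tie = count()                           # tie-breaker so tuples never compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode_block(block, codes):
    return "".join(codes[sym] for sym in block)


def decode_block(bits, codes):
    """Prefix-free decoding of one block, independent of every other block."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return bytes(out)


data = b"abracadabra, abracadabra, a block-wise huffman demo"
BLOCK = 16                                  # hypothetical block size in bytes
codes = build_codes(data)                   # one code table shared by all blocks
blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

with ThreadPoolExecutor() as pool:          # blocks are processed concurrently
    encoded = list(pool.map(lambda b: encode_block(b, codes), blocks))
    decoded = list(pool.map(lambda e: decode_block(e, codes), encoded))

print(len(data) * 8, "bits ->", sum(map(len, encoded)), "bits")
```

Sharing one code table keeps per-block overhead low; the cost is that each block's bit length must be stored alongside it so a decoder knows where the block ends.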
Submitted 21 June, 2011;
originally announced July 2011.