-
A 3D Parallel Algorithm for QR Decomposition
Authors:
Grey Ballard,
James Demmel,
Laura Grigori,
Mathias Jacquelin,
Nicholas Knight
Abstract:
Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
Submitted 14 May, 2018;
originally announced May 2018.
-
Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product
Authors:
Grey Ballard,
Nicholas Knight,
Kathryn Rouse
Abstract:
The matricized-tensor times Khatri-Rao product (MTTKRP) computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high-performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel algorithms that attain the lower bounds and are therefore communication optimal. In particular, we show that the structure of the computation allows for less communication than the straightforward approach of casting the computation as a matrix multiplication operation.
Submitted 22 October, 2017; v1 submitted 24 August, 2017;
originally announced August 2017.
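To make the computation in the abstract above concrete, here is a minimal sketch of a mode-1 MTTKRP for a dense 3-way tensor, written in plain Python. It is an illustration only, not the authors' algorithm: the function names (`khatri_rao`, `mttkrp_via_matmul`, `mttkrp_direct`), the nested-list data layout, and the unfolding convention (column index `k*J + j`) are all assumptions made for this example. The first variant forms the Khatri-Rao product explicitly and then performs a matrix multiplication; the second evaluates the same sums without materializing that intermediate, which hints at why the structured computation can move less data.

```python
def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x R) and B (J x R).

    Row k*J + j of the result holds C[k][r] * B[j][r] for each column r.
    """
    R = len(B[0])
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(len(C)) for j in range(len(B))]


def mttkrp_via_matmul(X, B, C):
    """Mode-1 MTTKRP by explicit matrix multiplication (illustrative).

    X is an I x J x K tensor stored as nested lists, indexed X[i][j][k].
    """
    I, J, K, R = len(X), len(B), len(C), len(B[0])
    KR = khatri_rao(C, B)  # (K*J) x R intermediate, formed explicitly
    M = [[0] * R for _ in range(I)]
    for i in range(I):
        for k in range(K):
            for j in range(J):
                row = KR[k * J + j]
                for r in range(R):
                    M[i][r] += X[i][j][k] * row[r]
    return M


def mttkrp_direct(X, B, C):
    """Same result, computed without forming the Khatri-Rao product."""
    I, J, K, R = len(X), len(B), len(C), len(B[0])
    return [[sum(X[i][j][k] * B[j][r] * C[k][r]
                 for j in range(J) for k in range(K))
             for r in range(R)]
            for i in range(I)]


# Tiny hypothetical example: a 2 x 2 x 2 tensor and rank-1 factor matrices.
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
B = [[1], [2]]
C = [[3], [4]]
result_matmul = mttkrp_via_matmul(X, B, C)
result_direct = mttkrp_direct(X, B, C)
```

Both variants return the same I x R matrix; they differ only in whether the (K*J) x R Khatri-Rao intermediate is stored.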
-
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
Authors:
Grey Ballard,
Alex Druinsky,
Nicholas Knight,
Oded Schwartz
Abstract:
We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication-bound. This model correctly describes both the interprocessor communication volume along a critical path in a parallel computation and the volume of data moving through the memory hierarchy in a sequential computation. We show that identifying a communication-optimal algorithm for particular input matrices is equivalent to solving a hypergraph partitioning problem. Our approach is sparsity-dependent, meaning that we seek the best algorithm for the given input matrices.
In addition to our (3D) fine-grained model, we also propose coarse-grained 1D and 2D models that correspond to simpler SpGEMM algorithms. We explore the relations between our models theoretically, and we study their performance experimentally in the context of three applications that use SpGEMM as a key computation. For each application, we find that at least one coarse-grained model is as communication efficient as the fine-grained model. We also observe that different applications have affinities for different algorithms.
Our results demonstrate that hypergraphs are an accurate model for reasoning about the communication costs of SpGEMM as well as a practical tool for exploring the SpGEMM algorithm design space.
Submitted 17 March, 2016;
originally announced March 2016.
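As a rough illustration of the fine-grained model described in the abstract above, the following Python sketch builds a hypergraph from the sparsity patterns of A and B. It is a simplified reading, not the paper's exact construction: here each vertex is one nonzero scalar multiplication a_ik * b_kj, each net groups the multiplications that touch one nonzero of A, B, or C, and the function names (`fine_grained_hypergraph`, `connectivity_cut`) and the connectivity-minus-one cut metric are assumptions chosen for this example.

```python
from collections import defaultdict


def fine_grained_hypergraph(A_nz, B_nz):
    """Build vertices and nets from sparsity patterns (illustrative).

    A_nz: set of (i, k) positions of nonzeros in A.
    B_nz: set of (k, j) positions of nonzeros in B.
    Returns (vertices, nets), where each vertex (i, k, j) stands for the
    multiplication a_ik * b_kj and each net maps a matrix nonzero to the
    set of vertices that read or update it.
    """
    b_cols_by_row = defaultdict(list)
    for (k, j) in B_nz:
        b_cols_by_row[k].append(j)

    vertices = []
    nets = defaultdict(set)
    for (i, k) in sorted(A_nz):
        for j in sorted(b_cols_by_row.get(k, [])):
            v = (i, k, j)
            vertices.append(v)
            nets[("A", i, k)].add(v)  # multiplications that read a_ik
            nets[("B", k, j)].add(v)  # multiplications that read b_kj
            nets[("C", i, j)].add(v)  # multiplications that update c_ij
    return vertices, dict(nets)


def connectivity_cut(nets, part):
    """Sum over nets of (number of distinct parts touched - 1)."""
    return sum(len({part[v] for v in pins}) - 1 for pins in nets.values())


# Hypothetical 2 x 2 patterns; partition the work by row of A (a 1D layout).
A_nz = {(0, 0), (0, 1), (1, 1)}
B_nz = {(0, 0), (1, 0), (1, 1)}
vertices, nets = fine_grained_hypergraph(A_nz, B_nz)
row_wise = {v: v[0] for v in vertices}
cut = connectivity_cut(nets, row_wise)
```

With the row-wise assignment, only the nets for nonzeros of B span more than one part, which matches the intuition that a 1D row-wise algorithm communicates entries of B only.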
-
Communication lower bounds and optimal algorithms for programs that reference arrays -- Part 1
Authors:
Michael Christ,
James Demmel,
Nicholas Knight,
Thomas Scanlon,
Katherine Yelick
Abstract:
The movement of data (communication) between levels of a memory hierarchy, or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication are of interest. Motivated by this, attainable lower bounds for the amount of communication required by algorithms were established by several groups for a variety of algorithms, including matrix computations. Prior work of Ballard-Demmel-Holtz-Schwartz relied on a geometric inequality of Loomis and Whitney for this purpose. In this paper the general theory of discrete multilinear Hölder-Brascamp-Lieb (HBL) inequalities is used to establish communication lower bounds for a much wider class of algorithms. In some cases, algorithms are presented that attain these lower bounds.
Several contributions are made to the theory of HBL inequalities proper. The optimal constant in such an inequality for torsion-free Abelian groups is shown to equal one whenever it is finite. Bennett-Carbery-Christ-Tao had characterized the tuples of exponents for which such an inequality is valid as the convex polyhedron defined by a certain finite list of inequalities. The problem of constructing an algorithm to decide whether a given inequality is on this list is shown to be equivalent to Hilbert's Tenth Problem over the rationals, which remains open. Nonetheless, an algorithm that computes the polyhedron itself is constructed.
Submitted 31 July, 2013;
originally announced August 2013.
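For readers unfamiliar with the inequalities named in the abstract above, the block below states the discrete Loomis-Whitney inequality in three dimensions and the general shape of a discrete HBL inequality. The notation (V, the projections \pi, the homomorphisms \phi_j, and the exponents s_j) is chosen here for illustration and may differ from the paper's; the constant is written as one, following the abstract's statement for torsion-free Abelian groups.

```latex
% Discrete Loomis-Whitney inequality: for every finite set V \subset \mathbb{Z}^3,
% with \pi_{x}, \pi_{y}, \pi_{z} the projections that drop one coordinate,
\[
  |V| \;\le\; |\pi_{x}(V)|^{1/2}\, |\pi_{y}(V)|^{1/2}\, |\pi_{z}(V)|^{1/2}.
\]

% General discrete HBL form: for group homomorphisms
% \phi_j : \mathbb{Z}^d \to \mathbb{Z}^{d_j} and admissible exponents s_j \in [0,1],
\[
  |V| \;\le\; \prod_{j=1}^{m} |\phi_j(V)|^{s_j}
  \qquad \text{for all finite } V \subset \mathbb{Z}^d .
\]
```

In the matrix multiplication setting, V can be read as the set of index triples (i, j, k) executed while a fixed amount of data is available, and the three projections as the entries of A, B, and C that those iterations touch.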