Design Principles for Sparse Matrix Multiplication on the GPU

Yang, Carl; Buluc, Aydin; Owens, John D.


arXiv:1803.08601 (cs)

[Submitted on 22 Mar 2018 (v1), last revised 12 Jun 2018 (this version, v2)]



Abstract: We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion. While previous SpMM work concentrates on thread-level parallelism, we additionally focus on latency hiding with instruction-level parallelism and load-balancing. We show, both theoretically and experimentally, that the proposed SpMM is a better fit for the GPU than previous approaches. We identify a key memory access pattern that allows efficient access into both input and output matrices that is crucial to getting excellent performance on SpMM. By combining these two ingredients, (i) merge-based load-balancing and (ii) row-major coalesced memory access, we demonstrate a 4.1x peak speedup and a 31.7% geomean speedup over state-of-the-art SpMM implementations on real-world datasets.
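To make the two ingredients named in the abstract concrete, the following is a minimal sequential sketch in Python, not the authors' GPU kernels. It assumes the sparse matrix is given as CSR arrays (`row_ptr`, `col_idx`, `vals`) and that the dense input and output are flat row-major lists. The function names `merge_path_search` and `spmm_csr_merge`, and the `workers` parameter, are hypothetical and chosen for illustration. The split gives each worker an equal share of the combined "row ends plus nonzeros" work list, which is one common way to realize a merge-path style partition. The inner loop over `j` touches consecutive elements of a row of the dense matrices, which is the index pattern that consecutive GPU threads would use for coalesced access.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Find where a diagonal crosses the merge path of (row ends, nonzero ids).

    Returns (row, nz): the number of rows finished and nonzeros consumed
    before this diagonal.
    """
    lo = max(diagonal - nnz, 0)
    hi = min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def spmm_csr_merge(row_ptr, col_idx, vals, B, n_cols, workers=4):
    """Compute C = A @ B with A in CSR and B, C as flat row-major lists.

    The loop over `w` stands in for parallel workers; it runs sequentially
    here, so partial rows can be added straight into C without a fix-up pass.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    row_end = row_ptr[1:]
    C = [0.0] * (n_rows * n_cols)

    def accumulate(row, k):
        # Consecutive j are contiguous in both B and C (row-major layout).
        a = vals[k]
        b_base = col_idx[k] * n_cols
        c_base = row * n_cols
        for j in range(n_cols):
            C[c_base + j] += a * B[b_base + j]

    total = n_rows + nnz                      # total merge items
    per_worker = (total + workers - 1) // workers
    for w in range(workers):
        row, k = merge_path_search(min(w * per_worker, total), row_end, nnz)
        row_stop, k_stop = merge_path_search(
            min((w + 1) * per_worker, total), row_end, nnz)
        # Rows that end inside this worker's share.
        while row < row_stop:
            while k < row_end[row]:
                accumulate(row, k)
                k += 1
            row += 1
        # Leading part of a row that continues into the next worker's share.
        while k < k_stop:
            accumulate(row, k)
            k += 1
    return C


# A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]], B = [[1, 2], [3, 4], [5, 6]]
C = spmm_csr_merge([0, 2, 2, 3], [0, 2, 1], [1, 2, 3],
                   [1, 2, 3, 4, 5, 6], n_cols=2)
```

Because every worker receives at most `per_worker` merge items, a single very long row or a long run of empty rows cannot overload one worker, which is the load-balancing property the abstract refers to. A real GPU version would need to combine partial row results across workers instead of relying on sequential execution.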

Comments:	16 pages, 7 figures, International European Conference on Parallel and Distributed Computing (Euro-Par) 2018
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:1803.08601 [cs.DC]
	(or arXiv:1803.08601v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1803.08601

Submission history

From: Carl Yang
[v1] Thu, 22 Mar 2018 22:31:17 UTC (493 KB)
[v2] Tue, 12 Jun 2018 06:30:45 UTC (482 KB)

