Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Chen, Xiaoming; Chen, Jianxu; Chen, Danny Z.; Hu, Xiaobo Sharon

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:1705.10591 (cs)

[Submitted on 29 May 2017]

Abstract: Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, and image processing. Recent successes of convolutional neural networks in various deep learning applications place even higher demands on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make them a natural choice for accelerating convolution operations. However, maximally exploiting the available memory bandwidth of GPUs for convolution is a challenging task. This paper introduces a general model to address the mismatch between the memory bank width of GPUs and the computation data width of threads. Based on this model, we develop two convolution kernels, one for the general case and the other for a special case with one input channel. By carefully optimizing memory access patterns and computation patterns, we design a communication-optimized kernel for the special case and a communication-reduced kernel for the general case. Experimental data based on implementations on Kepler GPUs show that our kernels achieve 5.16X and 35.5% average performance improvement over the latest cuDNN library, for the special case and the general case, respectively.
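For readers unfamiliar with the operation, the special case with one input channel can be written as a direct loop nest. The following is a minimal reference sketch in plain Python, not the authors' GPU kernel; the function name `conv2d_single_channel`, the "valid" (no padding, stride 1) boundary handling, and the CNN-style convention without kernel flipping are all assumptions made for illustration.

```python
def conv2d_single_channel(image, kernel):
    """Direct 2D convolution of one input channel with one filter.

    Assumptions (illustrative only): 'valid' output size, stride 1,
    and the CNN convention in which the kernel is not flipped.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            # Each output reads a k_h x k_w window of the input, so
            # neighboring outputs reread mostly the same input values.
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, 1]]
    print(conv2d_single_channel(image, kernel))
```

Because adjacent outputs share most of their input window, how those repeated reads are served from memory largely determines achievable throughput, which is the aspect the abstract says the proposed kernels optimize.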

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:1705.10591 [cs.DC]
	(or arXiv:1705.10591v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1705.10591

Submission history

From: X. Sharon Hu
[v1] Mon, 29 May 2017 14:52:42 UTC (2,617 KB)
