CuMF_SGD: Fast and Scalable Matrix Factorization

Xie, Xiaolong; Tan, Wei; Fong, Liana L.; Liang, Yun

Computer Science > Machine Learning

arXiv:1610.05838 (cs)

[Submitted on 19 Oct 2016 (v1), last revised 10 Nov 2016 (this version, v3)]

Abstract: Matrix factorization (MF) is widely used in, e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular for solving MF problems because it can handle large data sets and makes incremental learning easy. We observe that SGD for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only for small data sets; distributed systems have higher aggregate memory bandwidth but suffer from relatively slow network connections. This observation inspires us to accelerate MF by utilizing GPUs' high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload scheduling schemes, i.e., batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. In particular, batch-Hogwild!, a vectorized version of Hogwild!, overcomes the issue of memory discontinuity. We also develop highly optimized kernels for the SGD update, leveraging cache, warp-shuffle instructions and half-precision floats. We also design a partition scheme to utilize multiple GPUs while addressing the well-known convergence issue when parallelizing SGD. On three data sets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1X-28.2X as fast as state-of-the-art CPU solutions on 1-64 CPU nodes. Evaluations also show that cuMF_SGD scales well on multiple GPUs on large data sets.
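For readers unfamiliar with the update that the abstract refers to, the following is a minimal sequential sketch of SGD for MF in plain Python. It is not the cuMF_SGD implementation and includes none of its GPU techniques (batch-Hogwild!, wavefront-update, warp shuffles, half precision). The function names `sgd_mf` and `rmse`, the L2-regularized squared-error loss, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def sgd_mf(ratings, num_users, num_items, k=4, lr=0.05, reg=0.02,
           epochs=200, seed=0):
    """Factorize sparse (user, item, rating) samples into P and Q with SGD.

    Illustrative sketch only: names and hyperparameters are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    # Small random initial factors: P is users x k, Q is items x k.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_items)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the observed ratings in random order
        for u, v, r in samples:
            p, q = P[u], Q[v]
            # Prediction error for this single observed rating.
            err = r - sum(a * b for a, b in zip(p, q))
            for f in range(k):
                pf, qf = p[f], q[f]
                # Gradient step on the regularized squared error.
                p[f] += lr * (err * qf - reg * pf)
                q[f] += lr * (err * pf - reg * qf)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given samples."""
    total = 0.0
    for u, v, r in ratings:
        pred = sum(a * b for a, b in zip(P[u], Q[v]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


if __name__ == "__main__":
    data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    P, Q = sgd_mf(data, num_users=3, num_items=3)
    print("training RMSE:", rmse(data, P, Q))
```

Each update reads and writes one row of P and one row of Q (2k values) and performs only a few arithmetic operations per value, which is consistent with the abstract's observation that SGD for MF is memory bound.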

Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA)
Cite as:	arXiv:1610.05838 [cs.LG]
	(or arXiv:1610.05838v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1610.05838

Submission history

From: Xiaolong Xie
[v1] Wed, 19 Oct 2016 01:28:11 UTC (749 KB)
[v2] Thu, 20 Oct 2016 13:38:34 UTC (596 KB)
[v3] Thu, 10 Nov 2016 01:16:40 UTC (703 KB)
