Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

Wang, Joey; Wei, Yingcan; Lee, Minseok; Langer, Matthias; Yu, Fan; Liu, Jie; Liu, Alex; Abel, Daniel; Guo, Gems; Dong, Jianbing; Shi, Jerry; Li, Kunlun

doi:10.1145/3523227.3547405

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2210.08803 (cs)

[Submitted on 17 Oct 2022]

Title: Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

Authors: Joey Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Alex Liu, Daniel Abel, Gems Guo, Jianbing Dong, Jerry Shi, Kunlun Li

Abstract: In this talk, we introduce Merlin HugeCTR, an open-source, GPU-accelerated integration framework for click-through rate (CTR) estimation. It optimizes both training and inference, and it enables model training at scale by combining model-parallel embeddings with data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to achieve low-latency retrieval of embeddings for online model inference. In the MLPerf v1.0 DLRM training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100 GPUs) over PyTorch running on four 4-socket CPU nodes (4x4x28 = 448 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR has additionally featured a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton Inference Server, so that the computational capabilities of GPUs can be used for high-speed recommendation model inference. Using the HPS, Merlin HugeCTR users can achieve a 5x to 62x speedup (depending on batch size) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.
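The abstract describes embedding retrieval through a GPU embedding cache backed by a hierarchical storage architecture. As an illustration only, the Python sketch below models that idea as a bounded least-recently-used (LRU) cache in front of one or more slower key-value tiers. The class `HierarchicalEmbeddingStore`, its methods, and the LRU eviction policy are hypothetical choices made for this sketch. They are not HugeCTR's API or its actual cache policy, and a real system would keep the cache in GPU memory rather than in a Python dict.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Sequence

Vector = List[float]


class HierarchicalEmbeddingStore:
    """Hypothetical sketch (not the HugeCTR API): a bounded LRU cache that
    stands in for the GPU embedding cache, backed by slower key-value tiers
    that stand in for CPU-memory and SSD storage."""

    def __init__(self, cache_capacity: int, tiers: Sequence[Dict[int, Vector]]):
        self.cache_capacity = cache_capacity
        self.cache: "OrderedDict[int, Vector]" = OrderedDict()
        self.tiers = list(tiers)  # ordered from fastest to slowest
        self.hits = 0
        self.misses = 0

    def _fetch_from_tiers(self, key: int) -> Optional[Vector]:
        # Search the slower tiers in order; the first tier holding the key wins.
        for tier in self.tiers:
            if key in tier:
                return tier[key]
        return None

    def _insert(self, key: int, vec: Vector) -> None:
        # Add the key as most recently used, then evict the least recently
        # used entry once the cache exceeds its capacity.
        self.cache[key] = vec
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def lookup(self, keys: Sequence[int], dim: int) -> List[Vector]:
        out: List[Vector] = []
        for key in keys:
            if key in self.cache:
                self.hits += 1
                self.cache.move_to_end(key)  # refresh recency on a hit
                out.append(self.cache[key])
                continue
            self.misses += 1
            vec = self._fetch_from_tiers(key)
            if vec is None:
                vec = [0.0] * dim  # unknown key: default vector, not cached
            else:
                self._insert(key, vec)  # promote the vector into the fast tier
            out.append(vec)
        return out


if __name__ == "__main__":
    host_tier = {k: [float(k)] * 4 for k in range(100)}
    store = HierarchicalEmbeddingStore(cache_capacity=2, tiers=[host_tier])
    store.lookup([1, 2, 1, 3, 2], dim=4)
    print(store.hits, store.misses)  # prints: 1 4
```

With a capacity of two, the key sequence `[1, 2, 1, 3, 2]` produces one hit (the second `1`) and four misses: inserting `3` evicts `2`, and re-fetching `2` then evicts `1`. The sketch only illustrates why a small, fast cache in front of larger, slower tiers can serve frequently accessed embeddings with low latency. It says nothing about HugeCTR's real data layout or concurrency model.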

Comments:	4 pages
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2210.08803 [cs.DC]
	(or arXiv:2210.08803v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2210.08803
Journal reference:	Proceedings of the 16th ACM Conference on Recommender Systems, 2022
Related DOI:	https://doi.org/10.1145/3523227.3547405

Submission history

From: Matthias Langer PhD
[v1] Mon, 17 Oct 2022 07:35:46 UTC (77 KB)
