FeatureBox: Feature Engineering on GPUs for Massive-Scale Ads Systems

Zhao, Weijie; Jiao, Xuewu; Luo, Xinsheng; Li, Jingxue; Karimi, Belhal; Li, Ping

arXiv:2210.07768 (cs)

[Submitted on 26 Sep 2022]

Abstract: Deep learning has been widely deployed in online ads systems to predict Click-Through Rate (CTR). Machine learning researchers and practitioners frequently retrain CTR models to test newly extracted features. However, CTR model training often relies on a large number of raw input data logs. Hence, feature extraction can take a significant proportion of the training time for an industrial-level CTR model. In this paper, we propose FeatureBox, a novel end-to-end training framework that pipelines feature extraction and training on GPU servers to save the intermediate I/O of feature extraction. We rewrite computation-intensive feature extraction operators as GPU operators and leave the memory-intensive operators on CPUs. We introduce a layer-wise operator scheduling algorithm to schedule these heterogeneous operators. We present a light-weight GPU memory management algorithm that supports dynamic GPU memory allocation with minimal overhead. We experimentally evaluate FeatureBox and compare it with the previous in-production feature extraction framework on two real-world ads applications. The results confirm the effectiveness of our proposed method.
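
The abstract does not spell out the layer-wise operator scheduling algorithm, so the following is only an illustrative sketch of one plausible reading: operators form a dependency graph, are grouped into layers by topological level, and each layer is split into a CPU group and a GPU group. The function name `layerwise_schedule`, the operator names, and the `"cpu"`/`"gpu"` labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def layerwise_schedule(ops, deps):
    """Group operators into dependency layers, split by device.

    ops:  dict mapping operator name -> "cpu" or "gpu" (hypothetical labels)
    deps: dict mapping operator name -> list of operators it depends on
    Returns a list of layers; each layer is {"cpu": [...], "gpu": [...]}.
    """
    indegree = {name: 0 for name in ops}
    children = defaultdict(list)
    for name, parents in deps.items():
        for parent in parents:
            indegree[name] += 1
            children[parent].append(name)

    # Operators with no unmet dependencies form the first layer.
    frontier = sorted(n for n, d in indegree.items() if d == 0)
    layers = []
    scheduled = 0
    while frontier:
        layer = {"cpu": [], "gpu": []}
        for name in frontier:
            layer[ops[name]].append(name)
        layers.append(layer)
        scheduled += len(frontier)

        # Release operators whose dependencies are now all scheduled.
        next_frontier = []
        for name in frontier:
            for child in children[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_frontier.append(child)
        frontier = sorted(next_frontier)

    if scheduled != len(ops):
        raise ValueError("operator graph contains a cycle")
    return layers


# Hypothetical feature-extraction pipeline (names are assumptions).
ops = {
    "read_logs": "cpu",
    "clean_views": "gpu",
    "join_dict": "cpu",  # stands in for a memory-intensive operator
    "extract_ngrams": "gpu",
    "merge_features": "gpu",
}
deps = {
    "clean_views": ["read_logs"],
    "join_dict": ["read_logs"],
    "extract_ngrams": ["clean_views"],
    "merge_features": ["extract_ngrams", "join_dict"],
}

for i, layer in enumerate(layerwise_schedule(ops, deps)):
    print(i, layer)
```

Within a layer, the CPU and GPU groups have no dependencies on each other, so under this reading they could run concurrently. How FeatureBox actually orders and overlaps operators is described in the paper itself, not in the abstract.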

Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2210.07768 [cs.IR]
	(or arXiv:2210.07768v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2210.07768

Submission history

From: Weijie Zhao
[v1] Mon, 26 Sep 2022 02:31:13 UTC (249 KB)
