Learning to Embed Categorical Features without Embedding Tables for Recommendation

Kang, Wang-Cheng; Cheng, Derek Zhiyuan; Yao, Tiansheng; Yi, Xinyang; Chen, Ting; Hong, Lichan; Chi, Ed H.

arXiv:2010.10784 (cs)

[Submitted on 21 Oct 2020 (v1), last revised 7 Jun 2021 (this version, v2)]

Abstract: Embedding learning of categorical features (e.g., user/item IDs) is at the core of various recommendation models, including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table in which each row is a dedicated embedding vector for one unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g., a new video ID), both of which are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework, Deep Hash Embedding (DHE), which replaces embedding tables with a deep embedding network that computes embeddings on the fly. DHE first encodes the feature value into a unique identifier vector using multiple hashing functions and transformations, and then applies a DNN to convert the identifier vector into an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during training to learn embedding generation. Empirical results show that DHE achieves AUC comparable to that of the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features that do not rely on embedding table lookup.
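
The two-stage pipeline described in the abstract (a fixed hash-based encoding followed by a trainable network) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the universal-hash form ((a*x + b) mod p) mod m, the rescaling to [-1, 1], the ReLU activation, the layer sizes, and the helper names make_hash_params, encode, init_mlp, and embed. The network weights are randomly initialized and only a forward pass is shown; in a real system they would be learned during training.

```python
import numpy as np

P = 2_147_483_647  # assumed prime modulus (2**31 - 1) for the hash family


def make_hash_params(k, seed=0):
    """Draw fixed coefficients (a, b) for k hash functions. These are not trained."""
    rng = np.random.default_rng(seed)
    a = rng.integers(1, P, size=k, dtype=np.int64)
    b = rng.integers(0, P, size=k, dtype=np.int64)
    return a, b


def encode(feature_id, a, b, m=1_000_000):
    """Deterministic encoding: k hash values in [0, m), rescaled to [-1, 1]."""
    x = int(feature_id) % P              # keeps a * x below 2**62, inside int64 range
    hashed = ((a * x + b) % P) % m       # one hash value per (a, b) pair
    return 2.0 * hashed / (m - 1) - 1.0  # assumed transformation to a dense real vector


def init_mlp(sizes, seed=1):
    """Randomly initialized weights; stand-ins for parameters learned in training."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def embed(feature_id, hash_params, layers):
    """Compute an embedding on the fly: encode the ID, then run the MLP forward."""
    h = encode(feature_id, *hash_params)
    for i, (w, bias) in enumerate(layers):
        h = h @ w + bias
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers (an assumed choice)
    return h


hash_params = make_hash_params(k=64)
layers = init_mlp([64, 32, 16])
seen = embed(12345, hash_params, layers)       # an ID that could appear in training
unseen = embed(987654321, hash_params, layers)  # a new ID needs no table row
print(seen.shape, unseen.shape)
```

In this sketch the encoder stores only the k coefficient pairs regardless of how many distinct IDs exist, so the parameter count is set by the network size rather than by feature cardinality.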

Comments:	Accepted to KDD'21, Research Track
Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR)
Cite as:	arXiv:2010.10784 [cs.LG]
	(or arXiv:2010.10784v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2010.10784

Submission history

From: Wang-Cheng Kang
[v1] Wed, 21 Oct 2020 06:37:28 UTC (1,095 KB)
[v2] Mon, 7 Jun 2021 06:31:19 UTC (1,085 KB)
