Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Yuan, Guanghu; Yuan, Fajie; Li, Yudong; Kong, Beibei; Li, Shujie; Chen, Lei; Yang, Min; Yu, Chenyun; Hu, Bo; Li, Zang; Xu, Yu; Qie, Xiaohu

Computer Science > Information Retrieval

arXiv:2210.10629v3 (cs)

[Submitted on 13 Oct 2022 (v1), last revised 4 Jun 2023 (this version, v3)]

Abstract: Existing benchmark datasets for recommender systems (RS) are either small in scale or involve very limited forms of user feedback. RS models evaluated on such datasets often have limited practical value for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various types of user feedback from four different recommendation scenarios. Specifically, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it contains not only positive user feedback but also true negative feedback (vs. one-class recommendation); (3) it contains overlapping users and items across the four scenarios; (4) it contains various types of positive user feedback, such as clicks, likes, shares, and follows; (5) it contains additional features beyond user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models on each task. Tenrec has the potential to become a useful benchmark dataset for most popular recommendation tasks.
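To make characteristics (2) to (4) concrete, here is a minimal sketch of how such an interaction log might be represented and queried. The record schema (`Impression` with `user_id`, `item_id`, `scenario`, and 0/1 flags for `click`, `like`, `share`, `follow`) and the scenario labels are hypothetical and are not the released file format. The sketch also assumes that a "true negative" is an item that was exposed to a user but not clicked, in contrast to one-class settings where negatives must be sampled from unobserved items.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One logged exposure of an item to a user (hypothetical schema)."""
    user_id: int
    item_id: int
    scenario: str    # hypothetical scenario label, e.g. "scenario_a"
    click: int = 0   # 1 = clicked; 0 = exposed but not clicked (assumed true negative)
    like: int = 0
    share: int = 0
    follow: int = 0


def split_feedback(impressions):
    """Separate positives (clicked) from true negatives (exposed but not clicked)."""
    positives = [r for r in impressions if r.click == 1]
    negatives = [r for r in impressions if r.click == 0]
    return positives, negatives


def overlapped_users(impressions):
    """Return the set of users who appear in more than one scenario."""
    scenarios_per_user = defaultdict(set)
    for r in impressions:
        scenarios_per_user[r.user_id].add(r.scenario)
    return {u for u, s in scenarios_per_user.items() if len(s) > 1}


def feedback_counts(impressions):
    """Count each type of positive feedback across all impressions."""
    kinds = ("click", "like", "share", "follow")
    return {k: sum(getattr(r, k) for r in impressions) for k in kinds}


# Toy log with made-up IDs: user 1 appears in two scenarios, user 2 in one.
log = [
    Impression(1, 10, "scenario_a", click=1, like=1),
    Impression(1, 11, "scenario_a"),  # exposed, not clicked
    Impression(1, 20, "scenario_b", click=1, share=1),
    Impression(2, 10, "scenario_a", click=1, follow=1),
]
pos, neg = split_feedback(log)
print(len(pos), len(neg))     # 3 1
print(overlapped_users(log))  # {1}
print(feedback_counts(log))   # {'click': 3, 'like': 1, 'share': 1, 'follow': 1}
```

Because the negatives are observed exposures rather than sampled unobserved items, a model trained on such a log can learn from items the user actually skipped. The overlap query is the kind of preprocessing that cross-scenario tasks such as transfer learning would typically start from.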

Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2210.10629 [cs.IR]
	(or arXiv:2210.10629v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2210.10629

Submission history

From: Guanghu Yuan
[v1] Thu, 13 Oct 2022 15:57:40 UTC (197 KB)
[v2] Thu, 20 Oct 2022 12:19:36 UTC (409 KB)
[v3] Sun, 4 Jun 2023 04:00:05 UTC (196 KB)
