AmazonQAC: A Large-Scale, Naturalistic Query Autocomplete Dataset

Everaert, Dante; Patki, Rohit; Zheng, Tianqi; Potts, Christopher

arXiv:2411.04129 (cs)

[Submitted on 22 Oct 2024]

Abstract: Query Autocomplete (QAC) is a critical feature in modern search engines, facilitating user interaction by predicting search queries based on input prefixes. Despite its widespread adoption, the absence of large-scale, realistic datasets has hindered advancements in QAC system development. This paper addresses this gap by introducing AmazonQAC, a new QAC dataset sourced from Amazon Search logs, comprising 395M samples. The dataset includes actual sequences of user-typed prefixes leading to final search terms, as well as session IDs and timestamps that support modeling the context-dependent aspects of QAC. We assess Prefix Trees, semantic retrieval, and Large Language Models (LLMs) with and without finetuning. We find that finetuned LLMs perform best, particularly when incorporating contextual information. However, even our best system achieves only half of what we calculate is theoretically possible on our test data, which implies QAC is a challenging problem that is far from solved with existing systems. This contribution aims to stimulate further research on QAC systems to better serve user needs in diverse environments. We open-source this data on Hugging Face at this https URL.
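
As an illustration of the Prefix Tree family of baselines named in the abstract, the following is a minimal sketch of a frequency-ranked prefix tree. It is not the authors' implementation: the class name `PrefixTreeQAC`, its methods, and the toy queries and counts are all hypothetical, and a real system would be built from the (prefix, final search term) pairs in the dataset.

```python
from collections import Counter


class PrefixTreeQAC:
    """Toy prefix tree: each node records the queries that pass through it."""

    def __init__(self):
        self.children = {}       # next character -> child node
        self.counts = Counter()  # full query -> observed frequency

    def add(self, query, count=1):
        """Register a completed query, updating every prefix node on its path."""
        node = self
        node.counts[query] += count
        for ch in query:
            node = node.children.setdefault(ch, PrefixTreeQAC())
            node.counts[query] += count

    def suggest(self, prefix, k=10):
        """Return up to k queries starting with prefix, most frequent first."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # prefix never seen in the training queries
        # Break frequency ties alphabetically so the output is deterministic.
        ranked = sorted(node.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [query for query, _ in ranked[:k]]


# Hypothetical toy data, invented for illustration only.
qac = PrefixTreeQAC()
for query, count in [("iphone case", 5), ("iphone charger", 3), ("ipad", 2)]:
    qac.add(query, count)

print(qac.suggest("ip", k=2))
```

Storing counts at every node trades memory for simplicity. A sketch like this uses no session context, which the abstract identifies as helpful for the best-performing systems.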

Comments:	EMNLP 2024
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2411.04129 [cs.IR]
	(or arXiv:2411.04129v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2411.04129

Submission history

From: Dante Everaert
[v1] Tue, 22 Oct 2024 21:11:34 UTC (31 KB)
