How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Lin, Sheng-Chieh; Asai, Akari; Li, Minghan; Oguz, Barlas; Lin, Jimmy; Mehdad, Yashar; Yih, Wen-tau; Chen, Xilun

arXiv:2302.07452 (cs)

[Submitted on 15 Feb 2023]

Abstract: Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval, which some argue are due to limited model capacity. We contradict this hypothesis and show that a generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. In particular, we systematically examine the contrastive learning of DRs under the framework of Data Augmentation (DA). Our study shows that common DA practices, such as query augmentation with generative models and pseudo-relevance label creation using a cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA approach with diverse queries and sources of supervision to progressively train a generalizable DR. As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations, and it even competes with models using more complex late interaction (ColBERTv2 and SPLADE++).
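
For readers unfamiliar with the contrastive learning the abstract refers to, the following is a minimal sketch of a generic in-batch-negative contrastive (InfoNCE) loss of the kind commonly used to train dense retrievers. It is an illustration only, not necessarily DRAGON's exact objective. The function names, the dot-product scoring, and the `temperature` parameter are assumptions introduced here and do not come from the abstract.

```python
import math


def dot(u, v):
    """Dot product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(query_vecs, passage_vecs, temperature=1.0):
    """Mean InfoNCE loss over a batch (hypothetical helper for illustration).

    passage_vecs[i] is treated as the positive passage for query_vecs[i];
    every other passage in the batch serves as a negative for that query.
    """
    if len(query_vecs) != len(passage_vecs):
        raise ValueError("each query needs exactly one positive passage")
    losses = []
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) / temperature for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability assigned to the positive passage.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy batch: two queries with 2-dimensional embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # positives match their queries
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives are mismatched

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
```

In this toy batch the aligned pairing yields a lower loss than the swapped pairing, which is the signal that pushes query embeddings toward their relevant passages. Under the data-augmentation framing in the abstract, augmented queries and pseudo-relevance labels would determine which (query, passage) pairs are fed to such a loss.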

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2302.07452 [cs.IR]
	(or arXiv:2302.07452v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2302.07452

Submission history

From: Sheng-Chieh Lin
[v1] Wed, 15 Feb 2023 03:53:26 UTC (514 KB)