COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

Yu, Yue; Xiong, Chenyan; Sun, Si; Zhang, Chao; Overwijk, Arnold

arXiv:2210.15212 (cs)

[Submitted on 27 Oct 2022 (v1), last revised 24 Nov 2022 (this version, v2)]

Abstract: We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios. To mitigate the impact of document differences, COCO-DR continues pretraining the language model on the target corpora to adapt the model to target distributions via COntinuous COntrastive learning. To prepare for unseen target queries, COCO-DR leverages implicit Distributionally Robust Optimization (iDRO) to reweight samples from different source query clusters, improving model robustness over rare queries during fine-tuning. COCO-DR achieves superior average performance on BEIR, the zero-shot retrieval benchmark. At BERT Base scale, COCO-DR Base outperforms other ZeroDR models that are 60x larger. At BERT Large scale, COCO-DR Large outperforms the giant GPT-3 embedding model, which has 500x more parameters. Our analysis shows the correlation between COCO-DR's effectiveness in combating distribution shifts and improving zero-shot accuracy. Our code and model can be found at this https URL.
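The two ingredients named in the abstract, a contrastive objective and loss-driven reweighting of query clusters, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the iDRO update, so the reweighting below substitutes a generic group-DRO-style exponentiated update, and every function name, parameter, and default value is a hypothetical chosen for illustration.

```python
import math


def info_nce(query, positive, negatives, temperature=1.0):
    """Contrastive (InfoNCE) loss for one query, using dot-product similarity.

    Assumption: plain Python lists stand in for embedding vectors.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The positive document's logit comes first, followed by the negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


def reweight_clusters(weights, cluster_losses, step_size=1.0):
    """Generic exponentiated update (NOT the paper's iDRO rule).

    Clusters with a higher current loss receive a larger weight;
    the result is renormalized to sum to 1.
    """
    scaled = [w * math.exp(step_size * l) for w, l in zip(weights, cluster_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def robust_loss(weights, cluster_losses):
    """Weighted sum of per-cluster losses used as the training objective."""
    return sum(w * l for w, l in zip(weights, cluster_losses))
```

In a training loop under these assumptions, one would compute the mean `info_nce` loss per query cluster, call `reweight_clusters` to shift weight toward the clusters the model currently handles worst, and minimize `robust_loss` instead of the uniform average.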

Comments:	EMNLP 2022 (Main Conference). The code and Model can be found at this https URL
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2210.15212 [cs.CL]
	(or arXiv:2210.15212v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.15212
Journal reference:	EMNLP 2022

Submission history

From: Yue Yu
[v1] Thu, 27 Oct 2022 06:51:39 UTC (522 KB)
[v2] Thu, 24 Nov 2022 05:11:54 UTC (248 KB)
