NV-Retriever: Improving text embedding models with effective hard-negative mining

Moreira, Gabriel de Souza P.; Osmulski, Radek; Xu, Mengyao; Ak, Ronay; Schifferer, Benedikt; Oldridge, Even

Computer Science > Information Retrieval

arXiv:2407.15831 (cs)

[Submitted on 22 Jul 2024 (v1), last revised 7 Feb 2025 (this version, v2)]

Abstract: Text embedding models have been popular for information retrieval applications such as semantic search and question-answering systems based on Retrieval-Augmented Generation (RAG). These models are typically Transformer models fine-tuned with contrastive learning objectives. One of the challenging aspects of fine-tuning embedding models is the selection of high-quality hard-negative passages for contrastive learning. In this paper we introduce a family of positive-aware mining methods that use the positive relevance score as an anchor for effective false-negative removal, leading to faster training and more accurate retrieval models. We provide an ablation study of hard-negative mining methods and their configurations, exploring different teacher and base models. We further demonstrate the efficacy of our proposed mining methods at scale with the NV-Retriever-v1 model, which scores 60.9 on the MTEB Retrieval (BEIR) benchmark and placed 1st when it was published to MTEB Retrieval in July 2024.
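The idea of using the positive relevance score as an anchor for false-negative removal can be illustrated with a minimal sketch. The abstract does not spell out the filtering rule, so everything below is an assumption made for illustration: the function name `mine_hard_negatives`, the rule that a candidate is dropped when its teacher score exceeds a fixed fraction of the positive's score, the value 0.95 for that fraction, and the toy scores. It is one plausible member of such a family of methods, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def mine_hard_negatives(
    positive_score: float,
    candidates: Sequence[Tuple[str, float]],
    max_fraction_of_positive: float = 0.95,  # illustrative value, an assumption
    top_k: int = 4,
) -> List[str]:
    """Return up to top_k candidate ids, hardest first, skipping likely false negatives.

    `candidates` holds (passage_id, teacher_score) pairs for one query;
    `positive_score` is the teacher score of that query's labeled positive.
    """
    # The positive's relevance score anchors the ceiling for acceptable negatives.
    ceiling = positive_score * max_fraction_of_positive
    # Candidates scoring above the ceiling are treated as probable false negatives.
    kept = [(doc_id, score) for doc_id, score in candidates if score <= ceiling]
    # Among what remains, the highest-scoring passages are the hardest negatives.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in kept[:top_k]]


# Toy teacher scores (hypothetical): d1 and d2 score close to or above the positive.
candidates = [("d1", 0.82), ("d2", 0.78), ("d3", 0.60), ("d4", 0.41), ("d5", 0.10)]
negatives = mine_hard_negatives(positive_score=0.80, candidates=candidates, top_k=3)
print(negatives)
```

In this toy example the ceiling is 0.76, so `d1` and `d2` are discarded as probable false negatives even though a plain top-k miner would have ranked them as the hardest negatives.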

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.15831 [cs.IR]
	(or arXiv:2407.15831v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2407.15831

Submission history

From: Gabriel De Souza Pereira Moreira
[v1] Mon, 22 Jul 2024 17:50:31 UTC (625 KB)
[v2] Fri, 7 Feb 2025 15:17:18 UTC (1,004 KB)
