Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Kuzi, Saar; Zhang, Mingyang; Li, Cheng; Bendersky, Michael; Najork, Marc

Computer Science > Information Retrieval

arXiv:2010.01195 (cs)

[Submitted on 2 Oct 2020]

Abstract: Search engines often follow a two-phase paradigm where in the first stage (the retrieval stage) an initial set of documents is retrieved and in the second stage (the re-ranking stage) the documents are re-ranked to obtain the final result list. While deep neural networks were shown to improve the performance of the re-ranking stage in previous works, there is little literature about using deep neural networks to improve the retrieval stage. In this paper, we study the merits of combining deep neural network models and lexical models for the retrieval stage. A hybrid approach, which leverages both semantic (deep neural network-based) and lexical (keyword matching-based) retrieval models, is proposed. We perform an empirical study, using a publicly available TREC collection, which demonstrates the effectiveness of our approach and sheds light on the different characteristics of the semantic approach, the lexical approach, and their combination.
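The abstract does not say how the lexical and semantic candidate lists are combined, so the following is only a minimal sketch of one plausible reading: take the top-k documents from each retriever and merge them by union. Everything here is an illustrative assumption rather than the authors' method: the function names, the term-count lexical scorer (a stand-in for a real keyword model), the hand-written two-dimensional "embeddings", and the toy documents are all invented for the example.

```python
import math
from collections import Counter


def lexical_scores(query, docs):
    """Score each document by how often the query terms occur in it (toy keyword matching)."""
    q_terms = set(query.lower().split())
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in q_terms)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_scores(query_vec, doc_vecs):
    """Score each document by embedding similarity to the query."""
    return {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}


def top_k(scores, k):
    """Return up to k document ids with positive scores, best first (ties broken by id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, score in ranked[:k] if score > 0]


def hybrid_candidates(lexical, semantic, k):
    """Union of the two top-k lists, lexical results first, duplicates dropped."""
    merged = []
    for doc_id in top_k(lexical, k) + top_k(semantic, k):
        if doc_id not in merged:
            merged.append(doc_id)
    return merged


def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved list."""
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical toy collection: d2 is relevant but shares no terms with the query.
docs = {
    "d1": "cheap car insurance quotes",
    "d2": "affordable automobile coverage plans",
    "d3": "recipe for apple pie",
}
# Invented 2-d vectors standing in for the output of a neural encoder.
doc_vecs = {"d1": [0.6, 0.4], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
query, query_vec = "car insurance", [1.0, 0.0]
relevant = {"d1", "d2"}

lex = lexical_scores(query, docs)
sem = semantic_scores(query_vec, doc_vecs)
lex_top = top_k(lex, 1)
sem_top = top_k(sem, 1)
hybrid = hybrid_candidates(lex, sem, 1)

print(lex_top, recall(lex_top, relevant))
print(sem_top, recall(sem_top, relevant))
print(hybrid, recall(hybrid, relevant))
```

In this constructed case each retriever alone finds one of the two relevant documents, while the union finds both. The merged list can hold up to 2k documents, which is acceptable for a recall-oriented first stage because the re-ranking stage orders the final result list.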

Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2010.01195 [cs.IR]
	(or arXiv:2010.01195v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2010.01195

Submission history

From: Saar Kuzi
[v1] Fri, 2 Oct 2020 20:59:14 UTC (400 KB)
