Learning Term Discrimination

Frej, Jibril; Mulhem, Philippe; Schwab, Didier; Chevallet, Jean-Pierre

Computer Science > Information Retrieval

arXiv:2004.11759 (cs)

[Submitted on 24 Apr 2020 (v1), last revised 28 Apr 2020 (this version, v3)]

Abstract: Document indexing is a key component of efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term frequencies (tf). Along with tf, which only reflects the importance of a term within a document, traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative terms during retrieval. In this work, we propose to learn TDVs for document indexing with shallow neural networks that approximate traditional IR ranking functions such as TF-IDF and BM25. Our proposal outperforms traditional approaches in terms of both nDCG and recall, even with few positively labelled query-document pairs as training data. When used to filter out vocabulary terms that have zero discrimination value, our learned TDVs both significantly lower the memory footprint of the inverted index and speed up retrieval (BM25 is up to 3 times faster), without degrading retrieval quality.
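To make the role of a TDV concrete, the following is a minimal Python sketch of BM25 scoring in which the term weight is a pluggable lookup table, together with the index-pruning step the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_index`, `idf_values`, `prune_index`, `bm25_scores`), the toy documents, and the "learned" values are all hypothetical, and the neural network that would actually learn the TDVs is not shown. Here a smoothed idf stands in as the baseline TDV, and a learned TDV of zero for "the" is simply assumed.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: tf}; also return document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, tokens in docs.items():
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return dict(index), lengths


def idf_values(index, n_docs):
    """A common smoothed idf variant, used here as the baseline TDV."""
    return {
        term: math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for term, postings in index.items()
    }


def prune_index(index, tdv):
    """Drop the posting lists of terms whose discrimination value is zero."""
    return {t: p for t, p in index.items() if tdv.get(t, 0.0) > 0.0}


def bm25_scores(query, index, lengths, tdv, k1=1.2, b=0.75):
    """BM25 with the usual idf factor replaced by a TDV lookup."""
    avg_len = sum(lengths.values()) / len(lengths)
    scores = defaultdict(float)
    for term in query:
        weight = tdv.get(term, 0.0)
        if weight == 0.0:
            continue  # a zero-TDV term cannot change any score
        for doc_id, tf in index.get(term, {}).items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += weight * tf * (k1 + 1) / norm
    return dict(scores)


# Toy corpus (already tokenized); purely illustrative.
docs = {
    "d1": ["neural", "ranking", "the", "model"],
    "d2": ["the", "inverted", "index"],
    "d3": ["the", "neural", "index", "the"],
}
index, lengths = build_index(docs)
baseline_tdv = idf_values(index, len(docs))

# Assumption: a learned model assigned "the" a TDV of exactly zero.
learned_tdv = dict(baseline_tdv, the=0.0)

pruned = prune_index(index, learned_tdv)
query = ["the", "neural"]
full = bm25_scores(query, index, lengths, learned_tdv)
small = bm25_scores(query, pruned, lengths, learned_tdv)
```

In this sketch `full` and `small` are identical, because a term with zero weight contributes nothing to any score; dropping its posting list shrinks the index and avoids traversing what is often one of the longest lists. The speed-up reported in the abstract concerns the authors' own experiments and is not something this toy example measures.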

Comments:	Accepted to ACM SIGIR 2020
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2004.11759 [cs.IR]
	(or arXiv:2004.11759v3 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2004.11759

Submission history

From: Jibril Frej
[v1] Fri, 24 Apr 2020 14:00:50 UTC (961 KB)
[v2] Mon, 27 Apr 2020 07:59:16 UTC (961 KB)
[v3] Tue, 28 Apr 2020 08:15:04 UTC (961 KB)
