Showing 1–3 of 3 results for author: Chevallet, J

  1. arXiv:2004.11759  [pdf, other]

    cs.IR

    Learning Term Discrimination

    Authors: Jibril Frej, Phillipe Mulhem, Didier Schwab, Jean-Pierre Chevallet

    Abstract: Document indexing is a key component for efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term-frequencies (tf). Along with tf (that only reflects the importance of a term in a document), traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative t…

    Submitted 28 April, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: Accepted to ACM SIGIR 2020

  2. arXiv:1912.01901  [pdf, other]

    cs.IR

    WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

    Authors: Jibril Frej, Didier Schwab, Jean-Pierre Chevallet

    Abstract: Over the past years, deep learning methods have allowed for new state-of-the-art results in ad-hoc information retrieval. However, such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for…

    Submitted 17 March, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted at LREC 2020

    MSC Class: H.3.3; ACM Class: H.3.3

  3. arXiv:1801.03844  [pdf, ps, other]

    cs.IR

    Enhancing Translation Language Models with Word Embedding for Information Retrieval

    Authors: Jibril Frej, Jean-Pierre Chevallet, Didier Schwab

    Abstract: In this paper, we explore the use of Word Embedding semantic resources for the Information Retrieval (IR) task. These embeddings, produced by a shallow neural network, have been shown to capture semantic similarities between words (Mikolov et al., 2013). Hence, our goal is to enhance IR Language Models by addressing the term mismatch problem. To do so, we applied the model presented in the paper Integra…

    Submitted 11 January, 2018; originally announced January 2018.