Skip to main content

Showing 1–4 of 4 results for author: Gisserot-Boukhlef, H

.
  1. arXiv:2503.05500  [pdf, other

    cs.CL cs.AI

    EuroBERT: Scaling Multilingual Encoders for European Languages

    Authors: Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André Martins, Ayoub Hammal, Caio Corro, Céline Hudelot, Emmanuel Malherbe, Etienne Malaboeuf, Fanny Jourdan, Gabriel Hautreux, João Alves, Kevin El-Haddad, Manuel Faysse, Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo

    Abstract: General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide applicability, encoders have been recently overshadowed by advances in generative decoder-only models. However, many innovations driving this progress are not inherently tied to decoders. In this paper, we revisit th… ▽ More

    Submitted 26 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 28 pages, 8 figures, 13 tables

  2. arXiv:2502.13595  [pdf, ps, other

    cs.CL cs.AI cs.IR

    MMTEB: Massive Multilingual Text Embedding Benchmark

    Authors: Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa , et al. (61 additional authors not shown)

    Abstract: Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive Multilingual Text Embedding Benchmark (MMTEB) - a large-scale, community-driven expansion of MTEB, covering over 500 quality-controlled evaluation tasks across 250+ langua… ▽ More

    Submitted 8 June, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted for ICLR: https://openreview.net/forum?id=zl3pfz4VCV

  3. arXiv:2409.20059  [pdf, other

    cs.CL

    Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

    Authors: Hippolyte Gisserot-Boukhlef, Ricardo Rei, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo, Nuno M. Guerreiro

    Abstract: Neural metrics for machine translation (MT) evaluation have become increasingly prominent due to their superior correlation with human judgments compared to traditional lexical metrics. Researchers have therefore utilized neural metrics through quality-informed decoding strategies, achieving better results than likelihood-based methods. With the rise of Large Language Models (LLMs), preference-bas… ▽ More

    Submitted 30 September, 2024; originally announced September 2024.

  4. arXiv:2402.12997  [pdf, other

    cs.IR cs.CL

    Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism

    Authors: Hippolyte Gisserot-Boukhlef, Manuel Faysse, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo

    Abstract: Neural Information Retrieval (NIR) has significantly improved upon heuristic-based Information Retrieval (IR) systems. Yet, failures remain frequent, the models used often being unable to retrieve documents relevant to the user's query. We address this challenge by proposing a lightweight abstention mechanism tailored for real-world constraints, with particular emphasis placed on the reranking pha… ▽ More

    Submitted 25 September, 2024; v1 submitted 20 February, 2024; originally announced February 2024.