Showing 1–3 of 3 results for author: Janeiro, J M

  1. arXiv:2412.08821  [pdf, other]

    cs.CL

    Large Concept Models: Language Modeling in a Sentence Representation Space

    Authors: LCM team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk

    Abstract: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper,…

    Submitted 15 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: 49 pages

  2. arXiv:2409.12737  [pdf, other]

    cs.CL cs.AI

    MEXMA: Token-level objectives improve sentence representations

    Authors: João Maria Janeiro, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault

    Abstract: Current pre-trained cross-lingual sentence encoders approaches use sentence-level objectives only. This can lead to loss of information, especially for tokens, which then degrades the sentence representation. We propose MEXMA, a novel approach that integrates both sentence-level and token-level objectives. The sentence representation in one language is used to predict masked tokens in another lang…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: 11 pages, 12 figures

  3. arXiv:2304.04518  [pdf, other]

    cs.CV cs.AI cs.LG

    Are Visual Recognition Models Robust to Image Compression?

    Authors: João Maria Janeiro, Stanislav Frolov, Alaaeldin El-Nouby, Jakob Verbeek

    Abstract: Reducing the data footprint of visual content via image compression is essential to reduce storage requirements, but also to reduce the bandwidth and latency requirements for transmission. In particular, the use of compressed images allows for faster transfer of data, and faster response times for visual recognition in edge devices that rely on cloud-based services. In this paper, we first analyze…

    Submitted 10 April, 2023; originally announced April 2023.