Showing 1–12 of 12 results for author: Alastruey, B

  1. arXiv:2412.08821  [pdf, other]

    cs.CL

    Large Concept Models: Language Modeling in a Sentence Representation Space

    Authors: LCM team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk

    Abstract: LLMs have revolutionized the field of artificial intelligence and have emerged as the de-facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper,…

    Submitted 15 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    Comments: 49 pages

  2. arXiv:2412.08274  [pdf, other]

    cs.CL

    2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

    Authors: Marta R. Costa-jussà, Bokai Yu, Pierre Andrews, Belen Alastruey, Necati Cihan Camgoz, Joe Chuang, Jean Maillard, Christophe Ropers, Arina Turkantenko, Carleigh Wood

    Abstract: We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate 2M-BELEBELE dataset for both 5-shot and zero-shot settings and across languages, the speech comprehension accuracy is ~ 2-3% average lower compared…

    Submitted 23 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

    ACM Class: I.2.7

  3. arXiv:2409.18044  [pdf, other]

    cs.CL

    Unveiling the Role of Pretraining in Direct Speech Translation

    Authors: Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: Direct speech-to-text translation systems encounter an important drawback in data scarcity. A common solution consists on pretraining the encoder on automatic speech recognition, hence losing efficiency in the training process. In this study, we compare the training dynamics of a system using a pretrained encoder, the conventional approach, and one trained from scratch. We observe that, throughout…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024

  4. arXiv:2409.12126  [pdf, other]

    cs.CL

    Linguini: A benchmark for language-agnostic linguistic reasoning

    Authors: Eduardo Sánchez, Belen Alastruey, Christophe Ropers, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà

    Abstract: We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge. The test covers 894 questions grouped in 160 problems across 75 (mostly) extremely low-resource languages, extracted from the International Linguistic Olympiad corpus. To attain high accuracy on this benchmark, models don't need previous knowledge of the…

    Submitted 18 September, 2024; originally announced September 2024.

  5. arXiv:2310.12648  [pdf, other]

    cs.CL

    Towards Real-World Streaming Speech Translation for Code-Switched Speech

    Authors: Belen Alastruey, Matthias Sperber, Christian Gollan, Dominic Telaar, Tim Ng, Aashish Agarwal

    Abstract: Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (\t…

    Submitted 23 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

  6. arXiv:2309.11585  [pdf, other]

    cs.CL

    SpeechAlign: a Framework for Speech Translation Alignment Evaluation

    Authors: Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà

    Abstract: Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. In our commitment to advance these fields, we present SpeechAlign, a framework designed to evaluate the underexplored field of source-target alignment in speech models. The SpeechAlign framework has two core components. First, to tackle the absence of suitable evaluation datasets, we introduce the Speech Gold…

    Submitted 25 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: LREC-COLING 2024

  7. arXiv:2308.16871  [pdf, other]

    cs.CL cs.AI

    The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

    Authors: Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà

    Abstract: Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting wil…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 15 pages

  8. arXiv:2306.06954  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

    Authors: Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler

    Abstract: Convolutional frontends are a typical choice for Transformer-based automatic speech recognition to preprocess the spectrogram, reduce its sequence length, and combine local information in time and frequency similarly. However, the width and height of an audio spectrogram denote different information, e.g., due to reverberation as well as the articulatory system, the time axis has a clear left-to-r…

    Submitted 12 June, 2023; originally announced June 2023.

  9. arXiv:2205.11631  [pdf, other]

    cs.CL

    Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

    Authors: Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà

    Abstract: In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has mainly focused solely on source sentence tokens' attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target…

    Submitted 4 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  10. arXiv:2205.07100  [pdf, other]

    cs.CL cs.AI cs.MM cs.SD eess.AS

    Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

    Authors: Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-Jussà

    Abstract: Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, its direct application to speech tasks is not trivial. The nature of this sequences carries problems such as long sequence lengths and redundancy between adjacent tokens. Therefore, we believe that regular self-attention mechanism might not be well suited for it. Diffe…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: NAACL-SRW 2022

  11. arXiv:2204.09028  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Locality of Attention in Direct Speech Translation

    Authors: Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-attention mechanism complexity scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, like in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contribution…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando

  12. arXiv:2107.03069  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient Transformer for Direct Speech Translation

    Authors: Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: The advent of Transformer-based models has surpassed the barriers of text. When working with speech, we must face a problem: the sequence length of an audio input is not suitable for the Transformer. To bypass this problem, a usual approach is adding strided convolutional layers, to reduce the sequence length before using the Transformer. In this paper, we propose a new approach for direct Speech…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works