Showing 1–6 of 6 results for author: Dinarelli, M

  1. arXiv:2501.05966  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    Towards Early Prediction of Self-Supervised Speech Model Performance

    Authors: Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève

    Abstract: In Self-Supervised Learning (SSL), pre-training and evaluation are resource intensive. In the speech domain, current indicators of the quality of SSL models during pre-training, such as the loss, do not correlate well with downstream performance. Consequently, it is often difficult to gauge the final downstream performance in a cost efficient manner during pre-training. In this work, we propose un…

    Submitted 31 May, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

  2. arXiv:2409.02596  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    An Analysis of Linear Complexity Attention Substitutes with BEST-RQ

    Authors: Ryan Whetten, Titouan Parcollet, Adel Moumen, Marco Dinarelli, Yannick Estève

    Abstract: Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention (MHSA). Alternatives for MHSA have been proposed and used in the speech domain, but have yet to be investigated properly in an SSL setting. In this work, we study th…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted in the IEEE Spoken Language Technology Workshop 2024

  3. arXiv:2309.05472  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  4. arXiv:2207.00352  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Toward Low-Cost End-to-End Spoken Language Understanding

    Authors: Marco Dinarelli, Marco Naguib, François Portet

    Abstract: Recent advances in spoken language understanding benefited from Self-Supervised models trained on large speech corpora. For French, the LeBenchmark project has made such models available and has led to impressive progress on several tasks including spoken language understanding. These advances have a non-negligible cost in terms of computation time and energy consumption. In this paper, we compare…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at Interspeech 2022; Slightly improved (longer) version

  5. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  6. arXiv:2002.05955  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    A Data Efficient End-To-End Spoken Language Understanding Architecture

    Authors: Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier

    Abstract: End-to-end architectures have been recently proposed for spoken language understanding (SLU) and semantic parsing. Based on a large amount of data, those models jointly learn acoustic and linguistic-sequential features. Such architectures give very good results in the context of domain, intent and slot detection, but their application in a more complex semantic chunking and tagging task is less easy.…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020