Showing 1–23 of 23 results for author: Dunbar, E

  1. arXiv:2409.08103

    cs.CL cs.SD eess.AS

    The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language

    Authors: Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar

    Abstract: We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark corpus designed to push the limits of current approaches to low-resource speech recognition. Faetar, a Franco-Provençal variety spoken primarily in Italy, has no standard orthography, has virtually no existing textual or speech resources other than what is included in the benchmark, and is quite different from other forms…

    Submitted 26 May, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: To appear in INTERSPEECH 2025

  2. arXiv:2407.21783

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  3. Quantifying the Role of Textual Predictability in Automatic Speech Recognition

    Authors: Sean Robertson, Gerald Penn, Ewan Dunbar

    Abstract: A long-standing question in automatic speech recognition research is how to attribute errors to the ability of a model to model the acoustics, versus its ability to leverage higher-order context (lexicon, morphology, syntax, semantics). We validate a novel approach which models error rates as a function of relative textual predictability, and yields a single number, $k$, which measures the effect…

    Submitted 5 October, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Proceedings of Interspeech 2024

    Journal ref: Proc. Interspeech 2024, 4029-4033

  4. arXiv:2312.01515

    cs.CL cs.SD eess.AS

    Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

    Authors: Sean Robertson, Ewan Dunbar

    Abstract: It has been generally assumed in the automatic speech recognition (ASR) literature that it is better for models to have access to wider context windows. Yet, many of the potential reasons this might be true in the supervised setting do not necessarily transfer over to the case of unsupervised learning. We investigate how much context is necessary to achieve high-quality pre-trained acoustic models…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Repository at https://github.com/sdrobert/scpc. This work has been submitted to the IEEE for possible publication

    ACM Class: I.2.7

  5. arXiv:2310.03018

    eess.AS cs.CL cs.SD

    Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

    Authors: Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

    Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech enco…

    Submitted 18 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024 (v2)

  6. arXiv:2210.15775

    cs.CL cs.SD eess.AS

    Evaluating context-invariance in unsupervised speech representations

    Authors: Mark Hallap, Emmanuel Dupoux, Ewan Dunbar

    Abstract: Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling. Inspiration comes from the promise of "discovering the phonemes" of a language or a similar low-bitrate encoding. However, one of the critical properties of phoneme transcriptions is cont…

    Submitted 30 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  7. arXiv:2210.15759

    cs.CL cs.SD eess.AS

    Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

    Authors: Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux

    Abstract: Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defi…

    Submitted 27 October, 2022; originally announced October 2022.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1211-1226, October 2022. Print ISSN: 1932-4553; Online ISSN: 1941-0484; DOI: 10.1109/JSTSP.2022.3206084

  8. arXiv:2210.02956

    cs.CL

    Are word boundaries useful for unsupervised language learning?

    Authors: Tu Anh Nguyen, Maureen de Seyssel, Robin Algayres, Patricia Roze, Ewan Dunbar, Emmanuel Dupoux

    Abstract: Word or word-fragment based Language Models (LM) are typically preferred over character-based ones in many downstream applications. This may not be surprising as words seem more linguistically relevant units than characters. Words provide at least two kinds of relevant information: boundary information and meaningful units. However, word boundary information may be absent or unreliable in the case…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: This is an archived version from September 2020

  9. arXiv:2206.01685

    q-bio.NC cs.AI cs.CL

    Toward a realistic model of speech processing in the brain with self-supervised learning

    Authors: Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

    Abstract: Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts of data, (2) unobtainable supervised labels, (3) textual rather than raw sensory input, and / or (4) implausibly large memory (e.g. thousands of contextual wor…

    Submitted 20 March, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

    Journal ref: Neural Information Processing Systems (NeurIPS), 2022

  10. arXiv:2205.15823

    cs.CL cs.SD eess.AS

    Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

    Authors: Juliette Millet, Ioana Chitoran, Ewan Dunbar

    Abstract: Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2021. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 661-673, Online. Association for Computational Linguistics

  11. arXiv:2205.15819

    cs.CL cs.SD eess.AS

    Do self-supervised speech models develop human-like perception biases?

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2022. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7591-7605, Dublin, Ireland. Association for Computational Linguistics

  12. arXiv:2104.14700

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2021: Spoken language modelling

    Authors: Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (C…

    Submitted 9 August, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.11588

  13. arXiv:2102.11749

    cs.CL cs.AI

    Paraphrases do not explain word analogies

    Authors: Louis Fournier, Ewan Dunbar

    Abstract: Many types of distributional word embeddings (weakly) encode linguistic regularities as directions (the difference between "jump" and "jumped" will be in a similar direction to that of "walk" and "walked," and so on). Several attempts have been made to explain this fact. We respond to Allen and Hospedales' recent (ICML, 2019) theoretical explanation, which claims that word2vec and GloVe will encod…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

  14. arXiv:2011.11588

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

    Authors: Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

    Abstract: We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a com…

    Submitted 1 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 14 pages, including references and supplementary material

  15. arXiv:2010.05967

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

    Authors: Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speec…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  16. arXiv:2010.05961

    cs.CL cs.AI

    Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  17. arXiv:2010.03446

    cs.CL cs.AI

    Analogies minus analogy test: measuring regularities in word embeddings

    Authors: Louis Fournier, Emmanuel Dupoux, Ewan Dunbar

    Abstract: Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test, to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar direction…

    Submitted 7 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of CoNLL 2020

  18. arXiv:2005.03418

    cs.CL cs.SD eess.AS

    The Perceptimatic English Benchmark for Speech Perception Models

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted to CogSci Conference 2020

  19. arXiv:1911.06573

    eess.AS cs.CL cs.SD

    Independent and automatic evaluation of acoustic-to-articulatory inversion models

    Authors: Maud Parrot, Juliette Millet, Ewan Dunbar

    Abstract: Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. However, to be useful in these settings, articulatory reconstruction must be speaker independent. Furthermore, as most research focuses on single, small datasets with few speakers, robust articulatory reconstruction could profit from combining…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: 5 pages, 1 figure

  20. arXiv:1904.11469

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Challenge 2019: TTS without T

    Authors: Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery datase…

    Submitted 7 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019

  21. arXiv:1812.08718

    cs.CL

    RNNs Implicitly Implement Tensor Product Representations

    Authors: R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

    Abstract: Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies). Such regularities motivate our hypothesis that RNNs that show such regularities implicitly compile symbolic structures into tensor product representations (TPRs; Smolensky, 1990), which additively c…

    Submitted 5 March, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: Accepted to ICLR 2019

  22. arXiv:1712.04313

    cs.CL

    The Zero Resource Speech Challenge 2017

    Authors: Ewan Dunbar, Xuan Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux

    Abstract: We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the followup to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed.

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017. Okinawa, Japan

  23. arXiv:1704.06913

    cs.CL cs.LG

    Learning weakly supervised multimodal phoneme embeddings

    Authors: Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

    Abstract: Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing the lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In par…

    Submitted 18 October, 2017; v1 submitted 23 April, 2017; originally announced April 2017.