Showing 1–2 of 2 results for author: Oexle, G

  1. arXiv:2404.09841  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Anatomy of Industrial Scale Multilingual ASR

    Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

    Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed descriptio…

    Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2404.07341  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

    Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

    Abstract: This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources. To achieve this, we perform Noisy Student Training after generating pseudo-labels for the unlabeled public data using a strong Conformer RNN-T baseline model. The addition of these pseu…

    Submitted 12 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.