Showing 1–5 of 5 results for author: Thorbecke, I

  1. arXiv:2411.03866  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

    Authors: Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke

    Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and speech perturbations. In th…

    Submitted 22 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Accepted in ICASSP 2025 SALMA Workshop

    Journal ref: Proc. ICASSP Workshop on Speech and Audio Language Models (SALMA), 2025

  2. arXiv:2409.13514  [pdf, other]

    cs.CL cs.SD eess.AS

    LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR

    Authors: Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Andres Carofilis, Shashi Kumar, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: Despite the recent success of end-to-end models for automatic speech recognition, recognizing special rare and out-of-vocabulary words, as well as fast domain adaptation with text, are still challenging. It often happens that biasing to the special entities leads to a degradation in the overall performance. We propose a light on-the-fly method to improve automatic speech recognition performance by…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP2025

  3. arXiv:2409.13499  [pdf, other]

    cs.CL cs.SD eess.AS

    Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper

    Authors: Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Shashi Kumar, Pradeep Rangappa, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: The training of automatic speech recognition (ASR) with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch in consumer and accessible GPUs in their entirety with pseudo-labeled (PL) speech from foundational speech models (FSM). This allows training a robust ASR model just in one stage and…

    Submitted 7 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP Findings 2024

  4. arXiv:2407.04444  [pdf, other]

    cs.CL cs.SD eess.AS

    TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Thorbecke, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at EMNLP 2024 (Main Conference)

  5. arXiv:2407.04439  [pdf, other]

    eess.AS

    XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Thorbecke, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju

    Abstract: Self-supervised pretrained models exhibit competitive performance in automatic speech recognition on finetuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, where the XLSR-53 model is used as encoder in transducer setup. Our exper…

    Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column