Showing 1–5 of 5 results for author: Galvez, D

Searching in archive eess.
  1. arXiv:2503.05931

    cs.CL eess.AS

    Training and Inference Efficiency of Encoder-Decoder Speech Models

    Authors: Piotr Żelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Attention encoder-decoder model architecture is the backbone of several recent top performing foundation speech models: Whisper, Seamless, OWSM, and Canary-1B. However, the reported data and compute requirements for their training are prohibitive for many in the research community. In this work, we focus on the efficiency angle and ask the questions of whether we are training these speech models e…

    Submitted 19 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  2. arXiv:2409.13523

    cs.CL cs.SD eess.AS

    EMMeTT: Efficient Multimodal Machine Translation Training

    Authors: Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: A rising interest in the modality extension of foundation language models warrants discussion on the most effective, and efficient, multimodal training approach. This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST). We investigate two different foundation model architectures, decoder-only G…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 4 pages, submitted to ICASSP 2025

  3. arXiv:2406.06220

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Label-Looping: Highly Efficient Decoding for Transducers

    Authors: Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special str…

    Submitted 16 September, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at IEEE SLT 2024

  4. arXiv:2311.04996

    eess.AS cs.LG

    GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

    Authors: Daniel Galvez, Tim Kaldewey

    Abstract: While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated Weighted Finite State Transducer (WFST) beam search decoder compatible with current CTC models. It increases pipeline throughput and decreases latency, support…

    Submitted 8 November, 2023; originally announced November 2023.

  5. arXiv:1711.08058

    cs.LG cs.CL cs.SD eess.AS

    Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

    Authors: Ahmad AbdulKader, Kareem Nassar, Mohamed El-Geish, Daniel Galvez, Chetan Patil

    Abstract: We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8kHz audio acquired in non-IID environments -- a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class…

    Submitted 25 April, 2025; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: Published in the proceedings of NeurIPS 2017 Workshop: Machine Learning on the Phone and other Consumer Devices