
Showing 1–2 of 2 results for author: Jogi, Y

  1. arXiv:2502.13446

    eess.AS cs.LG

    Adopting Whisper for Confidence Estimation

    Authors: Vaibhav Aggarwal, Shabari S Nair, Yash Verma, Yash Jogi

    Abstract: Recent research on word-level confidence estimation for speech recognition systems has primarily focused on lightweight models known as Confidence Estimation Modules (CEMs), which rely on hand-engineered features derived from Automatic Speech Recognition (ASR) outputs. In contrast, we propose a novel end-to-end approach that leverages the ASR model itself (Whisper) to generate word-level confidenc…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted at IEEE ICASSP 2025

  2. arXiv:2502.11572

    eess.AS cs.SD

    Improving Rare-Word Recognition of Whisper in Zero-Shot Settings

    Authors: Yash Jogi, Vaibhav Aggarwal, Shabari S Nair, Yash Verma, Aayush Kubba

    Abstract: Whisper, despite being trained on 680K hours of web-scaled audio data, faces difficulty in recognising rare words like domain-specific terms, with a solution being contextual biasing through prompting. To improve upon this method, in this paper, we propose a supervised learning strategy to fine-tune Whisper for contextual biasing instruction. We demonstrate that by using only 670 hours of Common V…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Accepted at IEEE SLT 2024