
Showing 1–4 of 4 results for author: Sarawagi, S

Searching in archive eess.
  1. arXiv:2408.16542  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    SALSA: Speedy ASR-LLM Synchronous Aggregation

    Authors: Ashish Mittal, Darshan Prabhu, Sunita Sarawagi, Preethi Jyothi

    Abstract: Harnessing pre-trained LLMs to improve ASR systems, particularly for low-resource languages, is now an emerging area of research. Existing methods range from using LLMs for ASR error correction to tightly coupled systems that replace the ASR decoder with the LLM. These approaches either increase decoding time or require expensive training of the cross-attention layers. We propose SALSA, which coup…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to INTERSPEECH 2024

  2. arXiv:2307.05006  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    Improving RNN-Transducers with Acoustic LookAhead

    Authors: Vinit S. Unni, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi

    Abstract: RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to s…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: 5 pages, 1 fig, 7 tables, Proceedings of Interspeech 2023

  3. arXiv:2103.03142  [pdf, other]

    cs.SD cs.CL eess.AS

    Error-driven Fixed-Budget ASR Personalization for Accented Speakers

    Authors: Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi

    Abstract: We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models which…

    Submitted 2 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: In ICASSP 2021

  4. arXiv:2006.13519  [pdf, other]

    eess.AS cs.CL cs.SD

    Black-box Adaptation of ASR for Accented Speech

    Authors: Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi

    Abstract: We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent. While leading online ASR services obtain impressive performance on main-stream accents, they perform poorly on sub-populations - we observed that the word error rate (WER) achieved by Google's ASR API on Indian accents is almost twice the WER on US accents. Existing adaptation methods either re…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: A slightly different version submitted to INTERSPEECH 2020 (currently under review)