Showing 1–4 of 4 results for author: Pandey, L

  1. arXiv:2501.12501  [pdf, other]

    eess.AS cs.SD

    A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data

    Authors: Minh Tran, Yutong Pang, Debjyoti Paul, Laxmi Pandey, Kevin Jiang, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei

    Abstract: We introduce DAS (Domain Adaptation with Synthetic data), a novel domain adaptation framework for pre-trained ASR model, designed to efficiently adapt to various language-defined domains without requiring any real data. In particular, DAS first prompts large language models (LLMs) to generate domain-specific texts before converting these texts to speech via text-to-speech technology. The synthetic…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  2. arXiv:2407.16664  [pdf, other]

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2207.09674  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improving Data Driven Inverse Text Normalization using Data Augmentation

    Authors: Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

    Abstract: Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural modeling approaches require quality large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data), to train. Both these…

    Submitted 20 July, 2022; originally announced July 2022.

  4. arXiv:2106.08706  [pdf, other]

    eess.IV cs.CV cs.HC cs.LG cs.SD eess.AS

    Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

    Authors: Laxmi Pandey, Ahmed Sabbir Arif

    Abstract: Speech sounds of spoken language are obtained by varying configuration of the articulators surrounding the vocal tract. They contain abundant information that can be utilized to better understand the underlying mechanism of human speech production. We propose a novel deep neural network-based learning framework that understands acoustic information in the variable-length sequence of vocal tract sh…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 8 pages

    ACM Class: I.4.9; I.2.10