Showing 1–6 of 6 results for author: Serai, P

Searching in archive cs.
  1. arXiv:2408.11258

    cs.AI cs.CL

    Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

    Authors: Prashant Serai, Peidong Wang, Eric Fosler-Lussier

    Abstract: Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks like discriminative language modeling, improving robustness of NLP systems, where limited or even no audio data is available at train time. Previous work typically considered replicating behavior of GMM-HMM based systems, but the behavior of more modern po…

    Submitted 20 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of IEEE ICASSP 2019

  2. arXiv:2401.10460

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Zhiping Xiu, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run real-time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is a magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  3. arXiv:2303.00802

    cs.CL cs.SD eess.AS

    Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

    Authors: Philipp Klumpp, Pooja Chitkara, Leda Sarı, Prashant Serai, Jilong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He

    Abstract: The awareness for biased ASR datasets or models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate…

    Submitted 1 March, 2023; originally announced March 2023.

  4. arXiv:2211.13282

    cs.SD cs.AI eess.AS

    Voice-preserving Zero-shot Multiple Accent Conversion

    Authors: Mumin Jin, Prashant Serai, Jilong Wu, Andros Tjandra, Vimal Manohar, Qing He

    Abstract: Most people who have tried to learn a foreign language would have experienced difficulties understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking a new accent is likewise a difficult task. An accent conversion system that changes a speaker's accent but preserves that speaker's voice identity, such as timbre and pitch, has the potential for a range…

    Submitted 14 October, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE ICASSP 2023

  5. arXiv:2204.05183

    cs.CL cs.SD eess.AS

    Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data

    Authors: Vishal Sunder, Prashant Serai, Eric Fosler-Lussier

    Abstract: A Virtual Patient (VP) is a powerful tool for training medical students to take patient histories, where responding to a diverse set of spoken questions is essential to simulate natural conversations with a student. The performance of such a Spoken Language Understanding system (SLU) can be adversely affected by both the presence of Automatic Speech Recognition (ASR) errors in the test data and a…

    Submitted 1 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 3 figures

  6. arXiv:2103.12258

    cs.CL cs.LG

    Hallucination of speech recognition errors with sequence to sequence learning

    Authors: Prashant Serai, Vishal Sunder, Eric Fosler-Lussier

    Abstract: Automatic Speech Recognition (ASR) is an imperfect process that results in certain mismatches in ASR output text when compared to plain written text or transcriptions. When plain text data is to be used to train systems for spoken language understanding or ASR, a proven strategy to reduce said mismatch and prevent degradations, is to hallucinate what the ASR outputs would be given a gold transcrip…

    Submitted 31 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Submitted to IEEE/ACM Transactions on Audio Speech and Language Processing