Showing 1–4 of 4 results for author: Shankar, N B

  1. arXiv:2505.18463

    eess.AS

    CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR

    Authors: Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, Abeer Alwan

    Abstract: Automatic Speech Recognition (ASR) systems struggle with child speech due to its distinct acoustic and linguistic variability and limited availability of child speech datasets, leading to high transcription error rates. While ASR error correction (AEC) methods have improved adult speech transcription, their effectiveness on child speech remains largely unexplored. To address this, we introduce CHS…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Accepted in Interspeech 2025

  2. arXiv:2501.08468

    cs.CL cs.SD eess.AS

    Selective Attention Merging for low resource tasks: A case study of Child ASR

    Authors: Natarajan Balaji Shankar, Zilai Wang, Eray Eren, Abeer Alwan

    Abstract: While Speech Foundation Models (SFMs) excel in various speech tasks, their performance for low-resource tasks such as child Automatic Speech Recognition (ASR) is hampered by limited pretraining data. To address this, we explore different model merging techniques to leverage knowledge from models trained on larger, more diverse speech corpora. This paper also introduces Selective Attention (SA) Mer…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: To appear in ICASSP 2025

  3. arXiv:2406.10512

    eess.AS cs.SD

    SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR

    Authors: Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan

    Abstract: Recently, speech foundation models have gained popularity due to their superiority in finetuning downstream ASR tasks. However, models finetuned on certain domains, such as LibriSpeech (adult read speech), behave poorly on other domains (child or noisy speech). One solution could be collecting as much labeled and diverse data as possible for joint finetuning on various domains. However, collecting…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to ICASSP 2024 SASB Workshop

  4. arXiv:2406.10507

    eess.AS cs.CL cs.SD

    Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models

    Authors: Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan

    Abstract: Speech foundation models (SFMs) have achieved state-of-the-art results for various speech tasks in supervised (e.g. Whisper) or self-supervised systems (e.g. WavLM). However, the performance of SFMs for child ASR has not been systematically studied. In addition, there is no benchmark for child ASR with standard evaluations, making the comparisons of novel ideas difficult. In this paper, we initiat…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in Interspeech 2024