Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Attia, Ahmed Adel; Liu, Jing; Ai, Wei; Demszky, Dorottya; Espy-Wilson, Carol

arXiv:2309.07927 (eess)

[Submitted on 12 Sep 2023 (v1), last revised 15 May 2024 (this version, v3)]

Abstract: Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress does not readily extend to ASR for children, due to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech, and demonstrated some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium, and show that this improvement generalizes to unseen datasets. We also highlight important challenges in improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.
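The WER figures in the abstract follow the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The following is a minimal sketch of that computation, not the paper's evaluation code. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, which the paper's pipeline may handle differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing, punctuation stripping, or other normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative improvement when an error rate drops from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Relative WER reduction for the Whisper-Small figures quoted in the abstract.
    print(relative_reduction(13.93, 9.11))
```

Applied to the numbers quoted in the abstract, `relative_reduction(13.93, 9.11)` gives roughly 0.346 for Whisper-Small and `relative_reduction(13.23, 8.61)` gives roughly 0.349 for Whisper-Medium, so both absolute drops correspond to a relative WER reduction of about 35%.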

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2309.07927 [eess.AS]
	(or arXiv:2309.07927v3 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2309.07927

Submission history

From: Ahmed Attia
[v1] Tue, 12 Sep 2023 06:58:18 UTC (44 KB)
[v2] Mon, 18 Sep 2023 09:56:20 UTC (181 KB)
[v3] Wed, 15 May 2024 07:05:32 UTC (2,012 KB)