Recent Progress in the CUHK Dysarthric Speech Recognition System

Liu, Shansong; Geng, Mengzhe; Hu, Shoukang; Xie, Xurong; Cui, Mingyu; Yu, Jianwei; Liu, Xunying; Meng, Helen

doi:10.1109/TASLP.2021.3091805

arXiv:2201.05845 (eess)

[Submitted on 15 Jan 2022 (v1), last revised 26 Feb 2022 (this version, v2)]

Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies over the past few decades, recognition of disordered speech remains a highly challenging task to date. Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN) based ASR technologies that predominantly target normal speech. This paper presents recent research efforts at the Chinese University of Hong Kong (CUHK) to improve the performance of disordered speech recognition systems on UASpeech, the largest publicly available dysarthric speech corpus. A set of novel modelling techniques, including neural architecture search, data augmentation using spectro-temporal perturbation, model-based speaker adaptation, and cross-domain generation of visual features within an audio-visual speech recognition (AVSR) system framework, was employed to address these challenges. The combination of these techniques produced the lowest published word error rate (WER) of 25.21% on the UASpeech test set of 16 dysarthric speakers, and an overall WER reduction of 5.4% absolute (17.6% relative) over the CUHK 2018 dysarthric speech recognition system, which featured a 6-way DNN system combination and cross-adaptation of systems trained on out-of-domain normal speech data. Bayesian model adaptation further allows rapid adaptation to individual dysarthric speakers using as little as 3.06 seconds of speech. The efficacy of these techniques was further demonstrated on a CUDYS Cantonese dysarthric speech recognition task.
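
The headline figures rest on two quantities: word error rate and relative WER reduction. The following is a minimal sketch, assuming the usual word-level Levenshtein definition of WER (substitutions + deletions + insertions, divided by the number of reference words); the abstract does not name its scoring tool. It also assumes the CUHK 2018 baseline WER is simply 25.21% plus the rounded 5.4% absolute gain (about 30.61%), a value the abstract does not state directly. The function names `word_error_rate` and `relative_reduction` are hypothetical, chosen for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (S + D + I) / N, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete reference word r
                          curr[j - 1] + 1,     # insert hypothesis word h
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: absolute gain divided by the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
    # Assumed baseline = 25.21 + 5.4 (hypothetical reconstruction from the abstract)
    print(round(relative_reduction(25.21 + 5.4, 25.21), 1))  # 17.6
```

Under these assumptions, 5.4 / 30.61 is about 17.64%, which is consistent with the 17.6% relative reduction quoted in the abstract.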

Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2201.05845 [eess.AS]
	(or arXiv:2201.05845v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2201.05845
Related DOI:	https://doi.org/10.1109/TASLP.2021.3091805

Submission history

From: Mengzhe Geng
[v1] Sat, 15 Jan 2022 13:02:40 UTC (425 KB)
[v2] Sat, 26 Feb 2022 05:00:31 UTC (425 KB)
