Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

Hu, Shujie; Liu, Shansong; Xie, Xurong; Geng, Mengzhe; Wang, Tianzi; Hu, Shoukang; Cui, Mingyu; Liu, Xunying; Meng, Helen

arXiv:2203.10274 (eess)

[Submitted on 19 Mar 2022]

Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty of collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach in which the inversion model is trained on the parallel acoustic-articulatory data of the 15-hour TORGO corpus and then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features. Mixture density network (MDN) based neural A2A inversion models were used. A cross-domain feature adaptation network was also used to reduce the acoustic mismatch between the TORGO and UASpeech data. On both tasks, incorporating the A2A-generated articulatory features consistently outperformed the baseline hybrid DNN/TDNN, CTC and Conformer based end-to-end systems constructed using acoustic features only. The best multi-modal system, which incorporates the video modality and the cross-domain articulatory features together with data augmentation and learning hidden unit contributions (LHUC) speaker adaptation, produced the lowest published word error rate (WER) of 24.82% on the 16 dysarthric speakers of the benchmark UASpeech task.
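
The abstract names mixture density networks as the basis of the A2A inversion models but does not give their configuration. As a rough illustration only, the sketch below computes the negative log-likelihood that a mixture density network is commonly trained to minimize: the network outputs mixture weights, means and standard deviations, and the loss is the negative log-probability of the observed articulatory vector under that mixture. The function name, the diagonal-Gaussian assumption, the number of components and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def mdn_negative_log_likelihood(logits, means, log_stds, target):
    """Return -log p(target) under a mixture of diagonal Gaussians.

    logits:   K unnormalised mixture weights (hypothetical network outputs)
    means:    K lists of D component means
    log_stds: K lists of D log standard deviations
    target:   D-dimensional articulatory vector (hypothetical values)
    """
    # Log-softmax over the mixture logits, computed stably.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))

    component_log_probs = []
    for logit, mean, log_std in zip(logits, means, log_stds):
        log_weight = logit - log_norm
        log_density = 0.0
        for x, mu, ls in zip(target, mean, log_std):
            z = (x - mu) / math.exp(ls)
            log_density += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        component_log_probs.append(log_weight + log_density)

    # Log-sum-exp over components gives log p(target).
    top = max(component_log_probs)
    log_prob = top + math.log(sum(math.exp(v - top) for v in component_log_probs))
    return -log_prob


# Toy example: two equally weighted unit-variance components in 2-D.
nll = mdn_negative_log_likelihood(
    logits=[0.0, 0.0],
    means=[[0.0, 0.0], [1.0, 1.0]],
    log_stds=[[0.0, 0.0], [0.0, 0.0]],
    target=[0.0, 0.0],
)
print(round(nll, 4))
```

In a real system the logits, means and log standard deviations would be produced per frame by a neural network from acoustic features, and the loss would be averaged over frames; the sketch leaves that part out.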

Comments:	Accepted by ICASSP 2022
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2203.10274 [eess.AS]
	(or arXiv:2203.10274v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2203.10274

Submission history

From: Shujie Hu
[v1] Sat, 19 Mar 2022 08:47:18 UTC (892 KB)
