Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Weise, Tobias; Klumpp, Philipp; Demir, Kubilay Can; Maier, Andreas; Noeth, Elmar; Heismann, Bjoern; Schuster, Maria; Yang, Seung Hee

arXiv:2204.04016 (eess)

[Submitted on 8 Apr 2022 (v1), last revised 27 Jun 2022 (this version, v2)]

Abstract: Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference speaker and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, with only minimal deviation (±0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89, deviating by ±0.02 over 1000 iterations) when considering a significantly smaller number of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a method that is invariant to the reference speaker pair and applicable in scenarios where only a few utterances are available.
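The evaluation described in the abstract (a per-speaker divergence score, its correlation with subjective ratings, and repeated subsampling of utterances) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the encoder nor the divergence measure, so the Euclidean distance between time-averaged latent vectors is a hypothetical stand-in, and every function name here is invented for the example. The latent frames are assumed to come from some already-trained disentangling model.

```python
import math
import random


def mean_vector(frames):
    """Average a list of equal-length latent frames over time."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def utterance_divergence(ref_frames, path_frames):
    """Euclidean distance between the time-averaged latents of a parallel
    utterance pair. ASSUMPTION: a stand-in for the unspecified divergence."""
    ref_mean = mean_vector(ref_frames)
    path_mean = mean_vector(path_frames)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref_mean, path_mean)))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def subsampled_correlation(divs_per_speaker, ratings, n_utts, n_iter=1000, seed=0):
    """Repeatedly draw n_utts divergences per speaker (without replacement),
    average them into a speaker score, and correlate the scores with the
    subjective ratings. Returns (mean R, standard deviation of R)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(n_iter):
        scores = [sum(rng.sample(d, n_utts)) / n_utts for d in divs_per_speaker]
        rs.append(pearson_r(scores, ratings))
    mean_r = sum(rs) / n_iter
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / n_iter)
    return mean_r, std_r
```

Averaging over time lets the two utterances of a pair differ in length. Under this sketch, a larger divergence from the healthy reference should go with a lower intelligibility rating, which is consistent with the negative sign of the correlations reported in the abstract.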

Comments:	Submitted to and accepted at INTERSPEECH 2022
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2204.04016 [eess.AS]
	(or arXiv:2204.04016v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2204.04016

Submission history

From: Tobias Weise
[v1] Fri, 8 Apr 2022 12:02:14 UTC (1,329 KB)
[v2] Mon, 27 Jun 2022 14:21:23 UTC (1,332 KB)
