Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract

Csapó, Tamás Gábor

arXiv:2008.02098 (eess)

[Submitted on 4 Aug 2020]

Abstract: Acoustic-to-articulatory inversion (AAI) methods estimate articulatory movements from the acoustic speech signal, which can be useful in several tasks such as speech recognition, synthesis, talking heads and language tutoring. Most earlier inversion studies are based on point-tracking articulatory techniques (e.g. EMA or XRMB). The advantage of real-time MRI (rtMRI) is that it provides dynamic information about the full midsagittal plane of the upper airway, with a high 'relative' spatial resolution. In this work, we estimated midsagittal rtMRI images of the vocal tract for speaker dependent AAI, using MGC-LSP spectral features as input. We applied fully connected deep neural networks (FC-DNNs), convolutional neural networks (CNNs) and recurrent neural networks, and showed that LSTMs are the most suitable for this task. For objective evaluation, we measured normalized MSE, the Structural Similarity Index (SSIM) and its complex wavelet version (CW-SSIM). The results indicate that the combination of FC-DNNs and LSTMs can produce smooth MR images of the vocal tract that are similar to the original MRI recordings (average CW-SSIM: 0.94).
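To make the objective measures named in the abstract more concrete, the following is a minimal sketch of a normalized MSE and a simplified SSIM between a reference image and a generated image. It is an illustration under stated assumptions, not the paper's evaluation code: the abstract does not say how the MSE is normalized, so dividing by the mean squared value of the reference is an assumption here; the SSIM is computed over a single global window rather than averaged over local windows, as is usually done; and CW-SSIM is not covered. The function names `normalized_mse` and `global_ssim` and the 8-bit `data_range` default are hypothetical.

```python
from statistics import fmean


def _flatten(image):
    """Turn a 2-D list of pixel rows into a flat list of floats."""
    return [float(p) for row in image for p in row]


def normalized_mse(reference, estimate):
    """MSE divided by the mean squared reference value (assumed normalization)."""
    ref, est = _flatten(reference), _flatten(estimate)
    mse = fmean((r - e) ** 2 for r, e in zip(ref, est))
    power = fmean(r ** 2 for r in ref)
    return mse / power


def global_ssim(reference, estimate, data_range=255.0):
    """SSIM over one global window (simplification of the usual local-window mean)."""
    x, y = _flatten(reference), _flatten(estimate)
    # Stabilizing constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


if __name__ == "__main__":
    original = [[0, 50], [100, 200]]    # toy stand-in for an MR frame
    generated = [[10, 40], [120, 180]]  # toy stand-in for a predicted frame
    print("NMSE:", normalized_mse(original, generated))
    print("SSIM:", global_ssim(original, generated))
```

With identical inputs the normalized MSE is 0 and the SSIM is 1; the more the generated image departs from the reference, the larger the first and the smaller the second.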

Comments:	5 pages, accepted for publication at Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2008.00889
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2008.02098 [eess.AS]
	(or arXiv:2008.02098v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2008.02098

Submission history

From: Tamás Gábor Csapó
[v1] Tue, 4 Aug 2020 04:23:03 UTC (1,488 KB)
