Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

Cho, Cheol Jun; Mohamed, Abdelrahman; Black, Alan W; Anumanchipalli, Gopala K.

arXiv:2310.10788 (eess)

[Submitted on 16 Oct 2023 (v1), last revised 16 Jan 2024 (this version, v2)]

Abstract: Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained black boxes, but many recent studies have begun "probing" models like HuBERT to correlate their internal representations with different aspects of speech. In this paper, we show "inference of articulatory kinematics" as a fundamental property of SSL models, i.e., the ability of these models to transform acoustics into the causal articulatory dynamics underlying the speech signal. We also show that this abstraction largely overlaps across the languages of the data used to train the model, with a preference for languages with similar phonological systems. Furthermore, we show that with simple affine transformations, Acoustic-to-Articulatory Inversion (AAI) is transferable across speakers, even across genders, languages, and dialects, showing the generalizability of this property. Together, these results shed new light on the internals of SSL models that are critical to their superior performance, and open up new avenues toward language-agnostic universal models for speech engineering that are interpretable and grounded in speech science.
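
The affine-transfer claim in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the trajectories below are synthetic, the 12-channel layout is an assumption chosen for illustration, and `fit_affine`, `apply_affine`, and `mean_pearson` are hypothetical helper names. The sketch only shows what fitting an affine map between two articulatory spaces by ordinary least squares and scoring it with a per-channel Pearson correlation could look like.

```python
import numpy as np


def fit_affine(X, Y):
    """Least-squares fit of Y ≈ X @ W + b. Returns (W, b)."""
    # Append a column of ones so the bias is estimated jointly with W.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return coef[:-1], coef[-1]


def apply_affine(X, W, b):
    """Map source-space frames into the target space."""
    return X @ W + b


def mean_pearson(A, B):
    """Average per-channel Pearson correlation of two (frames, channels) arrays."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = (A * B).sum(axis=0)
    den = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return float((num / den).mean())


# Synthetic stand-ins for two speakers' articulatory trajectories.
# Assumption: 12 channels per frame; the target is an affine function
# of the source plus a small amount of noise.
rng = np.random.default_rng(0)
n_frames, n_channels = 500, 12
src = rng.standard_normal((n_frames, n_channels))
true_W = rng.standard_normal((n_channels, n_channels))
true_b = rng.standard_normal(n_channels)
tgt = src @ true_W + true_b + 0.01 * rng.standard_normal((n_frames, n_channels))

# Fit on the first 400 frames, evaluate on the held-out 100.
W, b = fit_affine(src[:400], tgt[:400])
score = mean_pearson(apply_affine(src[400:], W, b), tgt[400:])
print(f"held-out mean Pearson r: {score:.3f}")
```

Because the synthetic target is affine by construction, the held-out correlation here is close to 1. On real articulatory recordings the fit would only be as good as the affine assumption itself.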

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
Cite as:	arXiv:2310.10788 [eess.AS]
	(or arXiv:2310.10788v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2310.10788

Submission history

From: Cheol Jun Cho
[v1] Mon, 16 Oct 2023 19:50:01 UTC (1,542 KB)
[v2] Tue, 16 Jan 2024 08:09:15 UTC (1,542 KB)
