Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

Abdullah, Badr M.; Möbius, Bernd; Klakow, Dietrich

arXiv:2209.06633 (cs)

[Submitted on 14 Sep 2022 (v1), last revised 18 Sep 2022 (this version, v2)]

Abstract: Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the embedding space discriminability and encourages the model to better separate lexical categories.
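
As a rough illustration of the two ideas in the abstract (a fixed-size embedding for variable-length segments, and a training objective that combines form-based and lexical supervision), the following is a minimal sketch and not the authors' implementation. The abstract does not specify the encoder, the loss functions, or how the two tasks are weighted, so everything here is an assumption: the mean-pooling linear encoder, the cosine-distance form loss, the squared-error lexical loss, the weight `alpha`, and all function and variable names are hypothetical.

```python
import math
import random


def embed(frames, weights):
    """Map a variable-length list of feature frames to a fixed-size vector.

    Assumption: mean-pool over time, then apply a linear projection. This
    stands in for the neural encoder, which the abstract does not describe.
    """
    feat_dim = len(frames[0])
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(feat_dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def multitask_loss(anchor, same_word, lexical_pred, lexical_target, alpha=0.5):
    """Weighted sum of a form-based loss and a lexical (meaning-based) loss.

    Assumptions: the form loss pulls two exemplars of the same word together;
    the lexical loss is the mean squared error against a target lexical
    vector; alpha is a hypothetical task weight in [0, 1].
    """
    form_loss = cosine_distance(anchor, same_word)
    lexical_loss = sum(
        (p - t) ** 2 for p, t in zip(lexical_pred, lexical_target)
    ) / len(lexical_target)
    return (1.0 - alpha) * form_loss + alpha * lexical_loss


random.seed(0)
EMB_DIM, FEAT_DIM = 4, 3  # hypothetical sizes, chosen only for the demo
W = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(EMB_DIM)]

# Two synthetic "spoken word segments" with different numbers of frames.
short_seg = [[random.random() for _ in range(FEAT_DIM)] for _ in range(5)]
long_seg = [[random.random() for _ in range(FEAT_DIM)] for _ in range(12)]

e_short = embed(short_seg, W)
e_long = embed(long_seg, W)

# Both embeddings have EMB_DIM components despite the different input lengths.
print(len(e_short), len(e_long))
```

In this sketch, setting `alpha=0.0` reduces the objective to the form-based term alone, which loosely corresponds to the bottom-up training the abstract attributes to current AWE models.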

Comments:	Accepted in INTERSPEECH 2022
Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2209.06633 [cs.CL]
	(or arXiv:2209.06633v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.06633

Submission history

From: Badr M. Abdullah
[v1] Wed, 14 Sep 2022 13:33:04 UTC (3,484 KB)
[v2] Sun, 18 Sep 2022 15:21:56 UTC (3,484 KB)
