Speaker Normalization for Self-supervised Speech Emotion Recognition

Gat, Itai; Aronowitz, Hagai; Zhu, Weizhong; Morais, Edmilson; Hoory, Ron

arXiv:2202.01252 (cs)

[Submitted on 2 Feb 2022 (v1), last revised 6 Nov 2022 (this version, v2)]

Abstract: Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.

Comments:	ICASSP 22
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.01252 [cs.LG]
	(or arXiv:2202.01252v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.01252

Submission history

From: Itai Gat
[v1] Wed, 2 Feb 2022 19:30:47 UTC (90 KB)
[v2] Sun, 6 Nov 2022 08:21:02 UTC (90 KB)
