emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Ma, Ziyang; Zheng, Zhisheng; Ye, Jiaxin; Li, Jinchao; Gao, Zhifu; Zhang, Shiliang; Chen, Xie

Computer Science > Computation and Language

arXiv:2312.15185 (cs)

[Submitted on 23 Dec 2023]

Abstract: We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining an utterance-level loss and a frame-level loss during pre-training. With only linear layers trained for the speech emotion recognition task on the mainstream IEMOCAP dataset, emotion2vec outperforms state-of-the-art pre-trained universal models and emotion specialist models. In addition, emotion2vec shows consistent improvements on speech emotion recognition datasets in 10 different languages. emotion2vec also shows excellent results on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and sentiment analysis. Comparison experiments, ablation experiments, and visualization comprehensively demonstrate the universal capability of the proposed emotion2vec. To the best of our knowledge, emotion2vec is the first universal representation model for a variety of emotion-related tasks, filling a gap in the field.
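
The abstract reports results obtained by training only linear layers on top of the pre-trained model. The following is a minimal sketch of that kind of linear probe, not the authors' implementation. It assumes utterance-level embeddings have already been extracted from a frozen encoder. Here, synthetic Gaussian clusters stand in for real features, and the names `train_linear_probe` and `predict`, the dimensions, and the hyperparameters are all hypothetical choices made for illustration.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, lr=0.005, epochs=500):
    """Fit one softmax layer on frozen features with full-batch gradient descent."""
    n, dim = features.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(features, weights, bias):
    """Return the highest-scoring class index for each row of features."""
    return np.argmax(features @ weights + bias, axis=1)


# Assumption: synthetic clusters stand in for frozen utterance embeddings.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 4, 16, 50
centers = rng.normal(scale=3.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(size=(num_classes * per_class, dim))

# Only the linear layer is trained; the (simulated) encoder output stays fixed.
weights, bias = train_linear_probe(features, labels, num_classes)
accuracy = float(np.mean(predict(features, weights, bias) == labels))
print(f"training accuracy on synthetic features: {accuracy:.2f}")
```

Because the encoder is never updated in this setup, the probe's accuracy reflects how linearly separable the emotion classes already are in the representation, which is the usual reason for evaluating pre-trained models this way.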

Comments:	Code, checkpoints, and extracted features are available at this https URL
Subjects:	Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2312.15185 [cs.CL]
	(or arXiv:2312.15185v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.15185

Submission history

From: Ziyang Ma
[v1] Sat, 23 Dec 2023 07:46:55 UTC (732 KB)
