Construction and Evaluation of Mandarin Multimodal Emotional Speech Database

Authors: Zhu Ting, Li Liangqi, Duan Shufei, Zhang Xueying, Xiao Zhongzhe, Jia Hairng, Liang Huizhi

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2401.07336 (eess)

[Submitted on 14 Jan 2024]

Abstract: A multimodal Mandarin emotional speech database containing articulatory kinematic, acoustic, glottal, and facial micro-expression data is designed and built. It is described in detail in terms of corpus design, subject selection, recording details, and data processing. The signals are labeled with discrete emotion labels (neutral, happy, pleasant, indifferent, angry, sad, grief) and dimensional emotion labels (pleasure, arousal, dominance; PAD). In this paper, the validity of the dimensional annotations is verified by statistical analysis of the annotation data. The annotators' SCL-90 (Symptom Checklist-90) scale data are verified and analyzed together with the PAD annotation data to explore the relationship between outliers in the annotations and the annotators' psychological state. To verify the speech quality and emotion discriminability of the database, three baseline models (SVM, CNN, and DNN) are used to compute recognition rates for the seven emotions. The results show that the average recognition rate over the seven emotions is about 82% using acoustic data alone, about 72% using glottal data alone, and 55.7% using kinematic data alone. These results indicate that the database is of high quality and can serve as an important resource for speech analysis research, especially for multimodal emotional speech analysis.
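The abstract reports "average recognition rates" without defining the metric on this page. Two common conventions in emotion recognition are the unweighted mean of per-emotion recognition rates (per-class recall) and overall accuracy; they differ when the emotion classes have unequal numbers of samples. The minimal Python sketch below assumes the unweighted definition and computes both from true and predicted labels so the difference is visible. The function names and the toy labels are hypothetical illustrations, not the authors' code, data, or results; only the seven emotion names come from the abstract.

```python
from collections import Counter

# The seven discrete emotion labels listed in the abstract.
EMOTIONS = ["neutral", "happy", "pleasant", "indifferent", "angry", "sad", "grief"]


def per_class_recognition_rates(y_true, y_pred, labels=EMOTIONS):
    """Return {label: fraction of that label's samples predicted correctly}.

    Labels with no samples in y_true are skipped instead of being counted as 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lab: correct[lab] / totals[lab] for lab in labels if totals[lab] > 0}


def average_recognition_rate(y_true, y_pred, labels=EMOTIONS):
    """Unweighted mean of per-class rates (an assumed definition, not the paper's)."""
    rates = per_class_recognition_rates(y_true, y_pred, labels)
    if not rates:
        raise ValueError("no samples with a known label")
    return sum(rates.values()) / len(rates)


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly, so larger classes weigh more."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only: three "happy" samples, one "sad" sample.
    truth = ["happy", "happy", "happy", "sad"]
    preds = ["happy", "happy", "happy", "angry"]
    print(average_recognition_rate(truth, preds))  # 0.5  (happy 1.0, sad 0.0)
    print(overall_accuracy(truth, preds))          # 0.75 (3 of 4 correct)
```

With imbalanced classes, the unweighted average penalizes a model that ignores a rare emotion, while overall accuracy can hide it; which one the reported 82%, 72%, and 55.7% figures correspond to should be checked against the full paper.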

Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2401.07336 [eess.AS]
	(or arXiv:2401.07336v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2401.07336

Submission history

From: Ting Zhu
[v1] Sun, 14 Jan 2024 17:56:36 UTC (2,086 KB)
