MFCC-based Recurrent Neural Network for Automatic Clinical Depression Recognition and Assessment from Speech

Rejaibi, Emna; Komaty, Ali; Meriaudeau, Fabrice; Agrebi, Said; Othmani, Alice

arXiv:1909.07208 (cs)

[Submitted on 16 Sep 2019 (v1), last revised 12 Mar 2020 (this version, v2)]

Abstract: Clinical depression, or Major Depressive Disorder (MDD), is a common and serious medical illness. In this paper, a deep recurrent neural network-based framework is presented to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire and the binary class of depression diagnosis. To overcome the small size of Speech Depression Recognition (SDR) datasets, expanded training labels and transferred features are considered. The proposed approach outperforms state-of-the-art approaches on the DAIC-WOZ database, with an overall accuracy of 76.27% and a root mean square error of 0.4 in assessing depression, while a root mean square error of 0.168 is achieved in predicting the depression severity levels. The proposed framework has several advantages (speed, non-invasiveness, and non-intrusiveness), which make it convenient for real-time applications. The performance of the proposed approach is evaluated in multi-modal and multi-feature experiments. MFCC-based high-level features hold relevant information related to depression. Yet adding visual action units and other acoustic features further boosts the classification results by 20% and 10%, reaching accuracies of 95.6% and 86%, respectively. Use of the visual-facial modality needs careful study because it raises patient privacy concerns, while adding more acoustic features increases the computation time.
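The abstract reports results as an overall accuracy for the binary diagnosis and as a root mean square error (RMSE) for the score predictions. As a minimal sketch of how these two metrics are conventionally computed, the snippet below uses plain Python. The function names and the toy values are hypothetical and are not taken from the paper or from DAIC-WOZ; the scale on which the paper's RMSE values are reported is not stated in the abstract and is not assumed here.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of binary labels that were predicted correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between reference and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy values for illustration only (not data from the paper).
labels_true = [0, 1, 1, 0]
labels_pred = [0, 1, 0, 0]
scores_true = [3.0, 12.0, 20.0, 7.0]
scores_pred = [4.0, 10.0, 20.0, 7.0]

print("accuracy:", accuracy(labels_true, labels_pred))
print("rmse:", rmse(scores_true, scores_pred))
```

Accuracy summarizes the binary depressed/non-depressed decision, whereas RMSE penalizes large deviations in a predicted score more heavily than small ones, which is why the two are reported separately.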

Comments:	14 pages, 7 figures, 9 tables
Subjects:	Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1909.07208 [cs.HC]
	(or arXiv:1909.07208v2 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.1909.07208

Submission history

From: Emna Rejaibi
[v1] Mon, 16 Sep 2019 14:03:01 UTC (170 KB)
[v2] Thu, 12 Mar 2020 13:09:24 UTC (247 KB)
