Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model

Seneviratne, Nadee; Espy-Wilson, Carol

arXiv:2104.04195 (eess)

[Submitted on 9 Apr 2021]

Abstract: Speech-based depression classification has gained immense popularity in recent years. However, most classification studies have focused on binary classification, distinguishing depressed subjects from non-depressed subjects. In this paper, we formulate the depression classification task as a severity level classification problem to provide more granularity in the classification outcomes. We use articulatory coordination features (ACFs) developed to capture the changes in neuromotor coordination that happen as a result of psychomotor slowing, a necessary feature of Major Depressive Disorder. The ACFs derived from the vocal tract variables (TVs) are used to train a dilated Convolutional Neural Network based depression classification model to obtain segment-level predictions. Then, we propose a Recurrent Neural Network based approach to obtain session-level predictions from segment-level predictions. We show that the strengths of the segment-wise classifier are amplified when a session-wise classifier is trained on embeddings obtained from it. The model trained on ACFs derived from TVs shows a relative improvement of 27.47% in Unweighted Average Recall (UAR) on the session-level classification task, compared to the model trained on ACFs derived from Mel Frequency Cepstral Coefficients (MFCCs).
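
The abstract reports its headline result as a relative improvement in Unweighted Average Recall (UAR). As a reading aid, here is a minimal sketch of how those two quantities are commonly computed: UAR as the plain mean of per-class recalls, and relative improvement as the percentage change over a baseline. This is not the authors' code. The function names, the three severity labels (0, 1, 2), and all numeric values are hypothetical and chosen only for illustration; the abstract does not state the number of severity levels or the underlying UAR scores.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally, whatever its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_improvement(baseline, improved):
    """Percentage change of `improved` relative to `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical session-level labels for three severity classes (assumption).
# Class 0 dominates, so plain accuracy would look better than UAR.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")

# Hypothetical UAR scores for two feature sets (not the paper's numbers).
print(f"relative improvement = {relative_improvement(0.40, 0.50):.2f}%")
```

In this toy data the per-class recalls are 1.0, 0.5, and 0.5, so UAR is lower than accuracy. That gap is why UAR is often preferred when severity classes are imbalanced. Note also that a relative improvement of 27.47% describes a ratio between two UAR scores, not a gain of 27.47 percentage points.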

Comments:	5 pages, submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.06739
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
Cite as:	arXiv:2104.04195 [eess.AS]
	(or arXiv:2104.04195v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2104.04195

Submission history

From: Nadee Seneviratne
[v1] Fri, 9 Apr 2021 05:10:08 UTC (590 KB)
