Using Self-Supervised Feature Extractors with Attention for Automatic COVID-19 Detection from Speech

Mendonça, John; Solera-Ureña, Rubén; Abad, Alberto; Trancoso, Isabel

arXiv:2107.00112 (eess)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 30 Jun 2021]

Abstract: The ComParE 2021 COVID-19 Speech Sub-challenge provides a test-bed for the evaluation of automatic detectors of COVID-19 from speech. Such models can be of value by providing test triaging capabilities to health authorities, working alongside traditional testing methods. Herein, we leverage pre-trained, problem-agnostic speech representations and evaluate them on this task. We compare the results against a CNN architecture trained from scratch and against traditional frequency-domain representations. We also evaluate Self-Attention Pooling as an utterance-level information aggregation method. Experimental results demonstrate that models trained on features extracted from self-supervised models perform similarly to or better than fully-supervised models and models based on handcrafted features. Our best model improves the Unweighted Average Recall (UAR) from 69.0% to 72.3% on a development set comprising only full-band examples and achieves 64.4% on the test set. Furthermore, we study where the network is attending, attempting to draw some conclusions regarding its explainability. In this relatively small dataset, we find the network attends especially to vowels and aspirates.
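For readers unfamiliar with the two technical terms in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common formulation of self-attention pooling, in which a single learnable vector (here the hypothetical `query`) scores each frame and the softmax of those scores weights the frame-level features, and it computes UAR as the plain mean of per-class recalls. The function names, array shapes, and random inputs are illustrative assumptions only.

```python
import numpy as np


def self_attention_pool(frames, query):
    """Collapse (T, d) frame-level features into a single d-dimensional vector.

    `query` stands in for a learnable d-dimensional parameter (an assumption
    for this sketch; in a trained model it would be learned with the network).
    """
    scores = frames @ query                      # one scalar score per frame, shape (T,)
    scores = scores - scores.max()               # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the time axis
    pooled = weights @ frames                    # attention-weighted sum, shape (d,)
    return pooled, weights


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Illustrative inputs: 50 frames of 8-dimensional features (random, not real data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))
query = rng.normal(size=8)
pooled, weights = self_attention_pool(frames, query)

# A classifier that always predicts class 0 on an imbalanced set scores
# 75% accuracy here but only 0.5 UAR.
uar_majority = unweighted_average_recall([0, 0, 0, 1], [0, 0, 0, 0])
```

The per-frame `weights` are what one would inspect to see where such a model attends. UAR is the usual choice when classes are imbalanced, because a majority-class predictor cannot score above chance.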

Comments:	Submitted to Interspeech 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2107.00112 [eess.AS]
	(or arXiv:2107.00112v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2107.00112

Submission history

From: John Mendonca
[v1] Wed, 30 Jun 2021 21:35:28 UTC (1,592 KB)