Fused Audio Instance and Representation for Respiratory Disease Detection

Truong, Tuan; Lenga, Matthias; Serrurier, Antoine; Mohammadi, Sadegh

arXiv:2204.10581 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 22 Apr 2022 (v1), last revised 23 Nov 2023 (this version, v4)]

Abstract: Audio-based classification techniques on body sounds have long been studied to aid in the diagnosis of respiratory diseases. While most research is centered on the use of cough as the main biomarker, other body sounds also have the potential to detect respiratory diseases. Recent studies on COVID-19 have shown that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection. FAIR relies on constructing a joint feature vector from various body sounds represented in waveform and spectrogram form. We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds. Our findings show that the use of self-attention to combine extracted features from cough, breath, and speech sounds leads to the best performance with an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Compared to models trained solely on spectrograms or waveforms, the use of both representations results in an improved AUC score, demonstrating that combining spectrogram and waveform representation helps to enrich the extracted features and outperforms the models that use only one representation.
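
The abstract does not give architectural details, so the following is only a rough illustration of the general idea of combining per-sound, per-representation feature vectors with self-attention. It is not the authors' FAIR implementation. Everything here is an assumption made for illustration: the function names `softmax` and `self_attention_fuse`, the 8-dimensional random stand-in features, the use of raw features as queries, keys, and values (a trained model would normally learn projections), and the choice to concatenate the attended vectors into the joint feature vector.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(features):
    """Fuse equal-length feature vectors with one unparameterised attention pass.

    Assumption: queries, keys and values are the raw features themselves;
    a trained model would normally apply learned projections first.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Scaled dot-product score of this vector against every vector.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        # The weighted sum of all vectors is the context-aware version of `query`.
        attended.append(
            [sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)]
        )
    # Assumption: concatenate the attended vectors into one joint feature vector.
    return [x for vec in attended for x in vec]


# Hypothetical stand-ins for encoder outputs: 3 sounds x 2 representations.
rng = random.Random(0)
sounds = ["cough", "breath", "speech"]
representations = ["waveform", "spectrogram"]
features = [
    [rng.gauss(0.0, 1.0) for _ in range(8)]
    for _ in sounds
    for _ in representations
]

joint = self_attention_fuse(features)
print(len(joint))  # 6 vectors * 8 dims = 48
```

In a real pipeline the joint vector would feed a classifier head, and the random vectors would be replaced by features extracted from the actual cough, breath, and speech recordings.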

Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.10581 [cs.SD]
	(or arXiv:2204.10581v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2204.10581

Submission history

From: Matthias Lenga
[v1] Fri, 22 Apr 2022 09:01:29 UTC (379 KB)
[v2] Mon, 10 Apr 2023 08:36:17 UTC (379 KB)
[v3] Tue, 7 Nov 2023 09:14:21 UTC (597 KB)
[v4] Thu, 23 Nov 2023 09:15:48 UTC (597 KB)
