RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification

Kim, June-Woo; Toikkanen, Miika; Bae, Sangmin; Kim, Minseok; Jung, Ho-Young

arXiv:2405.02996 (cs)

[Submitted on 5 May 2024]

Abstract: Recent advancements in AI have democratized its deployment as a healthcare assistant. While models pretrained on large-scale visual and audio datasets have demonstrably generalized to respiratory sound classification, surprisingly, no studies have explored pretrained speech models, which, being trained on human-originated sounds, would intuitively bear a closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and that data augmentation is essential to bridge this gap. However, SpecAugment, the most widely used augmentation technique for audio and speech, requires a 2-dimensional spectrogram as input and cannot be applied to models pretrained on speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also suitable for respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes, reaching up to 7.14%.
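
The abstract does not describe how RepAugment operates internally, so the following is only a rough sketch of what augmentation at the representation level can look like in general, not the authors' method. It assumes the encoder output is a flat feature vector and zeroes a random subset of its entries; the helper name `mask_representation`, the `mask_rate` parameter, and the example values are all hypothetical.

```python
import random


def mask_representation(rep, mask_rate=0.2, seed=None):
    """Zero out a random subset of entries in a 1-D feature vector.

    Hypothetical helper for illustration only; this is NOT the paper's
    RepAugment, whose details are not given in the abstract.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be in [0, 1]")
    rng = random.Random(seed)
    # Number of positions to mask, rounded down.
    n_mask = int(len(rep) * mask_rate)
    masked_idx = set(rng.sample(range(len(rep)), n_mask))
    return [0.0 if i in masked_idx else x for i, x in enumerate(rep)]


# The vector stands in for an encoder output (assumed values). Because the
# mask is applied after encoding, it does not matter whether the encoder
# consumed a raw waveform or a spectrogram.
embedding = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1, 0.9, -0.4, 0.6, 1.5]
augmented = mask_representation(embedding, mask_rate=0.3, seed=0)
```

Operating on the encoder output rather than on the input is what makes such a scheme input-agnostic: nothing in the helper depends on the input having a time-frequency layout, which is the constraint the abstract attributes to SpecAugment.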

Comments:	Accepted EMBC 2024
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2405.02996 [cs.SD]
	(or arXiv:2405.02996v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2405.02996

Submission history

From: June-Woo Kim
[v1] Sun, 5 May 2024 16:45:46 UTC (854 KB)
