A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition

Bai, Jisheng; Chen, Jianfeng; Wang, Mou; Ayub, Muhammad Saad

doi:10.1109/TCDS.2022.3222350

arXiv:2203.08350 (eess)

[Submitted on 16 Mar 2022 (v1), last revised 21 Nov 2023 (this version, v2)]

Abstract: Environmental sound recognition (ESR) is an emerging research topic in audio pattern recognition. Many ESR tasks have been proposed that rely on computational models for real-life applications. However, current models are usually designed for individual tasks and are neither robust nor applicable to other tasks. Cross-task models, which promote unified knowledge modeling across various tasks, have not been thoroughly investigated. In this article, we propose a cross-task model for three different ESR tasks: 1) acoustic scene classification; 2) urban sound tagging; and 3) anomalous sound detection. We present an architecture named SE-Trans that uses attention-based Squeeze-and-Excitation and Transformer encoder modules to learn the channel-wise relationships and temporal dependencies of acoustic features. FMix is employed as the data augmentation method to improve ESR performance. The three tasks are evaluated on recent datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. The experimental results show that the proposed cross-task model achieves state-of-the-art performance on all tasks. Further analysis demonstrates that the proposed cross-task model can effectively utilize acoustic knowledge across different ESR tasks.
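The channel-wise attention that the abstract attributes to the Squeeze-and-Excitation module can be illustrated with a minimal NumPy sketch. This is a generic Squeeze-and-Excitation gate, not the SE-Trans implementation: the function name `squeeze_excite`, the tensor layout (channels, time, frequency), the reduction ratio `r`, and the random weights are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Generic SE gate (illustrative). x has shape (channels, time, freq)."""
    # Squeeze: global average over time and frequency, one value per channel.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    h = np.maximum(0.0, w1 @ z + b1)             # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # shape (C,)
    # Rescale every channel of the feature map by its gate value.
    return x * s[:, None, None], s

# Hypothetical sizes and randomly initialised weights, for demonstration only.
rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 16, 4
x = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, gates = squeeze_excite(x, w1, b1, w2, b2)
```

Here `y` has the same shape as `x`, and `gates` holds one weight per channel. In a trained network the weights would be learned, so that informative channels receive gates closer to 1.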

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2203.08350 [eess.AS]
	(or arXiv:2203.08350v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2203.08350
Related DOI:	https://doi.org/10.1109/TCDS.2022.3222350

Submission history

From: Jisheng Bai
[v1] Wed, 16 Mar 2022 02:07:02 UTC (1,719 KB)
[v2] Tue, 21 Nov 2023 06:40:24 UTC (1,844 KB)
