TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

Aroudi, Ali; Uhlich, Stefan; Font, Marc Ferras

arXiv:2110.04047 (eess)

[Submitted on 8 Oct 2021 (v1), last revised 22 Aug 2022 (this version, v2)]

Abstract: In recent years, many deep learning techniques for single-channel sound source separation have been proposed using recurrent, convolutional and transformer networks. When multiple microphones are available, spatial diversity between speakers and background noise in addition to spectro-temporal diversity can be exploited by using multi-channel filters for sound source separation. Aiming at end-to-end multi-channel source separation, in this paper we propose a transformer-recurrent-U network (TRUNet), which directly estimates multi-channel filters from multi-channel input spectra. TRUNet consists of a spatial processing network with an attention mechanism across microphone channels aiming at capturing the spatial diversity, and a spectro-temporal processing network aiming at capturing spectral and temporal diversities. In addition to multi-channel filters, we also consider estimating single-channel filters from multi-channel input spectra using TRUNet. We train the network on a large reverberant dataset using a combined compressed mean-squared error loss function, which further improves the sound separation performance. We evaluate the network on a realistic and challenging reverberant dataset, generated from measured room impulse responses of an actual microphone array. The experimental results on realistic reverberant sound source separation show that the proposed TRUNet outperforms state-of-the-art single-channel and multi-channel source separation methods.
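
To make the idea of multi-channel filtering concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a network has already produced one complex filter coefficient per microphone, time frame and frequency bin, and that a source estimate is obtained by a plain filter-and-sum over microphones. The function names `apply_multichannel_filter` and `compressed_mse`, the nested-list layout `[mic][frame][bin]`, and the compression exponent 0.3 are all assumptions made for illustration. The loss shown is a simple magnitude-compressed MSE and is not claimed to be the combined loss used in the paper.

```python
def apply_multichannel_filter(filters, spectra):
    """Filter-and-sum in the STFT domain (illustrative assumption).

    filters, spectra: nested lists indexed as [mic][frame][bin] of complex
    numbers. Returns the source estimate indexed as [frame][bin], where
    Y[t][f] = sum over mics m of filters[m][t][f] * spectra[m][t][f].
    """
    num_frames = len(spectra[0])
    num_bins = len(spectra[0][0])
    out = [[0j] * num_bins for _ in range(num_frames)]
    for mic_filter, mic_spectrum in zip(filters, spectra):
        for t in range(num_frames):
            for f in range(num_bins):
                out[t][f] += mic_filter[t][f] * mic_spectrum[t][f]
    return out


def compressed_mse(estimate, target, power=0.3):
    """Mean-squared error between power-law-compressed magnitudes.

    The exponent `power` is a hypothetical choice; the abstract does not
    state the compression used.
    """
    total = 0.0
    count = 0
    for est_frame, tgt_frame in zip(estimate, target):
        for est_bin, tgt_bin in zip(est_frame, tgt_frame):
            total += (abs(est_bin) ** power - abs(tgt_bin) ** power) ** 2
            count += 1
    return total / count


# Toy example: 2 microphones, 1 time frame, 2 frequency bins.
spectra = [[[1 + 1j, 2 + 0j]], [[1 - 1j, 0 + 2j]]]
filters = [[[0.5, 0.5]], [[0.5, 0.5]]]  # equal weights, i.e. averaging
estimate = apply_multichannel_filter(filters, spectra)
loss = compressed_mse(estimate, estimate)
```

In this toy case the filters simply average the two microphones. In the setting the abstract describes, the filter coefficients would instead be estimated by the network from the multi-channel input spectra; the single-channel variant mentioned in the abstract would correspond to filtering only one reference microphone.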

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2110.04047 [eess.AS]
	(or arXiv:2110.04047v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2110.04047

Submission history

From: Ali Aroudi
[v1] Fri, 8 Oct 2021 11:45:56 UTC (713 KB)
[v2] Mon, 22 Aug 2022 06:53:03 UTC (723 KB)
