DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Lee, Dongheon; Choi, Jung-Woo

doi:10.1109/LSP.2023.3244428

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2212.07570 (eess)

[Submitted on 15 Dec 2022 (v1), last revised 6 Mar 2023 (this version, v5)]



Abstract: In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to these three characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. Experiments on two popular noisy and reverberant datasets show that DeFT-AN outperforms state-of-the-art multichannel models on various metrics of speech quality and intelligibility.
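To make the idea of a complex spectral mask concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the mask is applied by element-wise complex multiplication with the STFT of a reference microphone, which is a common convention but is not stated in the abstract. Random arrays stand in for real STFT coefficients, and an oracle mask computed from the known clean signal stands in for the network output. All names and sizes (n_mics, n_freq, n_frames, apply_complex_mask) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 microphones, 257 frequency bins, 100 time frames.
n_mics, n_freq, n_frames = 4, 257, 100


def random_complex(shape):
    """Random complex array used as a stand-in for STFT coefficients."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Stand-in for the clean speech STFT at the reference microphone.
clean = random_complex((n_freq, n_frames))
# Stand-in for noise and reverberation at every microphone.
noise = 0.5 * random_complex((n_mics, n_freq, n_frames))
# Simplistic multichannel mixture: the same clean signal at every microphone.
noisy = clean[None, :, :] + noise


def apply_complex_mask(mask, noisy_ref):
    """Assumed masking rule: element-wise complex multiplication.

    A complex mask can change both the magnitude and the phase of each
    time-frequency bin, unlike a real-valued magnitude mask.
    """
    return mask * noisy_ref


ref = noisy[0]  # reference microphone (assumption: channel 0)

# Oracle mask standing in for the network prediction: M = S * conj(Y) / |Y|^2.
eps = 1e-12  # guards against division by zero
oracle_mask = clean * np.conj(ref) / (np.abs(ref) ** 2 + eps)

enhanced = apply_complex_mask(oracle_mask, ref)

# Mean squared error against the clean signal, before and after masking.
mse_before = np.mean(np.abs(ref - clean) ** 2)
mse_after = np.mean(np.abs(enhanced - clean) ** 2)
print(f"MSE before masking: {mse_before:.3e}")
print(f"MSE after masking:  {mse_after:.3e}")
```

In a trained system the mask would come from the network, taking the multichannel STFT as input, rather than from the clean signal, so the error after masking would not vanish as it does with this oracle.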

Comments:	5 pages, 2 figures, 3 tables. This article has been published in IEEE Signal Processing Letters. This version is the authors' version and may differ in details from the final publication.
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2212.07570 [eess.AS]
	(or arXiv:2212.07570v5 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2212.07570
Journal reference:	IEEE Signal Processing Letters, Vol. 30, pp. 155-159, 2023
Related DOI:	https://doi.org/10.1109/LSP.2023.3244428

Submission history

From: Dongheon Lee
[v1] Thu, 15 Dec 2022 01:03:18 UTC (1,413 KB)
[v2] Wed, 18 Jan 2023 12:10:23 UTC (751 KB)
[v3] Thu, 19 Jan 2023 01:51:08 UTC (751 KB)
[v4] Fri, 20 Jan 2023 01:25:20 UTC (751 KB)
[v5] Mon, 6 Mar 2023 06:20:32 UTC (753 KB)
