U-shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

Li, Yi; Sun, Yang; Naqvi, Syed Mohsen

doi:10.1109/TASLP.2023.3265839

arXiv:2112.06052 (cs)

[Submitted on 11 Dec 2021]

Abstract: State-of-the-art speech enhancement methods remain limited in speech estimation accuracy. Recently, the Transformer has shown the potential to exploit long-range dependencies in speech through self-attention, and it has therefore been introduced in speech enhancement to improve the accuracy of speech estimated from a noisy mixture. To reduce the computational cost of self-attention in the Transformer, one option is axial attention, which splits a 2D attention into two 1D attentions. Inspired by axial attention, the proposed method calculates the attention map along both the time and frequency axes to generate time and frequency sub-attention maps. Unlike axial attention, however, the proposed method provides two parallel multi-head attentions for the time and frequency axes. Furthermore, it has been shown in the literature that, in a noisy mixture, the lower frequency band of speech generally contains more of the desired information than the higher frequency band. Therefore, frequency-band aware attention is proposed, consisting of high frequency-band attention (HFA) and low frequency-band attention (LFA). The U-shaped Transformer is also introduced for the first time in the proposed method to further improve speech estimation accuracy. Extensive evaluations on four public datasets confirm the efficacy of the proposed method.
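
To make the attention layout in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a feature map of shape (time, frequency, channels), uses single-head attention with no learned projections, fuses the two parallel branches by summation, and splits the frequency axis at a hypothetical index `split`. None of these details are given in the abstract. The helper `score_entries` only counts attention-score entries to show why two 1D attentions are cheaper than one 2D attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1d(x):
    """Scaled dot-product self-attention over the length axis.

    x has shape (batch, length, channels). Queries, keys and values are all
    x itself (assumption: no learned projections, single head).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (batch, length, length)
    return softmax(scores, axis=-1) @ x               # (batch, length, channels)

def time_frequency_attention(spec):
    """Run time-axis and frequency-axis attention in parallel on (T, F, C)."""
    # Time branch: treat each frequency bin as a batch item, attend over T.
    t_out = np.swapaxes(attention_1d(np.swapaxes(spec, 0, 1)), 0, 1)
    # Frequency branch: treat each time frame as a batch item, attend over F.
    f_out = attention_1d(spec)
    # Assumption: the two sub-attention outputs are fused by summation.
    return t_out + f_out

def band_aware_attention(spec, split):
    """Apply separate attention to low and high bands (hypothetical split)."""
    low = time_frequency_attention(spec[:, :split])   # low frequency band
    high = time_frequency_attention(spec[:, split:])  # high frequency band
    return np.concatenate([low, high], axis=1)

def score_entries(T, F):
    """Count attention-score entries: full 2D attention vs. two 1D attentions."""
    full_2d = (T * F) ** 2
    two_1d = F * T * T + T * F * F
    return full_2d, two_1d
```

For a map with T time frames and F frequency bins, a full 2D attention scores every time-frequency position against every other, giving (T·F)² entries, while the two 1D attentions together need only F·T² + T·F² entries. Because the time and frequency branches here read the same input independently, they can run in parallel, in contrast to axial attention where one axis is processed after the other.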

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2112.06052 [cs.SD]
	(or arXiv:2112.06052v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2112.06052
Journal reference:	IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume 31), 2023
Related DOI:	https://doi.org/10.1109/TASLP.2023.3265839

Submission history

From: Yi Li
[v1] Sat, 11 Dec 2021 19:18:34 UTC (18,888 KB)
