Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Guo, Aoqi; Qian, Sichong; Li, Baoxiang; Gao, Dazhi

Computer Science > Sound

arXiv:2308.15990 (cs)

[Submitted on 30 Aug 2023 (v1), last revised 7 Sep 2023 (this version, v2)]

Title: Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Authors: Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

Abstract: Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, their performance is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer built on a dual-path transformer. First, in the time domain, we employ a cross-attention mechanism to extract crucial beamforming-related spatial information from the noisy covariance matrix. Then, in the frequency domain, we employ a self-attention mechanism to enhance the model's ability to process frequency-specific details. By design, our model avoids relying on a pre-separation module and therefore operates in a more fully end-to-end manner. Experimental results show that our model not only outperforms contemporary leading neural beamforming algorithms in separation performance but also does so with significantly fewer parameters.
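A minimal NumPy sketch of the two ingredients the abstract names, under assumptions of our own: the noisy spatial covariance matrix is estimated per time-frequency bin from a multichannel STFT, and a "dual-path" block applies attention along the time axis and then along the frequency axis. For simplicity both paths use plain single-head self-attention with random, untrained weights; the paper's time-path cross-attention (whose queries and keys the abstract does not specify), its layer sizes, and its beamforming output stage are not reproduced. All function names here are hypothetical.

```python
import numpy as np


def noisy_covariance(Y):
    """Per-bin spatial covariance of a multichannel STFT.

    Y: complex array of shape (C, T, F) (channels, frames, frequency bins).
    Returns Phi of shape (T, F, C, C) with Phi[t, f] = y(t, f) y(t, f)^H,
    a rank-1 instantaneous estimate (no temporal averaging; an assumption).
    """
    y = np.transpose(Y, (1, 2, 0))  # (T, F, C)
    return np.einsum("tfc,tfd->tfcd", y, y.conj())


def covariance_features(Phi):
    """Flatten each C x C matrix into real-valued features: (T, F, 2*C*C)."""
    T, F, C, _ = Phi.shape
    flat = Phi.reshape(T, F, C * C)
    return np.concatenate([flat.real, flat.imag], axis=-1)


def attention(q, k, v):
    """Single-head scaled dot-product attention over the second-to-last axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax
    return weights @ v


def self_attend(x, params):
    """Residual self-attention along axis -2 of x, using hypothetical (Wq, Wk, Wv)."""
    Wq, Wk, Wv = params
    return x + attention(x @ Wq, x @ Wk, x @ Wv)


def dual_path_block(x, time_params, freq_params):
    """x: real features of shape (T, F, D).

    Time path: each frequency bin is treated as a sequence over frames.
    Frequency path: each frame is treated as a sequence over frequency bins.
    """
    xt = np.swapaxes(x, 0, 1)           # (F, T, D): sequence axis is time
    xt = self_attend(xt, time_params)
    x = np.swapaxes(xt, 0, 1)           # back to (T, F, D)
    return self_attend(x, freq_params)  # sequence axis is frequency


# Usage with random data and random (untrained) weights.
rng = np.random.default_rng(0)
C, T, F = 4, 10, 8
Y = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
Phi = noisy_covariance(Y)
feats = covariance_features(Phi)
D = feats.shape[-1]  # 2 * C * C = 32
time_params = tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
freq_params = tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out = dual_path_block(feats, time_params, freq_params)
print(out.shape)  # (10, 8, 32)
```

Swapping axes rather than looping keeps each path a single batched matrix product: the time path batches over frequency bins and the frequency path batches over frames, which is the usual way dual-path models alternate between the two axes.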

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2308.15990 [cs.SD]
	(or arXiv:2308.15990v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2308.15990

Submission history

From: Aoqi Guo
[v1] Wed, 30 Aug 2023 12:22:58 UTC (8,070 KB)
[v2] Thu, 7 Sep 2023 06:50:18 UTC (8,070 KB)
