Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Gu, Rongzhi; Zhang, Shi-Xiong; Chen, Lianwu; Xu, Yong; Yu, Meng; Su, Dan; Zou, Yuexian; Yu, Dong

arXiv:2003.03927 (eess)

[Submitted on 9 Mar 2020 (v1), last revised 13 Mar 2020 (this version, v2)]

Abstract: Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep-learning-based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into an end-to-end optimized MCSS framework. In this work, we propose an integrated architecture that learns spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning the signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2D convolution (conv2d) layer, and their parameters are optimized using a speech separation objective function in a purely data-driven fashion. Furthermore, inspired by the IPD formulation, we design a conv2d kernel to compute inter-channel convolution differences (ICDs), which are expected to provide spatial cues that help distinguish directional sources. Evaluation results on a simulated multi-channel reverberant WSJ0 2-mix dataset demonstrate that our proposed ICD-based MCSS model improves the overall signal-to-distortion ratio by 10.4% over the IPD-based MCSS model.
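The idea of a conv2d kernel that spans a pair of channels can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a kernel of height 2 whose second row is the negation of the first (written here as [w, -w]), so that the output equals the difference between the two channels after each is filtered by the same time-domain filter w. The function names, the toy signals, the filter values, and the stride are all hypothetical; in the paper the filter parameters are learned from data.

```python
# Illustrative sketch only: names, shapes, and the tied [w, -w] kernel are assumptions.

def conv1d_valid(signal, kernel, stride=1):
    """Strided 1-D correlation of one channel with one time-domain filter."""
    span = len(kernel)
    return [
        sum(kernel[i] * signal[start + i] for i in range(span))
        for start in range(0, len(signal) - span + 1, stride)
    ]


def conv2d_channel_pair(pair, kernel_2d, stride=1):
    """Apply a (2 x L) kernel across two stacked channels (height 2, no padding)."""
    span = len(kernel_2d[0])
    n_samples = len(pair[0])
    return [
        sum(
            kernel_2d[row][i] * pair[row][start + i]
            for row in range(2)
            for i in range(span)
        )
        for start in range(0, n_samples - span + 1, stride)
    ]


def icd(pair, kernel, stride=1):
    """Inter-channel convolution difference with the kernel tied as [w, -w]."""
    tied_kernel = [list(kernel), [-w for w in kernel]]
    return conv2d_channel_pair(pair, tied_kernel, stride)


# Toy two-microphone input: channel 2 is channel 1 delayed by one sample.
mic1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
mic2 = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
w = [0.5, 0.25, -0.25]  # hypothetical filter; learned from data in practice

icd_frames = icd([mic1, mic2], w, stride=2)

# The same quantity computed as a difference of two per-channel convolutions.
reference = [
    a - b
    for a, b in zip(conv1d_valid(mic1, w, 2), conv1d_valid(mic2, w, 2))
]
```

Under this assumption, `icd_frames` and `reference` agree frame by frame, and two identical channels give an all-zero output, which is the sense in which such a feature responds only to inter-channel differences, loosely mirroring how the IPD subtracts the phases of two channels.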

Comments:	Accepted at ICASSP 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2003.03927 [eess.AS]
	(or arXiv:2003.03927v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2003.03927

Submission history

From: Rongzhi Gu
[v1] Mon, 9 Mar 2020 05:28:20 UTC (2,386 KB)
[v2] Fri, 13 Mar 2020 04:25:21 UTC (2,621 KB)
