VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Yoshioka, Takuya; Wang, Xiaofei; Wang, Dongmei; Tang, Min; Zhu, Zirun; Chen, Zhuo; Kanda, Naoyuki

arXiv:2110.05745 (eess)

[Submitted on 12 Oct 2021 (v1), last revised 26 Oct 2021 (this version, v2)]

Abstract: Continuous speech separation using a microphone array was shown to be promising in dealing with the speech overlap problem in natural conversation transcription. This paper proposes VarArray, an array-geometry-agnostic speech separation neural network model. The proposed model is applicable to any number of microphones without retraining while leveraging the nonlinear correlation between the input channels. The proposed method adapts different elements that were proposed before separately, including transform-average-concatenate, conformer speech separation, and inter-channel phase differences, and combines them in an efficient and cohesive way. Large-scale evaluation was performed with two real meeting transcription tasks by using a fully developed transcription system requiring no prior knowledge such as reference segmentations, which allowed us to measure the impact that the continuous speech separation system could have in realistic settings. The proposed model outperformed a previous approach to array-geometry-agnostic modeling for all of the geometry configurations considered, achieving asclite-based speaker-agnostic word error rates of 17.5% and 20.4% for the AMI development and evaluation sets, respectively, in the end-to-end setting using no ground-truth segmentations.
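
The abstract names transform-average-concatenate as one of the building blocks that lets the model accept any number of microphones. The sketch below illustrates only that general idea, under simplifying assumptions: it is not the VarArray architecture, whose layer details the abstract does not give. The function names (`linear`, `relu`, `tac_layer`), the toy weight matrices, and the two-dimensional per-channel feature vectors are all hypothetical and chosen for illustration. The point it shows is that weights shared across channels, combined with a cross-channel average, work unchanged for any channel count.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Apply a weight matrix (one row per output unit) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise rectifier, standing in for any nonlinearity."""
    return [max(0.0, v) for v in x]


def tac_layer(
    channels: List[Vector],
    w_transform: Matrix,
    w_average: Matrix,
    w_concat: Matrix,
) -> List[Vector]:
    """Toy transform-average-concatenate step for a single time frame.

    The same three weight matrices are used for every channel, so the
    layer accepts any number of input channels.
    """
    # Transform: shared per-channel mapping.
    transformed = [relu(linear(w_transform, ch)) for ch in channels]

    # Average: pool across channels, then map the pooled vector.
    n = len(transformed)
    mean = [sum(col) / n for col in zip(*transformed)]
    pooled = relu(linear(w_average, mean))

    # Concatenate: append the pooled vector to each channel and map again.
    return [relu(linear(w_concat, t + pooled)) for t in transformed]


# Hypothetical toy weights (illustrative values, not trained parameters).
W_TRANSFORM: Matrix = [[1.0, 0.0], [0.0, 1.0]]
W_AVERAGE: Matrix = [[1.0, 0.0], [0.0, 1.0]]
W_CONCAT: Matrix = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

if __name__ == "__main__":
    three_mics = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    two_mics = [[1.0, 2.0], [3.0, 4.0]]
    # The same weights handle both array sizes without any change.
    print(tac_layer(three_mics, W_TRANSFORM, W_AVERAGE, W_CONCAT))
    print(tac_layer(two_mics, W_TRANSFORM, W_AVERAGE, W_CONCAT))
```

Because the only cross-channel operation is a mean, reordering the input channels reorders the outputs in the same way, and no parameter depends on the number or position of the microphones.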

Comments:	5 pages, 1 figure, 3 tables, submitted to ICASSP 2022; updated reference information of [33]
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2110.05745 [eess.AS]
	(or arXiv:2110.05745v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2110.05745

Submission history

From: Takuya Yoshioka
[v1] Tue, 12 Oct 2021 05:31:46 UTC (76 KB)
[v2] Tue, 26 Oct 2021 06:57:48 UTC (77 KB)
