Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

Bao, Yuanyuan; Xu, Yanze; Xu, Na; Yang, Wenjing; Li, Hongfeng; Li, Shicong; Jia, Yongtao; Xiang, Fei; He, Jincheng; Li, Ming


arXiv:2106.02934 (cs)

[Submitted on 5 Jun 2021]



Abstract: There is a strong need to deploy target speaker separation (TSS) models on mobile devices, where model size and computational complexity are limited. To better perform TSS for mobile voice communication, we first build LibriPhone, a dual-channel dataset based on a specific scenario. To better mimic real conditions, LibriPhone is not simulated from a single-channel dataset; instead, it is made by simultaneously replaying pairs of utterances from LibriSpeech through two professional artificial heads and recording them with the two built-in microphones of a mobile phone. We then propose LSTM-Former, a lightweight time-frequency domain separation model based on the LSTM framework and trained with a scale-invariant source-to-noise ratio (SI-SNR) loss. In experiments on LibriPhone, we compare the dual-channel LSTM-Former with a single-channel version that uses one randomly selected channel of LibriPhone. Experimental results show that the dual-channel LSTM-Former outperforms the single-channel LSTM-Former with a relative improvement of 25%. This work provides a feasible solution for the TSS task on mobile devices: playing back and recording multiple data sources in real application scenarios to obtain real dual-channel data can help a lightweight model achieve higher performance.
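
The abstract names an SI-SNR loss but does not spell it out. As a rough illustration only, the sketch below follows the definition of SI-SNR commonly used in the speech separation literature: both signals are zero-meaned, the estimate is projected onto the target, and the ratio of projected energy to residual energy is reported in dB. The function names `si_snr` and `si_snr_loss`, the `eps` stabilizer, and the use of plain Python lists are assumptions made for this example; the paper's actual implementation may differ.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """SI-SNR in dB between two equal-length 1-D signals (lists of floats).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Everything in the estimate that is not explained by the target counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)


# Toy signals (made up for the example): a sine "target" plus a small disturbance.
target = [math.sin(0.1 * i) for i in range(200)]
noise = [0.1 * math.cos(0.37 * i) for i in range(200)]
estimate = [t + v for t, v in zip(target, noise)]
louder_estimate = [3.0 * x for x in estimate]
noisier_estimate = [t + 5.0 * v for t, v in zip(target, noise)]
```

In this sketch, rescaling the estimate (`louder_estimate`) leaves the score essentially unchanged, which is the scale invariance the name refers to, while adding more disturbance (`noisier_estimate`) lowers it. In a training setup the same computation would normally be written with a tensor library so that gradients can flow through it.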

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2106.02934 [cs.SD]
	(or arXiv:2106.02934v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2106.02934

Submission history

From: Yuanyuan Bao
[v1] Sat, 5 Jun 2021 17:19:34 UTC (795 KB)
