Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation

Wang, Zhong-Qiu

arXiv:2402.09313 (eess)

[Submitted on 14 Feb 2024 (v1), last revised 17 Jun 2024 (this version, v2)]

Abstract: We propose mixture to mixture (M2M) training, a weakly-supervised neural speech separation algorithm that leverages close-talk mixtures as weak supervision for training discriminative models to separate far-field mixtures. The idea is that, for a target speaker, the close-talk mixture has a much higher signal-to-noise ratio (SNR) for that speaker than any far-field mixture, and hence can be used to design a weak supervision signal for separation. To realize this, at each training step we feed a far-field mixture to a deep neural network (DNN) to produce an intermediate estimate for each speaker, and, for each of the considered close-talk and far-field microphones, we linearly filter the DNN estimates and optimize a loss so that the filtered estimates of all the speakers sum to the mixture captured by that microphone. Evaluation results on a 2-speaker separation task in simulated reverberant conditions show that M2M can effectively leverage close-talk mixtures as weak supervision for separating far-field mixtures.
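
The mixture-reconstruction objective described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation. The abstract does not say how the linear filters are obtained or in which domain they operate, so this sketch assumes short time-domain FIR filters estimated jointly by least squares. The names `delayed_copies`, `m2m_loss`, and `num_taps` are hypothetical. In actual training the same computation would sit inside an automatic-differentiation framework so that the loss can update the DNN.

```python
import numpy as np


def delayed_copies(x, num_taps):
    """Stack num_taps causally delayed copies of x as columns, shape (T, num_taps)."""
    T = x.shape[0]
    cols = np.zeros((T, num_taps))
    for k in range(num_taps):
        cols[k:, k] = x[:T - k]  # column k holds x delayed by k samples
    return cols


def m2m_loss(estimates, mixtures, num_taps=8):
    """Mean squared mixture-reconstruction error, averaged over microphones.

    estimates: (C, T) array, one intermediate estimate per speaker
               (stands in for the DNN output).
    mixtures:  (P, T) array, one recorded mixture per considered microphone
               (close-talk and far-field alike).
    Assumption: FIR filters fitted by least squares; the abstract does not
    specify the filter estimation method.
    """
    # One regression matrix shared by all microphones: (T, C * num_taps).
    A = np.concatenate([delayed_copies(s, num_taps) for s in estimates], axis=1)
    total = 0.0
    for y in mixtures:
        # Filters for all speakers at this microphone, solved jointly.
        g, *_ = np.linalg.lstsq(A, y, rcond=None)
        # A @ g is the sum over speakers of their filtered estimates.
        total += np.mean((A @ g - y) ** 2)
    return total / len(mixtures)
```

With estimates that match the underlying speakers, the filtered estimates can reproduce every microphone's mixture and the loss approaches zero. With unrelated estimates, no short filter can explain the mixtures and the loss stays large. That gap is what lets mixtures alone act as a training signal.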

Comments:	in IEEE Signal Processing Letters
Subjects:	Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2402.09313 [eess.AS]
	(or arXiv:2402.09313v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2402.09313

Submission history

From: Zhong-Qiu Wang
[v1] Wed, 14 Feb 2024 17:03:04 UTC (209 KB)
[v2] Mon, 17 Jun 2024 17:12:59 UTC (209 KB)
