Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

Yu, Dong; Kolbæk, Morten; Tan, Zheng-Hua; Jensen, Jesper

arXiv:1607.00325 (cs)

[Submitted on 1 Jul 2016 (v1), last revised 3 Jan 2017 (this version, v2)]

Abstract: We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike most prior art, which treats speech separation as a multi-class regression problem, and unlike the deep clustering technique, which treats it as a segmentation (or clustering) problem, our model optimizes the separation regression error while ignoring the order of the mixing sources. This strategy cleverly solves the long-standing label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. Experiments on the equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT. We believe improvements built upon PIT can eventually solve the cocktail-party problem and enable real-world adoption of, e.g., automatic meeting transcription and multi-party human-computer interaction, where overlapping speech is common.
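
The idea of optimizing the regression error while ignoring source order can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a mean-squared-error criterion and an exhaustive search over all output-to-target assignments, and the function name `pit_mse` and the array shapes are hypothetical choices made for illustration only.

```python
import itertools

import numpy as np


def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative sketch, not the paper's code).

    estimates, targets: arrays of shape (num_sources, ...).
    Returns (min_loss, best_perm), where best_perm[i] is the index of the
    target that is compared against estimate i.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_sources = estimates.shape[0]

    best_loss, best_perm = np.inf, None
    # Try every assignment of targets to output streams and keep the
    # one with the smallest regression error.
    for perm in itertools.permutations(range(num_sources)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two toy "sources"; the estimates are the targets in swapped order.
targets = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
estimates = targets[::-1]
loss, perm = pit_mse(estimates, targets)
print(loss, perm)
```

With a fixed output order, the swapped estimates above would be penalized heavily even though each source is recovered exactly; taking the minimum over assignments removes that penalty. The exhaustive search grows factorially with the number of sources, so this sketch is only practical for a small number of talkers.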

Comments:	5 pages
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1607.00325 [cs.CL]
	(or arXiv:1607.00325v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1607.00325

Submission history

From: Morten Kolbæk
[v1] Fri, 1 Jul 2016 17:34:16 UTC (226 KB)
[v2] Tue, 3 Jan 2017 19:57:37 UTC (131 KB)

