Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance

Shi, Runwu; Li, Kai; Li, Chang; Wang, Jiang; Tan, Sihan; Nakadai, Kazuhiro

arXiv:2509.24395 (eess)

[Submitted on 29 Sep 2025]

Abstract: Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems usually rely on synthetic data pipelines, which may not reflect real-world conditions. Instead, we revisit the source-model paradigm, training a diffusion generative model solely on anechoic speech and formulating separation as a diffusion inverse problem. However, unconditional diffusion models lack speaker-level conditioning: they can capture local acoustic structure but produce temporally inconsistent speaker identities in the separated sources. To address this limitation, we propose Speaker-Embedding guidance that, during the reverse diffusion process, maintains speaker coherence within each separated track while driving the embeddings of different speakers further apart. In addition, we propose a new solver tailored for speech separation. Both strategies improve performance on the challenging task of unsupervised source-model-based speech separation, as confirmed by extensive experimental results. Audio samples and code are available at this https URL.
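
The abstract does not specify how the Speaker-Embedding guidance term is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: each separated track is cut into segments, a speaker-embedding network (not shown, and not named in the abstract) maps every segment to a vector, and a scalar objective rewards within-track consistency while penalizing similarity between tracks. The function name `speaker_guidance_loss`, the centroid-based cosine formulation, and the `push_weight` parameter are all hypothetical and are not taken from the paper. In a guided reverse diffusion process, the gradient of such an objective with respect to the current source estimates would typically be used to steer each step; that part is omitted here.

```python
import numpy as np

EPS = 1e-8  # numerical guard against division by zero


def speaker_guidance_loss(track_embeddings, push_weight=1.0):
    """Hypothetical guidance objective (illustrative, not the paper's loss).

    track_embeddings: list with one array per separated track, each of
    shape (num_segments, dim), holding per-segment speaker embeddings.
    Lower values mean each track is more speaker-consistent and the
    tracks are further apart in embedding space.
    """
    # Unit-normalise every segment embedding so dot products are cosines.
    normed = [
        e / (np.linalg.norm(e, axis=1, keepdims=True) + EPS)
        for e in track_embeddings
    ]
    # One unit-length centroid per track.
    centroids = []
    for n in normed:
        c = n.mean(axis=0)
        centroids.append(c / (np.linalg.norm(c) + EPS))

    # Coherence term: segments should align with their own track centroid.
    coherence = sum(
        float(np.mean(1.0 - n @ c)) for n, c in zip(normed, centroids)
    )

    # Separation term: centroids of different tracks should be dissimilar.
    separation = 0.0
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            separation += float(centroids[i] @ centroids[j])

    return coherence + push_weight * separation


# Toy usage with two made-up "speakers" in a 2-D embedding space.
spk_a = np.array([1.0, 0.0])
spk_b = np.array([0.0, 1.0])
consistent = [np.stack([spk_a, spk_a]), np.stack([spk_b, spk_b])]
swapped = [np.stack([spk_a, spk_b]), np.stack([spk_b, spk_a])]
loss_consistent = speaker_guidance_loss(consistent)
loss_swapped = speaker_guidance_loss(swapped)
```

In this toy setup, tracks whose speaker identity swaps midway score worse than tracks that each stay with a single speaker, which is the temporal-inconsistency failure the abstract attributes to unconditional diffusion priors.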

Comments:	5 pages, 2 figures, submitted to ICASSP 2026
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2509.24395 [eess.AS]
	(or arXiv:2509.24395v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2509.24395

Submission history

From: Runwu Shi
[v1] Mon, 29 Sep 2025 07:42:54 UTC (1,295 KB)
