Guided Speaker Embedding

Horiguchi, Shota; Moriya, Takafumi; Ando, Atsushi; Ashihara, Takanori; Sato, Hiroshi; Tawara, Naohiro; Delcroix, Marc

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2410.12182 (eess)

[Submitted on 16 Oct 2024 (v1), last revised 1 Jan 2025 (this version, v2)]

Abstract: This paper proposes a guided speaker embedding extraction system, which extracts the embedding of the target speaker using the speech activities of the target and interference speakers as clues. Methods for long-form overlapped multi-speaker audio processing are typically two-staged: i) segment-level processing and ii) inter-segment speaker matching. Speaker embeddings are often used for the latter purpose. Typical speaker embedding extraction approaches use only single-speaker intervals to avoid corrupting the embeddings with speech from interference speakers. However, this often makes speaker embeddings impossible to extract because sufficiently long non-overlapping intervals are not always available. In this paper, we propose using speaker activities as clues to extract the embedding of the speaker-of-interest directly from overlapping speech. Specifically, we concatenate the activities of the target and non-target speakers to the acoustic features before they are fed to the model. We also condition the attention weights used for pooling so that the weights of the intervals in which the target speaker is inactive are zero. The effectiveness of the proposed method is demonstrated in speaker verification and speaker diarization.
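The two mechanisms named in the abstract (concatenating activity channels to the acoustic features, and forcing the pooling attention weights to zero where the target speaker is inactive) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the binary frame-level activities, the single aggregated non-target channel, and the use of a softmax with `-inf` masking are all assumptions made for illustration, and the encoder and attention layers are replaced by stand-ins.

```python
import numpy as np


def concat_activity(features, target_act, nontarget_act):
    """Append two activity channels to every frame (hypothetical helper).

    features: (T, D) acoustic features.
    target_act, nontarget_act: (T,) arrays of 0/1 frame-level activities.
    Returns a (T, D + 2) array.
    """
    return np.concatenate(
        [features, target_act[:, None], nontarget_act[:, None]], axis=1
    )


def masked_attentive_pooling(frames, scores, target_act):
    """Attention-weighted mean with zero weight on target-inactive frames.

    frames: (T, H) frame-level representations.
    scores: (T,) unnormalized attention scores.
    target_act: (T,) array of 0/1 frame-level activities.
    Returns the pooled (H,) vector and the (T,) attention weights.
    """
    active = target_act > 0
    if not active.any():
        raise ValueError("target speaker is never active")
    # Assumption: masking with -inf before the softmax is one way to get
    # exactly zero weights; the paper may condition the weights differently.
    masked = np.where(active, scores, -np.inf)
    masked = masked - masked[active].max()  # numerical stability
    weights = np.exp(masked)  # exp(-inf) == 0 for inactive frames
    weights /= weights.sum()
    return weights @ frames, weights


rng = np.random.default_rng(0)
T, D = 6, 4
features = rng.normal(size=(T, D))
target_act = np.array([1, 1, 0, 0, 1, 0])
nontarget_act = np.array([0, 1, 1, 0, 0, 1])

x = concat_activity(features, target_act, nontarget_act)
# Stand-ins for a trained encoder (identity) and attention layer (random
# projection); a real system would learn both.
scores = x @ rng.normal(size=D + 2)
embedding, weights = masked_attentive_pooling(x, scores, target_act)
```

In this sketch, frames 1 and 4 (where only the target is active) and frame 1's overlap with the interferer are all eligible for pooling, while frames in which the target is silent contribute nothing to the embedding regardless of their attention scores.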

Comments:	Accepted to ICASSP 2025
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2410.12182 [eess.AS]
	(or arXiv:2410.12182v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2410.12182

Submission history

From: Shota Horiguchi
[v1] Wed, 16 Oct 2024 03:01:05 UTC (997 KB)
[v2] Wed, 1 Jan 2025 11:11:53 UTC (997 KB)