TGIF: Talker Group-Informed Familiarization of Target Speaker Extraction

Hsieh, Tsun-An; Kim, Minje

arXiv:2507.14044 (eess)

[Submitted on 18 Jul 2025]

Abstract: State-of-the-art target speaker extraction (TSE) systems are typically designed to generalize to any given mixing environment, necessitating a model with a large enough capacity as a generalist. Personalized speech enhancement could be a specialized solution that adapts to single-user scenarios, but it overlooks the practical need for customization in cases where only a small number of talkers are involved, e.g., TSE for a specific family. We address this gap with the proposed concept, talker group-informed familiarization (TGIF) of TSE, where the TSE system specializes in a particular group of users, which is challenging due to the inherent absence of a clean speech target. To this end, we employ a knowledge distillation approach, where a group-specific student model learns from the pseudo-clean targets generated by a large teacher model. This tailors the student model to effectively extract the target speaker from the particular talker group while maintaining computational efficiency. Experimental results demonstrate that our approach outperforms the baseline generic models by adapting to the unique speech characteristics of a given speaker group. Our newly proposed TGIF concept underscores the potential of developing specialized solutions for diverse and real-world applications, such as on-device TSE on a family-owned device.
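The distillation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "teacher" is a fixed random linear map standing in for a large pre-trained TSE model, the "student" is a per-feature gain vector standing in for a compact model, and the mixtures are random feature vectors rather than speech. The names `teacher_extract` and `distill_student` and all sizes are hypothetical. The only point it shows is that the student is fitted to the teacher's pseudo-clean outputs, with no clean target used anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FEATURES = 8

# Assumption: a frozen "large" teacher, here just a fixed dense linear map.
teacher_weights = 0.8 * np.eye(NUM_FEATURES) + 0.05 * rng.standard_normal(
    (NUM_FEATURES, NUM_FEATURES)
)

def teacher_extract(mixtures):
    """Hypothetical stand-in for a generalist TSE model; returns pseudo-clean targets."""
    return mixtures @ teacher_weights

def distill_student(mixtures, teacher_fn, steps=200, lr=0.1):
    """Fit a tiny per-feature gain student to the teacher's pseudo-clean targets."""
    pseudo_targets = teacher_fn(mixtures)    # no clean speech is needed
    gains = np.ones(mixtures.shape[1])       # student parameters (one gain per feature)
    losses = []
    for _ in range(steps):
        error = mixtures * gains - pseudo_targets
        losses.append(float(np.mean(error ** 2)))
        # Gradient of the per-feature mean squared error with respect to each gain.
        grad = 2.0 * np.mean(error * mixtures, axis=0)
        gains -= lr * grad
    return gains, losses

# Unlabeled mixtures recorded from the (toy) talker group.
group_mixtures = rng.standard_normal((256, NUM_FEATURES))
gains, losses = distill_student(group_mixtures, teacher_extract)
print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this sketch the student has far fewer parameters than the teacher, so it cannot match the teacher everywhere; it only needs to match it on the data drawn from the group, which is the intuition behind specializing a small model to a known set of talkers.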

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2507.14044 [eess.AS]
	(or arXiv:2507.14044v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2507.14044
Journal reference:	IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Submission history

From: Tsun-An Hsieh
[v1] Fri, 18 Jul 2025 16:12:22 UTC (395 KB)
