Online Speaker Diarization with Relation Network

Li, Xiang; Zhao, Yucheng; Luo, Chong; Zeng, Wenjun

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2009.08162 (eess)

This paper has been withdrawn by Yucheng Zhao

[Submitted on 17 Sep 2020 (v1), last revised 19 Sep 2020 (this version, v2)]

Abstract: In this paper, we propose an online speaker diarization system based on Relation Network, named RenoSD. Unlike conventional diarization systems, which consist of several independently optimized modules, RenoSD implements voice activity detection (VAD), embedding extraction, and speaker identity association using a single deep neural network. The most striking feature of RenoSD is that it adopts a meta-learning strategy for speaker identity association. In particular, the relation network learns to learn a deep distance metric in a data-driven way, and it can determine through a simple forward pass whether two given segments belong to the same speaker. As such, RenoSD can run online with low latency. Experimental results on the AMI and CALLHOME datasets show that the proposed RenoSD system achieves consistent improvements over the state-of-the-art x-vector baseline. Compared with an existing online diarization system named UIS-RNN, RenoSD achieves better performance using much less training data and at a lower time complexity.
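The abstract describes speaker identity association as a pairwise decision: a scorer takes two segments and says whether they share a speaker, which is what allows segments to be labeled one at a time as they arrive. The listing gives no architecture or training details, so the following is only a minimal sketch of that online assignment loop, not the RenoSD method. The names `cosine_relation`, `OnlineDiarizer`, `diarize_stream`, and the `threshold` value are all hypothetical, and a fixed cosine similarity stands in for the learned relation network.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_relation(a: Vector, b: Vector) -> float:
    """Stand-in pairwise scorer (assumption): cosine similarity mapped to [0, 1].

    In the abstract this role is played by a learned relation network.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return 0.5 * (1.0 + dot / norm)


class OnlineDiarizer:
    """Greedy online speaker assignment driven by a pairwise relation score."""

    def __init__(self, relation: Callable[[Vector, Vector], float],
                 threshold: float = 0.8) -> None:
        self.relation = relation
        self.threshold = threshold  # hypothetical decision threshold
        self.exemplars: List[List[float]] = []  # one stored segment per speaker

    def assign(self, segment: Vector) -> int:
        """Return a speaker index for one incoming segment embedding."""
        scores = [self.relation(segment, e) for e in self.exemplars]
        if scores:
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= self.threshold:
                return best
        # No known speaker matches well enough: register a new speaker.
        self.exemplars.append(list(segment))
        return len(self.exemplars) - 1


def diarize_stream(segments: Sequence[Vector], threshold: float = 0.8) -> List[int]:
    """Label segments in arrival order, never revisiting earlier decisions."""
    diarizer = OnlineDiarizer(cosine_relation, threshold)
    return [diarizer.assign(segment) for segment in segments]
```

Each incoming segment costs one scorer call per known speaker and needs no re-clustering of past segments, which is the property that makes this style of association suitable for low-latency use. Swapping `cosine_relation` for a trained pairwise model would leave the loop unchanged.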

Comments:	We found potential errors in our experimental results that may lead to a wrong conclusion. We have decided to rerun the experiments to check the results and are temporarily withdrawing this paper.
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2009.08162 [eess.AS]
	(or arXiv:2009.08162v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2009.08162

Submission history

From: Yucheng Zhao
[v1] Thu, 17 Sep 2020 09:11:49 UTC (175 KB)
[v2] Sat, 19 Sep 2020 01:33:39 UTC (1 KB) (withdrawn)
