Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions

Lin, Wan; Chen, Junhui; Wang, Tianhao; Zhou, Zhenyu; Li, Lantian; Wang, Dong

arXiv:2410.16428 (cs)

[Submitted on 21 Oct 2024 (v1), last revised 3 Jul 2025 (this version, v3)]

Abstract: Modern speaker verification systems primarily rely on speaker embeddings, followed by verification based on cosine similarity between the embedding vectors of the enrollment and test utterances. While effective, these methods struggle with multi-talker speech due to the unidentifiability of embedding vectors. In this paper, we propose Neural Scoring (NS), a refreshed end-to-end framework that directly estimates verification posterior probabilities without relying on test-side embeddings, making it more robust to complex conditions, e.g., with multiple talkers. To make the training of such an end-to-end model more efficient, we introduce a large-scale trial e2e training (LtE2E) strategy, where each test utterance pairs with a set of enrolled speakers, thus enabling the processing of large-scale verification trials per batch. Experiments on the VoxCeleb dataset demonstrate that NS consistently outperforms both the baseline and competitive methods across various conditions, achieving an overall 70.36% reduction in EER compared to the baseline.
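
The abstract contrasts conventional cosine scoring with a training scheme in which each test utterance is paired with a set of enrolled speakers. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names `cosine_score` and `build_trials`, the use of plain Python lists as embeddings, and the speaker labels are all assumptions made for illustration. The neural scoring model itself is not reproduced here.

```python
import math


def cosine_score(enroll, test):
    """Cosine similarity between two embedding vectors (lists of floats).

    This is the conventional embedding-based score the abstract describes
    as the baseline approach.
    """
    dot = sum(a * b for a, b in zip(enroll, test))
    norm_e = math.sqrt(sum(a * a for a in enroll))
    norm_t = math.sqrt(sum(b * b for b in test))
    return dot / (norm_e * norm_t)


def build_trials(test_speaker_sets, enrolled_speakers):
    """Pair every test utterance with every enrolled speaker (hypothetical helper).

    test_speaker_sets: list of sets, the speakers present in each test utterance
                       (a multi-talker utterance has more than one).
    enrolled_speakers: list of speaker ids enrolled for this batch.
    Returns a B x N matrix of 0/1 labels, where 1 means the enrolled
    speaker is present in the test utterance.
    """
    return [
        [1 if spk in present else 0 for spk in enrolled_speakers]
        for present in test_speaker_sets
    ]


# Two test utterances (the first with two talkers) against four enrolled speakers
# yield 2 x 4 = 8 verification trials from a single batch.
labels = build_trials([{"A", "B"}, {"C"}], ["A", "B", "C", "D"])
same_direction = cosine_score([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_score([1.0, 0.0], [0.0, 3.0])
```

Under these assumptions, a batch of B test utterances and N enrolled speakers produces B × N labeled trials, which is one way to read the "large-scale verification trials per batch" mentioned in the abstract.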

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2410.16428 [cs.SD]
	(or arXiv:2410.16428v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2410.16428

Submission history

From: Lantian Li Mr.
[v1] Mon, 21 Oct 2024 18:48:54 UTC (391 KB)
[v2] Tue, 3 Jun 2025 23:35:07 UTC (477 KB)
[v3] Thu, 3 Jul 2025 16:36:28 UTC (487 KB)
