Showing 1–3 of 3 results for author: Kuzmin, N

  1. NTU-NPU System for Voice Privacy 2024 Challenge

    Authors: Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, Eng Siong Chng

    Abstract: In this work, we describe our submissions for the Voice Privacy Challenge 2024. Rather than proposing a novel speech anonymization system, we enhance the provided baselines to meet all required conditions and improve evaluated metrics. Specifically, we implement emotion embedding and experiment with WavLM and ECAPA2 speaker embedders for the B3 baseline. Additionally, we compare different speaker…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: System description for VPC 2024

    Journal ref: 2024 Challenge. Proc. 4th Symposium on Security and Privacy in Speech Communication, 72-79

  2. arXiv:2302.09523  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Probabilistic Back-ends for Online Speaker Recognition and Clustering

    Authors: Alexey Sholokhov, Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng

    Abstract: This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that popular cosine scoring suffers from poor score calibration with a varying number of enrollment utterances. Second, we propose a simple replacement for cosine scoring based on an ex…

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: Accepted to ICASSP 2023

  3. arXiv:2202.13826  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Magnitude-aware Probabilistic Speaker Embeddings

    Authors: Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

    Abstract: Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring the magnitude. However, recent studies have shown that the magnitudes of the embeddings extracted by deep neural networks may indicate the quality o…

    Submitted 23 October, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Accepted to Odyssey 2022: The Speaker and Language Recognition Workshop, camera-ready version