Showing 1–2 of 2 results for author: Prieto, S

Search v0.5.6 released 2020-02-24

arXiv:2305.02147

eess.AS cs.HC

Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Authors: Iván López-Espejo, Santi Prieto, Alfonso Ortega, Eduardo Lleida

Abstract: Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech. To address this issue, in this paper, we propose a new speaker embedding compensation method based on a minimum mean square error (MMSE) estimator. This method models the joint distribution of the vocal effort transfer vector and non-neutrally-phonated embedding spaces and operates in a principal component analysis domain to cope with non-neutrally-phonated speech data scarcity. Experiments are carried out using a cutting-edge speaker verification system integrating a powerful self-supervised pre-trained model for speech representation. In comparison with a state-of-the-art embedding compensation method, the proposed MMSE estimator yields superior and competitive equal error rate results when tackling shouted and whispered speech, respectively.
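
The MMSE idea in the abstract above can be illustrated in its simplest setting. If a non-neutral embedding x and a transfer vector v were jointly Gaussian, the MMSE estimate of v given x is the conditional mean E[v | x] = mu_v + S_vx S_xx^{-1} (x - mu_x). The sketch below assumes exactly that single joint-Gaussian model, assumes v is defined so that x + v approximates the corresponding neutral-speech embedding, and omits the PCA projection the abstract mentions (one could apply the same estimator to PCA-projected vectors). The names fit, estimate_transfer and compensate are hypothetical; this is not the authors' implementation.

```python
# Hypothetical illustration of joint-Gaussian MMSE estimation; not the authors' method.
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Vector, Vector, Matrix, Matrix]  # (mu_x, mu_v, S_xx, S_vx)


def mean(rows: List[Vector]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def cross_cov(a: List[Vector], b: List[Vector]) -> Matrix:
    """Sample cross-covariance of paired rows a[k], b[k] (divides by n)."""
    ma, mb, n = mean(a), mean(b), len(a)
    return [[sum((a[k][i] - ma[i]) * (b[k][j] - mb[j]) for k in range(n)) / n
             for j in range(len(mb))] for i in range(len(ma))]


def solve(A: Matrix, rhs: Vector) -> Vector:
    """Solve A z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [val / pivot for val in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def fit(xs: List[Vector], vs: List[Vector], ridge: float = 1e-6) -> Model:
    """Estimate joint-Gaussian statistics from paired (x, v) training examples."""
    s_xx = cross_cov(xs, xs)
    for i in range(len(s_xx)):
        s_xx[i][i] += ridge  # small diagonal load keeps S_xx invertible
    return mean(xs), mean(vs), s_xx, cross_cov(vs, xs)


def estimate_transfer(x: Vector, model: Model) -> Vector:
    """Conditional mean mu_v + S_vx S_xx^{-1} (x - mu_x)."""
    mu_x, mu_v, s_xx, s_vx = model
    z = solve(s_xx, [xi - mi for xi, mi in zip(x, mu_x)])  # S_xx^{-1} (x - mu_x)
    return [mv + sum(row[j] * z[j] for j in range(len(z))) for mv, row in zip(mu_v, s_vx)]


def compensate(x: Vector, model: Model) -> Vector:
    """Add the estimated transfer vector to move x toward the neutral domain."""
    return [xi + vi for xi, vi in zip(x, estimate_transfer(x, model))]
```

When v is an exact linear function of x in the training pairs, the conditional mean recovers that mapping (up to the tiny ridge term). The hand-written Gauss-Jordan solver only keeps the sketch dependency-free; numpy.linalg.solve would be the usual choice in practice.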

Submitted 4 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

arXiv:2008.02487

eess.AS cs.HC cs.LG cs.SD

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

Authors: Santi Prieto, Alfonso Ortega, Iván López-Espejo, Eduardo Lleida

Abstract: The performance of speaker verification systems degrades when vocal effort conditions differ between enrollment and test (e.g., shouted vs. normal speech). Such a mismatch can arise in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate for the mismatch between shouted and normal conditions in speaker verification. Before compensation, the shouted condition is automatically detected by means of logistic regression. The process is computationally light and is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to a 13.8% relative improvement in equal error rate with respect to a system that applies neither shouted speech detection nor compensation.
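
The back-end pipeline in the abstract above (logistic-regression detection of shouted speech, then GMM-based linear compensation of the embedding) can be sketched with one classic ASR-robustness technique of this family, SPLICE-style posterior-weighted bias correction: x_hat = x + sum_k P(k | x) r_k. This choice is an assumption for illustration, since the abstract does not name the variants studied; the detector weights, GMM parameters, bias vectors r_k and all function names below are hypothetical. The last helper only shows what "relative" EER improvement means.

```python
# Hypothetical sketch of detect-then-compensate in an embedding back-end.
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_shouted(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic regression detector: P(shouted | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_gauss_diag(x, mean, var) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances) -> List[float]:
    """Component posteriors P(k | x) of a diagonal GMM, via log-sum-exp."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def compensate(x, weights, means, variances, biases) -> List[float]:
    """x_hat = x + sum_k P(k | x) * r_k, with r_k a per-component shouted-to-normal bias."""
    post = posteriors(x, weights, means, variances)
    return [xi + sum(p * r[d] for p, r in zip(post, biases)) for d, xi in enumerate(x)]


def backend_embedding(x, detector, gmm, threshold=0.5) -> List[float]:
    """Compensate only when the detector flags the embedding as shouted."""
    w, b = detector
    if p_shouted(x, w, b) > threshold:
        return compensate(x, *gmm)
    return list(x)


def relative_improvement(eer_base: float, eer_new: float) -> float:
    """Relative EER reduction in percent: 100 * (base - new) / base."""
    return 100.0 * (eer_base - eer_new) / eer_base
```

With made-up parameters such as a two-component GMM centred at (0, 0) and (10, 10), an embedding near the origin that the detector scores as normal passes through unchanged, while one near (10, 10) that it flags is shifted almost entirely by that component's bias vector. Gating on the detector keeps normal-speech embeddings out of the compensation step.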

Submitted 6 August, 2020; originally announced August 2020.
