Showing 1–28 of 28 results for author: Tomashenko, N

  1. The First VoicePrivacy Attacker Challenge

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Emmanuel Vincent, Junichi Yamagishi

    Abstract: The First VoicePrivacy Attacker Challenge is an ICASSP 2025 SP Grand Challenge which focuses on evaluating attacker systems against a set of voice anonymization systems submitted to the VoicePrivacy 2024 Challenge. Training, development, and evaluation datasets were provided along with a baseline attacker. Participants developed their attacker systems in the form of automatic speaker verification…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    Journal ref: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-2

  2. Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization

    Authors: Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi

    Abstract: In this paper, we investigate the impact of speech temporal dynamics in application to automatic speaker verification and speaker voice anonymization tasks. We propose several metrics to perform automatic speaker verification based only on phoneme durations. Experimental results demonstrate that phoneme durations leak some speaker information and can reveal speaker identity from both original and…

    Submitted 22 December, 2024; originally announced December 2024.

    Comments: Accepted at ICASSP 2025

    Journal ref: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025, pp. 1-5

  3. arXiv:2410.07428

    eess.AS cs.CL cs.CR

    The First VoicePrivacy Attacker Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Emmanuel Vincent, Junichi Yamagishi

    Abstract: The First VoicePrivacy Attacker Challenge is a new kind of challenge organized as part of the VoicePrivacy initiative and supported by ICASSP 2025 as the SP Grand Challenge. It focuses on developing attacker systems against voice anonymization, which will be evaluated against a set of anonymization systems submitted to the VoicePrivacy 2024 Challenge. Training, development, and evaluation datasets…

    Submitted 21 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

  4. arXiv:2408.05928

    cs.SD eess.AS

    Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

    Authors: Xiaoxiao Miao, Yuxiang Zhang, Xin Wang, Natalia Tomashenko, Donny Cheng Lock Soh, Ian Mcloughlin

    Abstract: A general disentanglement-based speaker anonymization system typically separates speech into content, speaker, and prosody features using individual encoders. This paper explores how to adapt such a system when a new speech attribute, for example, emotion, needs to be preserved to a greater extent. While existing systems are good at anonymizing speaker embeddings, they are not designed to preserve…

    Submitted 22 April, 2025; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by Computer Speech and Language

  5. arXiv:2407.11516

    eess.AS

    The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation

    Authors: Michele Panariello, Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas Evans, Emmanuel Vincent, Junichi Yamagishi

    Abstract: The VoicePrivacy Challenge promotes the development of voice anonymisation solutions for speech technology. In this paper we present a systematic overview and analysis of the second edition held in 2022. We describe the voice anonymisation task and datasets used for system development and evaluation, present the different attack models used for evaluation, and the associated objective and subjecti…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at IEEE/ACM Transactions on Audio, Speech, and Language Processing

  6. arXiv:2404.02677

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2024 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

    Abstract: The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets and evaluation scripts, as well as baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Part…

    Submitted 12 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 19 pages, https://www.voiceprivacychallenge.org/. arXiv admin note: substantial text overlap with arXiv:2203.12468

  7. arXiv:2309.05472

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Science and Language. Preprint allowed

  8. arXiv:2305.18823

    cs.SD eess.AS

    Speaker anonymization using orthogonal Householder neural network

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker…

    Submitted 12 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  9. arXiv:2302.10790

    eess.AS cs.LG cs.SD

    Federated Learning for ASR based on Wav2vec 2.0

    Authors: Tuan Nguyen, Salima Mdhaffar, Natalia Tomashenko, Jean-François Bonastre, Yannick Estève

    Abstract: This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self supervision. Carried out on the well-known TED-LIUM 3 dataset, our experiments show that such a model can obtain, with no use of a language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, without sharing any data from the different users. We al…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 5 pages, accepted in ICASSP 2023

  10. arXiv:2205.07123

    cs.CL cs.CR eess.AS

    The VoicePrivacy 2020 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

    Abstract: The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used f…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.12468

  11. arXiv:2204.01397

    cs.CL cs.SD eess.AS

    A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

    Authors: Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

    Abstract: Self-supervised models for speech processing emerged recently as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to und…

    Submitted 5 July, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

  12. arXiv:2203.12468

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2022 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

    Abstract: For new participants - Executive summary: (1) The task is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content, paralinguistic attributes, intelligibility and naturalness. (2) Training, development and evaluation datasets are provided in addition to 3 different baseline anonymization systems, evaluation scripts, and…

    Submitted 28 September, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: the file is unchanged; minor correction in metadata

  13. arXiv:2202.13097

    cs.SD eess.AS

    Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to protect the privacy of speakers while preserving spoken linguistic information from speech. Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, speaker encoder, automatic speech recognition acoustic model (ASR AM), speech synthesis acoustic model and speech waveform generation model. Moreover, as an ASR AM is la…

    Submitted 27 April, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

  14. arXiv:2111.04194

    cs.CL cs.SD eess.AS

    Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

    Authors: Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

    Abstract: The widespread availability of powerful personal devices capable of collecting their users' voices has opened the opportunity to build speaker-adapted automatic speech recognition (ASR) systems or to participate in collaborative learning of ASR. In both cases, personalized acoustic models (AM), i.e. fine-tuned AM with specific speaker data, can be built. A question that naturally arises is whether the dissemination of p…

    Submitted 7 November, 2021; originally announced November 2021.

  15. arXiv:2111.03777

    cs.CL cs.CR cs.SD eess.AS

    Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

    Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

    Abstract: This paper investigates methods to effectively retrieve speaker information from the personalized speaker adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models where a global model is learnt on the server based on the updates received from multiple clients. We propose an a…

    Submitted 14 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976

  16. arXiv:2109.00648

    cs.CL cs.SD eess.AS

    The VoicePrivacy 2020 Challenge: Results and findings

    Authors: Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche

    Abstract: This paper presents the results and analyses stemming from the first VoicePrivacy 2020 Challenge which focuses on developing anonymization solutions for speech technology. We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results. In particular, we describe the voice anonymization task and datasets used for system development and evaluati…

    Submitted 26 September, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Submitted to the Special Issue on Voice Privacy (Computer Speech and Language Journal - Elsevier); under review

  17. arXiv:2109.00281

    cs.CR cs.SD eess.AS

    Benchmarking and challenges in security and privacy for voice biometrics

    Authors: Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noe, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi

    Abstract: For many decades, research in speech technologies has focused upon improving reliability. With this now meeting user expectations for a range of diverse applications, speech technology is today omni-present. As a result, a focus on security and privacy has now come to the fore. Here, the research effort is in its relative infancy and progress calls for greater, multidisciplinary collaboration with s…

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Submitted to the symposium of the ISCA Security & Privacy in Speech Communications (SPSC) special interest group

  18. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  19. arXiv:2011.01130

    eess.AS cs.CL

    Speaker anonymisation using the McAdams coefficient

    Authors: Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans

    Abstract: Anonymisation has the goal of manipulating speech signals in order to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness. This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data, is based upon well-known signal processing…

    Submitted 1 September, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at INTERSPEECH 2021

  20. arXiv:2008.13144

    eess.AS cs.CR

    Speech Pseudonymisation Assessment Using Voice Similarity Matrices

    Authors: Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia Tomashenko, Andreas Nautsch, Nicholas Evans

    Abstract: The proliferation of speech technologies and rising privacy legislation calls for the development of privacy preservation solutions for speech applications. These are essential since speech signals convey a wealth of rich, personal and potentially sensitive information. Anonymisation, the focus of the recent VoicePrivacy initiative, is one strategy to protect speaker identity information. Pseudony…

    Submitted 30 August, 2020; originally announced August 2020.

    Comments: Interspeech 2020

  21. arXiv:2005.11861

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…

    Submitted 24 May, 2020; originally announced May 2020.

  22. The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment

    Authors: Andreas Nautsch, Jose Patino, Natalia Tomashenko, Junichi Yamagishi, Paul-Gauthier Noe, Jean-Francois Bonastre, Massimiliano Todisco, Nicholas Evans

    Abstract: Mounting privacy legislation calls for the preservation of privacy in speech technology, though solutions are gravely lacking. While evaluation campaigns are long-proven tools to drive progress, the need to consider a privacy adversary implies that traditional approaches to evaluation must be adapted to the assessment of privacy and privacy preservation solutions. This paper presents the first ste…

    Submitted 20 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: submitted to Interspeech 2020

    Journal ref: Proc Interspeech 2020

  23. arXiv:2005.08601

    eess.AS cs.CL

    Design Choices for X-vector Based Speaker Anonymization

    Authors: Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi

    Abstract: The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender sel…

    Submitted 18 May, 2020; originally announced May 2020.

  24. arXiv:2003.06894

    eess.AS cs.CL cs.SD

    Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

    Authors: Natalia Tomashenko, Yuri Khokhlov, Yannick Esteve

    Abstract: In this paper we investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of the DNN trained on GMMD features is done through the maximum a posteriori (MAP) adaptation of the auxiliary GMM model used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC featu…

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: 36 pages; originally was submitted to CSL in February 2017

  25. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    Authors: Natalia Tomashenko, Christian Raymond, Antoine Caubriere, Renato De Mori, Yannick Esteve

    Abstract: This work investigates the embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. We propose to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialog history is represented in…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted for ICASSP 2020 (Submitted: October 21, 2019)

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  26. arXiv:1910.13689

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  27. Recent Advances in End-to-End Spoken Language Understanding

    Authors: Natalia Tomashenko, Antoine Caubriere, Yannick Esteve, Antoine Laurent, Emmanuel Morin

    Abstract: This work investigates spoken language understanding (SLU) systems in the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the model performance, we explore various techniques inclu…

    Submitted 29 September, 2019; originally announced September 2019.

    Journal ref: Statistical Language and Speech Processing. SLSP 2019

  28. arXiv:1906.07601

    cs.CL cs.SD eess.AS

    Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability

    Authors: Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève

    Abstract: We present an end-to-end approach to extract semantic concepts directly from the speech audio signal. To overcome the lack of data available for this spoken language understanding approach, we investigate the use of a transfer learning strategy based on the principles of curriculum learning. This approach allows us to exploit out-of-domain data that can help to prepare a fully neural architecture.…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to the INTERSPEECH 2019 conference. Submitted on March 29, 2019 (Paper submission deadline)