Showing 1–13 of 13 results for author: Larcher, A

Searching in archive eess.
  1. arXiv:2408.02712

    cs.SD cs.AI cs.NE eess.AS eess.SP

    Automatic Voice Identification after Speech Resynthesis using PPG

    Authors: Thibault Gaudier, Marie Tahon, Anthony Larcher, Yannick Estève

    Abstract: Speech resynthesis is a generic task for which we want to synthesize audio with another audio as input, which finds applications for media monitors and journalists. Among different tasks addressed by speech resynthesis, voice conversion preserves the linguistic information while modifying the identity of the speaker, and speech edition preserves the identity of the speaker but some words are modifi…

    Submitted 5 August, 2024; originally announced August 2024.

    Journal ref: Speaker and Language Recognition Workshop - Odyssey, Jun 2024, Québec, Canada

  2. arXiv:2406.03251

    cs.SD cs.AI eess.AS

    ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

    Authors: Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

    Abstract: Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually capture the audio signal. Beamforming, i.e., spatial filtering, is a common practice to process multi-microphone audio data. However, it often requires an expli…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024

  3. arXiv:2402.08312

    eess.AS cs.SD

    Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection

    Authors: Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

    Abstract: Voice Activity Detection (VAD) and Overlapped Speech Detection (OSD) are key pre-processing tasks for speaker diarization. In the meeting context, it is often easier to capture speech with a distant device. This consideration however leads to severe performance degradation. We study a unified supervised learning framework to solve distant multi-microphone joint VAD and OSD (VAD+OSD). This paper in…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 14 pages, 5 figures, accepted at IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

  4. arXiv:2307.13012

    cs.SD cs.AI cs.NE eess.AS eess.SP

    Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

    Authors: Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas

    Abstract: Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking inf…

    Submitted 24 July, 2023; originally announced July 2023.

  5. arXiv:2306.04268

    cs.SD cs.CL eess.AS

    Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features

    Authors: Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

    Abstract: Speaker diarization is the task of answering "Who spoke and when?" in an audio stream. Pipeline systems rely on speech segmentation to extract speakers' segments and achieve robust speaker diarization. This paper proposes a common framework to solve three segmentation tasks in the distant speech scenario: Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), and Speaker Change Detection…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Interspeech 2023, International Speech Communication Association (ISCA), Aug 2023, Dublin, Ireland

  6. arXiv:2305.01759

    eess.AS cs.AI cs.CL

    Evaluation of Speaker Anonymization on Emotional Speech

    Authors: Hubert Nourtel, Pierre Champion, Denis Jouvet, Anthony Larcher, Marie Tahon

    Abstract: Speech data carries a range of personal information, such as the speaker's identity and emotional state. These attributes can be used for malicious purposes. With the development of virtual assistants, a new generation of privacy threats has emerged. Current studies have addressed the topic of preserving speech privacy. One of them, the VoicePrivacy initiative aims to promote the development of pr…

    Submitted 15 April, 2023; originally announced May 2023.

    Journal ref: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication (62-66)

  7. arXiv:2208.10497

    cs.SD cs.AI cs.CR cs.LG eess.AS

    Are disentangled representations all you need to build speaker anonymization systems?

    Authors: Pierre Champion, Denis Jouvet, Anthony Larcher

    Abstract: Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns when speech data get collected. Speaker anonymization aims to transform a speech signal to remove the source speaker's identity while leaving the spoken content unchanged. Current methods perform the transformation by relying on content/speaker disentanglement and voice conversion.…

    Submitted 13 January, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Journal ref: INTERSPEECH 2022 - Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea

  8. arXiv:2203.09518

    eess.AS cs.AI cs.CL cs.CR cs.SD

    Privacy-Preserving Speech Representation Learning using Vector Quantization

    Authors: Pierre Champion, Denis Jouvet, Anthony Larcher

    Abstract: With the popularity of virtual assistants (e.g., Siri, Alexa), the use of speech recognition is now becoming more and more widespread. However, speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns. The presented experiments show that the representations extracted by the deep layers of speech recognition networks contain speaker informat…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: Journées d'Études sur la Parole - JEP2022, Jun 2022, Île de Noirmoutier, France

  9. arXiv:2110.05431

    eess.AS cs.CR cs.LG cs.SD

    On the invertibility of a voice privacy system using embedding alignement

    Authors: Pierre Champion, Thomas Thebaud, Gaël Le Lan, Anthony Larcher, Denis Jouvet

    Abstract: This paper explores various attack scenarios on a voice anonymization system using embeddings alignment techniques. We use Wasserstein-Procrustes (an algorithm initially designed for unsupervised translation) or Procrustes analysis to match two sets of x-vectors, before and after voice anonymization, to mimic this transformation as a rotation function. We compute the optimal rotation and compare t…

    Submitted 8 October, 2021; originally announced October 2021.

    Journal ref: ASRU 2021 - IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2021, Cartagena, Colombia

  10. arXiv:2109.11946

    cs.SD cs.AI cs.CR eess.AS

    Evaluating X-vector-based Speaker Anonymization under White-box Assessment

    Authors: Pierre Champion, Denis Jouvet, Anthony Larcher

    Abstract: In the scenario of the Voice Privacy challenge, anonymization is achieved by converting all utterances from a source speaker to match the same target identity; this identity being randomly selected. In this context, an attacker with maximum knowledge about the anonymization system cannot infer the target identity. This article proposed to constrain the target selection to a specific identity, i.e…

    Submitted 30 September, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Journal ref: 23rd International Conference on Speech and Computer - SPECOM 2021, Sep 2021, Saint Petersburg, Russia

  11. arXiv:2101.08478

    eess.AS cs.CR cs.SD

    A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender

    Authors: Pierre Champion, Denis Jouvet, Anthony Larcher

    Abstract: Speech pseudonymization aims at altering a speech signal to map the identifiable personal characteristics of a given speaker to another identity. In other words, it aims to hide the source speaker identity while preserving the intelligibility of the spoken content. This study takes place in the VoicePrivacy 2020 challenge framework, where the baseline system performs pseudonymization by modifying…

    Submitted 21 January, 2021; originally announced January 2021.

    Journal ref: The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence, Feb 2021, Nancy, France

  12. arXiv:2011.01108

    eess.AS

    End-to-end anti-spoofing with RawNet2

    Authors: Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher

    Abstract: Spoofing countermeasures aim to protect automatic speaker verification systems from attempts to manipulate their reliability with the use of spoofed speech signals. While results from the most recent ASVspoof 2019 evaluation show great potential to detect most forms of attack, some continue to evade detection. This paper reports the first application of RawNet2 to anti-spoofing. RawNet2 ingests ra…

    Submitted 16 December, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted to ICASSP 2021

  13. arXiv:1904.07386

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages