Showing 1–12 of 12 results for author: Rohdin, J

  1. arXiv:2505.15320  [pdf, ps, other]

    eess.AS cs.SD

    Analysis of ABC Frontend Audio Systems for the NIST-SRE24

    Authors: Sara Barahona, Anna Silnova, Ladislav Mošner, Junyi Peng, Oldřich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Palka, Federico Landini, Lukáš Burget, Themos Stafylakis, Sandro Cumani, Dominik Boboš, Miroslav Hlavaček, Martin Kodovsky, Tomáš Pavlíček

    Abstract: We present a comprehensive analysis of the embedding extractors (frontends) developed by the ABC team for the audio track of NIST SRE 2024. We follow the two scenarios imposed by NIST: using only a provided set of telephone recordings for training (fixed) or adding publicly available data (open condition). Under these constraints, we develop the best possible speaker embedding extractors for the p…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  2. arXiv:2409.09408  [pdf, other]

    eess.AS cs.SD

    Leveraging Self-Supervised Learning for Speaker Diarization

    Authors: Jiangyu Han, Federico Landini, Johan Rohdin, Anna Silnova, Mireia Diez, Lukas Burget

    Abstract: End-to-end neural diarization has evolved considerably over the past few years, but data scarcity is still a major obstacle for further improvements. Self-supervised learning methods such as WavLM have shown promising performance on several downstream tasks, but their application on speaker diarization is somehow limited. In this work, we explore using WavLM to alleviate the problem of data scarci…

    Submitted 21 October, 2024; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025; New results are updated but conclusions are exactly the same as the original one

  3. arXiv:2408.11152  [pdf, other]

    cs.SD eess.AS

    BUT Systems and Analyses for the ASVspoof 5 Challenge

    Authors: Johan Rohdin, Lin Zhang, Oldřich Plchot, Vojtěch Staněk, David Mihola, Junyi Peng, Themos Stafylakis, Dmitriy Beveraki, Anna Silnova, Jan Brukner, Lukáš Burget

    Abstract: This paper describes the BUT submitted systems for the ASVspoof 5 challenge, along with analyses. For the conventional deepfake detection task, we use ResNet18 and self-supervised models for the closed and open conditions, respectively. In addition, we analyze and visualize different combinations of speaker information and spoofing information as label schemes for training. For spoofing-robust aut…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages, ASVspoof 5 Workshop (Interspeech2024 Satellite)

  4. arXiv:2309.08377  [pdf, other]

    eess.AS cs.CL cs.SD

    DiaCorrect: Error Correction Back-end For Speaker Diarization

    Authors: Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

    Abstract: In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model consists of two parallel convolutional encoders and a transform-based decoder. By exploiting the interactions between the input recording and the initia…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  5. arXiv:2104.02571  [pdf, ps, other]

    eess.AS cs.CV

    Speaker embeddings by modeling channel-wise correlations

    Authors: Themos Stafylakis, Johan Rohdin, Lukas Burget

    Abstract: Speaker embeddings extracted with deep 2D convolutional neural networks are typically modeled as projections of first and second order statistics of channel-frequency pairs onto a linear layer, using either average or attentive pooling along the time axis. In this paper we examine an alternative pooling method, where pairwise correlations between channels for given frequencies are used as statisti…

    Submitted 7 July, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Accepted at Interspeech 2021

  6. arXiv:2010.11718  [pdf, ps, other]

    eess.AS cs.SD

    Analysis of the BUT Diarization System for VoxConverse Challenge

    Authors: Federico Landini, Ondřej Glembek, Pavel Matějka, Johan Rohdin, Lukáš Burget, Mireia Diez, Anna Silnova

    Abstract: This paper describes the system developed by the BUT team for the fourth track of the VoxCeleb Speaker Recognition Challenge, focusing on diarization on the VoxConverse dataset. The system consists of signal pre-processing, voice activity detection, speaker embedding extraction, an initial agglomerative hierarchical clustering followed by diarization using a Bayesian hidden Markov model, a reclust…

    Submitted 9 February, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: Accepted to ICASSP 2021

  7. arXiv:2004.04096  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Probabilistic embeddings for speaker diarization

    Authors: Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget

    Abstract: Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization. We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix, thus providing a path for the propagation of information about the quality of the speech segment into a PLDA sco…

    Submitted 6 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: Awarded: Jack Godfrey Best Student Paper Award, at Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo

  8. arXiv:1907.12908  [pdf, ps, other]

    cs.CV cs.AI cs.CR

    Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge

    Authors: Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza" Černocký

    Abstract: In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVSpoof2019 Spoofing and Countermeasures Challenge. The primary submission for Physical access (PA) is a fusion of two VGG networks, trained on single and two-channels features. For Logical access (LA), our primary system is a fusion of VGG…

    Submitted 13 July, 2019; originally announced July 2019.

  9. arXiv:1904.03486  [pdf, other]

    cs.CV

    Self-supervised speaker embeddings

    Authors: Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget

    Abstract: Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers. In this paper, we explore an alternative training strategy to enable the use of unlabelled utterances in training. We propose to train speaker embedding extractors via reconstructing the frames of a target speech segment, given the in…

    Submitted 23 April, 2019; v1 submitted 6 April, 2019; originally announced April 2019.

    Comments: Preprint. Submitted to Interspeech 2019. Updated results compared to first version and minor corrections

  10. arXiv:1811.02331  [pdf, other]

    eess.AS cs.SD

    Speaker verification using end-to-end adversarial language adaptation

    Authors: Johan Rohdin, Themos Stafylakis, Anna Silnova, Hossein Zeinali, Lukas Burget, Oldrich Plchot

    Abstract: In this paper we investigate the use of adversarial domain adaptation for addressing the problem of language mismatch between speaker recognition corpora. In the context of speaker verification, adversarial domain adaptation methods aim at minimizing certain divergences between the distribution that the utterance-level features follow (i.e. speaker embeddings) when drawn from source and target dom…

    Submitted 6 November, 2018; originally announced November 2018.

  11. arXiv:1811.02066  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    How to Improve Your Speaker Embeddings Extractor in Generic Toolkits

    Authors: Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky

    Abstract: Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification. In this paper we aim to facilitate its implementation on a more generic toolkit than Kaldi, which we anticipate to enable further improvements on the method. We examine several tricks in training, such as the effects of normalizing input features and pooled statistics, diff…

    Submitted 5 November, 2018; originally announced November 2018.

  12. arXiv:1710.02369  [pdf, other]

    eess.AS cs.SD

    End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA

    Authors: Johan Rohdin, Anna Silnova, Mireia Diez, Oldrich Plchot, Pavel Matejka, Lukas Burget

    Abstract: Recently several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work…

    Submitted 8 January, 2018; v1 submitted 6 October, 2017; originally announced October 2017.