Showing 1–5 of 5 results for author: Makarov, R

  1. arXiv:2506.02908

    eess.AS cs.LG

    Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

    Authors: Bunlong Lay, Rostislav Makarov, Timo Gerkmann

    Abstract: Diffusion models are a class of generative models that have been recently used for speech enhancement with remarkable success but are computationally expensive at inference time. Therefore, these models are impractical for processing streaming data in real-time. In this work, we adapt a sliding window diffusion framework to the speech enhancement task. Our approach progressively corrupts speech si…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures, Accepted to Interspeech 2025

  2. arXiv:2411.07754

    eess.AS

    Study on Inter and Intra Speaker Variability in Speaker Recognition

    Authors: Anton Okhotnikov, Nikita Torgashov, Ivan Yakovlev, Pavel Malov, Rostislav Makarov

    Abstract: Optimization of a trade-off between the number of speakers and their temporal variability (or session diversity) is crucial for the development of a speaker recognition system together with making the data collection process feasible from a time perspective. In this article, we provide the analysis of dependency between inter and intra speaker variability in training data for the modern neural net…

    Submitted 12 November, 2024; originally announced November 2024.

  3. Reshape Dimensions Network for Speaker Recognition

    Authors: Ivan Yakovlev, Rostislav Makarov, Andrei Balykin, Pavel Malov, Anton Okhotnikov, Nikita Torgashov

    Abstract: In this paper, we present Reshape Dimensions Network (ReDimNet), a novel neural network architecture for extracting utterance-level speaker representations. Our approach leverages dimensionality reshaping of 2D feature maps to 1D signal representation and vice versa, enabling the joint usage of 1D and 2D blocks. We propose an original network topology that preserves the volume of channel-timestep-…

    Submitted 25 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: Proceedings of Interspeech

  4. LRPD: Large Replay Parallel Dataset

    Authors: Ivan Yakovlev, Mikhail Melnikov, Nikita Bukhal, Rostislav Makarov, Alexander Alenin, Nikita Torgashov, Anton Okhotnikov

    Abstract: The latest research in the field of voice anti-spoofing (VAS) shows that deep neural networks (DNN) outperform classic approaches like GMM in the task of presentation attack detection. However, DNNs require a lot of data to converge, and still lack generalization ability. In order to foster the progress of neural network systems, we introduce a Large Replay Parallel Dataset (LRPD) aimed for a dete…

    Submitted 29 September, 2023; originally announced September 2023.

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6612-6616

  5. arXiv:2308.08294

    eess.AS

    The ID R&D VoxCeleb Speaker Recognition Challenge 2023 System Description

    Authors: Nikita Torgashov, Rostislav Makarov, Ivan Yakovlev, Pavel Malov, Andrei Balykin, Anton Okhotnikov

    Abstract: This report describes ID R&D team submissions for Track 2 (open) to the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our solution is based on the fusion of deep ResNets and self-supervised learning (SSL) based models trained on a mixture of a VoxCeleb2 dataset and a large version of a VoxTube dataset. The final submission to the Track 2 achieved the first place on the VoxSRC-23 public…

    Submitted 20 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.