Showing 1–6 of 6 results for author: Sokolova, E

Searching in archive eess.
  1. arXiv:2402.08093  [pdf, other]

    cs.LG cs.CL eess.AS

    BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

    Authors: Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

    Abstract: We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities. BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts ra…

    Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: v1.1 (fixed typos)

  2. arXiv:2307.07062  [pdf, other]

    eess.AS cs.LG cs.SD

    Controllable Emphasis with zero data for text-to-speech

    Authors: Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova

    Abstract: We present a scalable method to produce high-quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists of increasing the predicted duration of the emphasised word. We show that this is significantly better than spectrogram modification techniques im…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: In proceedings of the 12th Speech Synthesis Workshop (SSW) 2023

  3. arXiv:2303.16085  [pdf, other]

    eess.IV cs.CV

    Whole-body PET image denoising for reduced acquisition time

    Authors: Ivan Kruzhilov, Stepan Kudin, Luka Vetoshkin, Elena Sokolova, Vladimir Kokh

    Abstract: This paper evaluates the performance of supervised and unsupervised deep learning models for denoising positron emission tomography (PET) images in the presence of reduced acquisition times. Our experiments consider 212 studies (56908 images), and evaluate the models using 2D (RMSE, SSIM) and 3D (SUVpeak and SUVmax error for the regions of interest) metrics. It was shown that, in contrast to previ…

    Submitted 28 March, 2023; originally announced March 2023.

  4. arXiv:2202.06409  [pdf, other]

    eess.AS cs.CL cs.LG

    Distribution augmentation for low-resource expressive text-to-speech

    Authors: Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova

    Abstract: This paper presents a novel data augmentation technique for text-to-speech (TTS) that allows generating new (text, audio) training examples without requiring any additional data. Our goal is to increase the diversity of text conditionings available during training. This helps to reduce overfitting, especially in low-resource settings. Our method relies on substituting text and audio fragments in a w…

    Submitted 19 February, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: ICASSP 2022: camera-ready

  5. arXiv:2105.11863  [pdf, other]

    eess.IV cs.CV cs.LG

    CoRSAI: A System for Robust Interpretation of CT Scans of COVID-19 Patients Using Deep Learning

    Authors: Manvel Avetisian, Ilya Burenko, Konstantin Egorov, Vladimir Kokh, Aleksandr Nesterov, Aleksandr Nikolaev, Alexander Ponomarchuk, Elena Sokolova, Alex Tuzhilin, Dmitry Umerenkov

    Abstract: Analysis of chest CT scans can be used in detecting parts of lungs that are affected by infectious diseases such as COVID-19. Determining the volume of lungs affected by lesions is essential for formulating treatment recommendations and prioritizing patients by severity of the disease. In this paper we adopted an approach based on using an ensemble of deep convolutional neural networks for segmentati…

    Submitted 25 May, 2021; originally announced May 2021.

  6. arXiv:2011.09303  [pdf, other]

    eess.SP cs.CV cs.LG

    Noise-Resilient Automatic Interpretation of Holter ECG Recordings

    Authors: Konstantin Egorov, Elena Sokolova, Manvel Avetisian, Alexander Tuzhilin

    Abstract: Holter monitoring, a long-term ECG recording (24 hours or more), contains a large amount of valuable diagnostic information about the patient. Its interpretation is a difficult and time-consuming task for the doctor who analyzes it, because every heartbeat needs to be classified, thus requiring highly accurate methods for automatic interpretation. In this paper, we present a three-stage pro…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted for publication at BIOSIGNALS 2021