Showing 1–5 of 5 results for author: Zainkó, C

  1. arXiv:2208.07122

    cs.SD eess.AS

    Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: Neural network-based Text-to-Speech has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron2, FastSpeech, FastPitch) usually generate Mel-spectrogram from text and then synthesize speech using vocoder (e.g., WaveNet, WaveGlow, HiFiGAN). Compared with traditional parametric approaches (e.g., STRAIGHT and WORLD), neural vocoder based end-to-end models suffer f…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: accepted at EUSIPCO2022

  2. arXiv:2204.11030

    eess.AS cs.AI cs.LG cs.SD

    Improving Self-Supervised Learning-based MOS Prediction Networks

    Authors: Bálint Gyires-Tóth, Csaba Zainkó

    Abstract: MOS (Mean Opinion Score) is a subjective method used for the evaluation of a system's quality. Telecommunications (for voice and video), and speech synthesis systems (for generated speech) are a few of the many applications of the method. While MOS tests are widely accepted, they are time-consuming and costly since human input is required. In addition, since the systems and subjects of the tests d…

    Submitted 23 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  3. arXiv:2107.12051

    eess.AS cs.AI cs.SD

    Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

    Authors: Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó

    Abstract: For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.03152

  4. arXiv:2106.06863

    cs.SD eess.AS

    Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: To date, various speech technology systems have adopted the vocoder approach, a method for synthesizing speech waveform that shows a major role in the performance of statistical parametric speech synthesis. WaveNet one of the best models that nearly resembles the human voice, has to generate a waveform in a time consuming sequential manner with an extremely complex structure of its neural networks…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures, accepted to the conference of Interspeech 2021

  5. arXiv:2008.03152

    eess.AS cs.SD

    Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

    Authors: Tamás Gábor Csapó, Csaba Zainkó, László Tóth, Gábor Gosztolya, Alexandra Markó

    Abstract: For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of En…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 5 pages, accepted for publication at Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:1906.09885