Showing 1–7 of 7 results for author: Al-Radhi, M S

Searching in archive cs.
  1. arXiv:2208.07122  [pdf]

    cs.SD eess.AS

    Towards Parametric Speech Synthesis Using Gaussian-Markov Model of Spectral Envelope and Wavelet-Based Decomposition of F0

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: Neural network-based Text-to-Speech has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron2, FastSpeech, FastPitch) usually generate Mel-spectrogram from text and then synthesize speech using vocoder (e.g., WaveNet, WaveGlow, HiFiGAN). Compared with traditional parametric approaches (e.g., STRAIGHT and WORLD), neural vocoder based end-to-end models suffer f…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: accepted at EUSIPCO2022

  2. arXiv:2108.01154  [pdf, other]

    cs.SD eess.AS

    Speaker Adaptation with Continuous Vocoder-based DNN-TTS

    Authors: Ali Raheem Mandeel, Mohammed Salah Al-Radhi, Tamás Gábor Csapó

    Abstract: Traditional vocoder-based statistical parametric speech synthesis can be advantageous in applications that require low computational complexity. Recent neural vocoders, which can produce high naturalness, still cannot fulfill the requirement of being real-time during synthesis. In this paper, we experiment with our earlier continuous vocoder, in which the excitation is modeled with two one-dimensi…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 10 pages, 3 figures, 23rd International Conference on Speech and Computer (SPECOM 2021)

  3. arXiv:2106.10481  [pdf]

    cs.SD cs.AI eess.AS

    Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

    Abstract: Vocoders received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though there are vocoding techniques give almost accepted synthesized speech, their high computational complexity and irregular structures are still considered challenging concerns, which yield a variety of voice quality degradation. Therefore, thi…

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: 6 pages, 3 figures, International Conference on Artificial Intelligence and Speech Technology (AIST2020)

  4. arXiv:2106.06863  [pdf]

    cs.SD eess.AS

    Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Csaba Zainkó, Géza Németh

    Abstract: To date, various speech technology systems have adopted the vocoder approach, a method for synthesizing speech waveform that shows a major role in the performance of statistical parametric speech synthesis. WaveNet one of the best models that nearly resembles the human voice, has to generate a waveform in a time consuming sequential manner with an extremely complex structure of its neural networks…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures, accepted to the conference of Interspeech 2021

  5. arXiv:2101.10278  [pdf]

    cs.SD eess.AS

    High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

    Authors: Mohammed Salah Al-Radhi

    Abstract: This Ph.D. thesis focuses on developing a system for high-quality speech synthesis and voice conversion. Vocoder-based speech analysis, manipulation, and synthesis plays a crucial role in various kinds of statistical parametric speech research. Although there are vocoding methods which yield close to natural synthesized speech, they are typically computationally expensive, and are thus not suitabl…

    Submitted 25 January, 2021; originally announced January 2021.

    Comments: Ph.D. Dissertation https://repozitorium.omikk.bme.hu/handle/10890/13411

  6. arXiv:1906.09885  [pdf, other]

    cs.SD eess.AS eess.IV

    Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

    Authors: Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó

    Abstract: Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers were shown to produce higher quality speech when using a continuous pitch estimate, which takes non-zero pitch values even whe…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 5 pages, 3 figures, accepted for publication at Interspeech 2019

  7. arXiv:1904.06075  [pdf]

    cs.SD eess.AS

    RNN-based speech synthesis using a continuous sinusoidal model

    Authors: Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

    Abstract: Recently in statistical parametric speech synthesis, we proposed a continuous sinusoidal model (CSM) using continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), which was successfully giving state-of-the-art vocoders performance (e.g. similar to STRAIGHT) in synthesized speech. In this paper, we address the use of sequence-to-sequence modeling with recurrent neural networks (R…

    Submitted 12 April, 2019; originally announced April 2019.

    Comments: 8 pages, 4 figures, Accepted to IJCNN 2019