
Showing 1–6 of 6 results for author: Hodari, Z

  1. arXiv:2304.00714

    eess.AS

    Ensemble prosody prediction for expressive speech synthesis

    Authors: Tian Huey Teh, Vivian Hu, Devang S Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gomez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King

    Abstract: Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ens…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023

  2. arXiv:2303.09446

    eess.AS cs.AI cs.CL cs.LG

    Controllable Prosody Generation With Partial Inputs

    Authors: Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, Zack Hodari

    Abstract: We address the problem of human-in-the-loop control for generating prosody in the context of text-to-speech synthesis. Controlling prosody is challenging because existing generative models lack an efficient interface through which users can modify the output quickly and precisely. To solve this, we introduce a novel framework whereby the user provides partial inputs and the generative model genera…

    Submitted 15 April, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 5 pages

  3. arXiv:2011.02252

    eess.AS cs.CL cs.SD

    Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

    Authors: Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman

    Abstract: In this paper, we introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis with contextually appropriate prosody. In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms available during training. In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information ava…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: 5 pages and 3 figures

  4. arXiv:2011.01175

    eess.AS

    CAMP: a Two-Stage Approach to Modelling Prosody in Context

    Authors: Zack Hodari, Alexis Moinet, Sri Karlapati, Jaime Lorenzo-Trueba, Thomas Merritt, Arnaud Joly, Ammar Abbas, Penny Karanasou, Thomas Drugman

    Abstract: Prosody is an integral part of communication, but remains an open problem in state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody varies at a slower rate compared with other content in the acoustic signal (e.g. segmental information and background noise); (2) determining appropriate prosody without sufficient context is an ill-posed problem. In th…

    Submitted 12 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 5 pages. Published in the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

  5. arXiv:2003.06686

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0

    Authors: Zack Hodari, Catherine Lai, Simon King

    Abstract: In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion). However, when learning to control prosody in text-to-speech voices, it is not clear what exactly the control is modifying. Existing research on discrete representation learning for prosody has demonstrated high naturalness, but…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: Published at the 10th ISCA International Conference on Speech Prosody (SP2020)

  6. arXiv:1906.04233

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Using generative modelling to produce varied intonation for speech synthesis

    Authors: Zack Hodari, Oliver Watts, Simon King

    Abstract: Unlike human speakers, typical text-to-speech (TTS) systems are unable to produce multiple distinct renditions of a given sentence. This has previously been addressed by adding explicit external control. In contrast, generative models are able to capture a distribution over multiple renditions and thus produce varied renditions using sampling. Typical neural TTS models learn the average of the dat…

    Submitted 12 September, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Accepted for the 10th ISCA Speech Synthesis Workshop (SSW10)