
Showing 1–5 of 5 results for author: Rohnke, J

Searching in archive eess.
  1. arXiv:2212.03398   

    eess.AS cs.CL cs.SD

    Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

    Authors: Daxin Tan, Nikos Kargas, David McHardy, Constantinos Papayiannis, Antonio Bonafonte, Marek Strelec, Jonas Rohnke, Agis Oikonomou Filandras, Trevor Wood

    Abstract: Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations. It has been found in different dimensions, such as acoustic, prosodic, lexical or syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of the entrainment phenomenon in…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: This version has been removed by arXiv administrators because the submitter did not have the right to assign a license at the time of submission

  2. arXiv:2110.12539

    cs.SD cs.LG eess.AS

    Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech

    Authors: Marek Strelec, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

    Abstract: We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures. Compared to these previous architectures, our proposed model retains the benefits of using an utterance-level bottleneck, while keeping signi…

    Submitted 14 September, 2023; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: 5 pages, 5 figures, accepted at IberSPEECH 2022

  3. arXiv:2012.09703

    eess.AS cs.SD

    Parallel WaveNet conditioned on VAE latent vectors

    Authors: Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote

    Abstract: Recently the state-of-the-art text-to-speech synthesis systems have shifted to a two-model approach: a sequence-to-sequence model to predict a representation of speech (typically mel-spectrograms), followed by a 'neural vocoder' model which produces the time-domain speech waveform from this intermediate speech representation. This approach is capable of synthesizing speech that is confusable with…

    Submitted 17 December, 2020; originally announced December 2020.

  4. Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

    Authors: Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba

    Abstract: Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities when considering isolated sentences. But something which is still lacking in order to achieve human-like communication is the dynamic variations and adaptability of human speech. This work attempts to solve the problem of achieving a more dynamic and natural intonation in TTS systems, particula…

    Submitted 18 November, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Journal ref: INTERSPEECH 2020: 4407-4411

  5. arXiv:1907.02479

    eess.AS cs.CL

    Fine-grained robust prosody transfer for single-speaker neural text-to-speech

    Authors: Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman

    Abstract: We present a neural text-to-speech system for fine-grained prosody transfer from one speaker to another. Conventional approaches for end-to-end prosody transfer typically use either fixed-dimensional or variable-length prosody embedding via a secondary attention to encode the reference signal. However, when trained on a single-speaker dataset, the conventional prosody transfer systems are not robu…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: 5 pages, 7 figures, Accepted for Interspeech 2019