Showing 1–7 of 7 results for author: Lorenzo-Trueba, J

  1. arXiv:1911.12760  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

    Authors: Vatsal Aggarwal, Marius Cotescu, Nishant Prateek, Jaime Lorenzo-Trueba, Roberto Barra-Chicote

    Abstract: We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second. Specifically, we enhance the disentanglement capabilities of a state-of-the-art sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a Householder Flow. The proposed system provides a 22% KL-divergence reduction while jointly improving percept…

    Submitted 17 February, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted to ICASSP 2020

  2. arXiv:1807.11470  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

    Authors: Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi

    Abstract: Generating versatile and appropriate synthetic speech requires control over the output expression separate from the spoken text. Important non-textual speech variation is seldom annotated, in which case output control must be learned in an unsupervised fashion. In this paper, we perform an in-depth study of methods for unsupervised learning of control in statistical speech synthesis. For example,…

    Submitted 9 September, 2018; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: 17 pages, 4 figures

    MSC Class: 62F99

    ACM Class: I.2.7; G.3

  3. arXiv:1804.08438  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

    Authors: Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhenhua Ling

    Abstract: Voice conversion (VC) aims at conversion of speaker characteristic without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality evaluation by a human panel. As a time-consuming, expens…

    Submitted 4 September, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

    Comments: Correction (bug fix) of a published ODYSSEY 2018 publication with the same title and author list; more details in footnote in page 1

  4. arXiv:1804.04262  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

    Authors: Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhenhua Ling

    Abstract: We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems. The objective of the challenge was to perform speaker conversion (i.e. transform the vocal identity) of a source speaker to a target speaker while maintaining linguistic inform…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Accepted for Speaker Odyssey 2018

  5. arXiv:1804.02549  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

    Authors: Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

    Abstract: Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches. In this paper, we build a framework in which we can fairly compare new vocoding and acoustic modeling techniques with conventional approaches…

    Submitted 7 April, 2018; originally announced April 2018.

    Comments: To appear in ICASSP 2018

  6. arXiv:1804.00425  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    High-quality nonparallel voice conversion based on cycle-consistent adversarial network

    Authors: Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba

    Abstract: Although voice conversion (VC) algorithms have achieved remarkable success along with the development of machine learning, superior performance is still difficult to achieve when using nonparallel data. In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) for nonparallel data-based VC training. A CycleGAN is a generative adversarial network (GAN) originally developed f…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: accepted at ICASSP 2018

  7. arXiv:1803.00860  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

    Authors: Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

    Abstract: Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include di…

    Submitted 2 March, 2018; originally announced March 2018.

    Comments: conference manuscript submitted to Speaker Odyssey 2018