Guided-TTS: Text-to-Speech with Untranscribed Speech

Kim, Heeseung; Kim, Sungwon; Yoon, Sungroh

Computer Science > Sound

arXiv:2111.11755v1 (cs)

[Submitted on 23 Nov 2021 (this version), latest version 10 Jun 2022 (v4)]

Authors:Heeseung Kim, Sungwon Kim, Sungroh Yoon

Abstract: Most neural text-to-speech (TTS) models require <speech, transcript> paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given the transcript. We show that Guided-TTS achieves performance comparable to existing methods without any transcript for LJSpeech. Our results further show that a single speaker-independent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.
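The guidance idea can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation, and it assumes the standard score-based form of classifier guidance (the abstract does not spell out the update rule): adding the gradient of log p(y | x) to the unconditional score gives the score of the conditional distribution p(x | y). As loudly labeled assumptions, a 1-D two-component Gaussian mixture stands in for mel-spectrograms, an exact Bayes posterior stands in for the separately trained phoneme classifier, and unadjusted Langevin dynamics replaces the DDPM reverse process (the diffusion time step is ignored entirely). All names (`guided_score`, `langevin_sample`, `scale`, `MEANS`) are hypothetical.

```python
import math
import random

# Hypothetical toy stand-in for the unconditional speech model: an
# equal-weight mixture of two 1-D Gaussians, one per "phoneme" label (0, 1).
MEANS = (-2.0, 2.0)
STD = 0.5


def log_gaussian(x, mean):
    z = (x - mean) / STD
    return -0.5 * z * z - math.log(STD * math.sqrt(2.0 * math.pi))


def class_posterior(x):
    """Toy 'phoneme classifier' p(y | x): exact Bayes posterior, equal priors."""
    logs = [log_gaussian(x, m) for m in MEANS]
    top = max(logs)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def unconditional_score(x):
    """d/dx log p(x) for the mixture (the 'unconditional model')."""
    post = class_posterior(x)
    return sum(r * (m - x) / STD**2 for r, m in zip(post, MEANS))


def classifier_grad(x, label):
    """d/dx log p(label | x) = d/dx log p(x | label) - d/dx log p(x)."""
    return (MEANS[label] - x) / STD**2 - unconditional_score(x)


def guided_score(x, label, scale):
    """Classifier guidance: unconditional score plus scaled classifier gradient."""
    return unconditional_score(x) + scale * classifier_grad(x, label)


def langevin_sample(label, scale, n_samples=300, n_steps=400, step=0.01, seed=0):
    """Unadjusted Langevin dynamics driven by the guided score (toy sampler)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            noise = math.sqrt(step) * rng.gauss(0.0, 1.0)
            x += 0.5 * step * guided_score(x, label, scale) + noise
        samples.append(x)
    return samples


if __name__ == "__main__":
    guided = langevin_sample(label=1, scale=1.0)
    print("mean with guidance toward label 1:", sum(guided) / len(guided))
```

With `scale=1.0` the two unconditional terms cancel and the sampler follows the exact score of the chosen component, so samples concentrate near `MEANS[label]`; with `scale=0.0` the sampler draws from the unconditional mixture and lands in either mode. The separation mirrors the design in the abstract: the unconditional model never sees labels, and the label only enters through the classifier term.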

Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2111.11755 [cs.SD]
	(or arXiv:2111.11755v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2111.11755

Submission history

From: Sungwon Kim
[v1] Tue, 23 Nov 2021 10:05:05 UTC (677 KB)
[v2] Tue, 7 Dec 2021 09:13:13 UTC (678 KB)
[v3] Sat, 29 Jan 2022 01:34:24 UTC (590 KB)
[v4] Fri, 10 Jun 2022 15:01:34 UTC (593 KB)