Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis

Lorincz, Beata; Stan, Adriana; Giurgiu, Mircea

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2106.01789 (eess)

[Submitted on 3 Jun 2021]

Abstract: Building multispeaker neural network-based text-to-speech (TTS) synthesis systems commonly relies on the availability of large amounts of high-quality recordings from each speaker and on conditioning the training process on the speaker's identity or on a learned representation of it. However, when little data is available from each speaker, or the number of speakers is limited, the multispeaker TTS system can be hard to train, resulting in poor speaker similarity and naturalness.
To address this issue, we explore two directions: forcing the network to learn a better speaker identity representation by appending an additional loss term, and augmenting the input data of each speaker using waveform manipulation methods. We show that both methods are effective when evaluated with objective and subjective measures. The additional loss term improves speaker similarity, while the data augmentation improves the intelligibility of the multispeaker TTS system.
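The abstract does not specify the speaker embedding network, the distance used in the loss term, or which waveform manipulations are applied. The following Python sketch only illustrates the general idea under loudly stated assumptions: speaker embeddings of synthesized and reference speech (assumed to come from some speaker verification model) are compared with cosine distance, that term is added to the base TTS loss with a hypothetical weight, and a simple linear-interpolation speed perturbation stands in for "waveform manipulation". All function names and parameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def speaker_loss(emb_synth: Sequence[float], emb_ref: Sequence[float]) -> float:
    """Hypothetical speaker-verification-derived term: 1 - cosine similarity.

    Equals 0 when the embeddings point in the same direction and 2 when opposite.
    """
    return 1.0 - cosine_similarity(emb_synth, emb_ref)


def total_loss(tts_loss: float, emb_synth: Sequence[float],
               emb_ref: Sequence[float], weight: float = 1.0) -> float:
    """Base TTS loss plus the weighted speaker term (weight is an assumed hyperparameter)."""
    return tts_loss + weight * speaker_loss(emb_synth, emb_ref)


def speed_perturb(wave: Sequence[float], factor: float) -> List[float]:
    """Resample a waveform by linear interpolation (changes both tempo and pitch).

    factor > 1 shortens (speeds up) the signal; factor < 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not wave:
        return []
    n_out = max(1, int(len(wave) / factor))
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        if j >= len(wave) - 1:
            # Past the last sample: hold the final value.
            out.append(wave[-1])
        else:
            frac = pos - j
            out.append(wave[j] * (1.0 - frac) + wave[j + 1] * frac)
    return out


if __name__ == "__main__":
    # Orthogonal embeddings give a speaker term of 1.0, so 0.5 + 0.1 * 1.0.
    print(total_loss(0.5, [1.0, 0.0], [0.0, 1.0], weight=0.1))
    print(len(speed_perturb([0.0] * 100, 2.0)))
```

In this sketch the speaker term is differentiable in a real framework only if the embedding network is run on the synthesized output inside the training graph; here plain floats are used purely to show the arithmetic of the combined objective.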

Comments:	Accepted at EUSIPCO 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2106.01789 [eess.AS]
	(or arXiv:2106.01789v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2106.01789

Submission history

From: Adriana Stan PhD
[v1] Thu, 3 Jun 2021 12:22:18 UTC (449 KB)