Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Song, Kun; Cong, Jian; Wang, Xinsheng; Zhang, Yongmao; Xie, Lei; Jiang, Ning; Wu, Haiying


arXiv:2210.17349 (cs)

[Submitted on 31 Oct 2022 (v1), last revised 2 Nov 2022 (this version, v3)]

Abstract: In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder that, once trained, is robust to imperfect mel-spectrograms predicted by the acoustic model. To this end, we propose the Robust MelGAN vocoder, which solves the original multi-band MelGAN's metallic sound problem and increases its generalization ability. Specifically, we introduce a fine-grained network dropout strategy to the generator. With a specifically designed over-smooth handler that separates the speech signal into periodic and aperiodic components, we apply network dropout only to the aperiodic components, which alleviates the metallic sound and maintains good speaker similarity. To further improve generalization ability, we introduce several data augmentation methods to augment fake data in the discriminator, including harmonic shift, harmonic noise, and phase noise. Experiments show that Robust MelGAN can be used as a universal vocoder, significantly improving sound quality in TTS systems built on various types of data.
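
The abstract names phase noise as one of the augmentations applied to fake data but does not specify how it is computed. As a rough illustration only, the sketch below shows one plausible reading: perturb the phase of a waveform's spectrum with Gaussian noise while leaving the magnitude untouched. The function name `add_phase_noise`, the `noise_std` parameter, and the use of a single whole-signal FFT are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def add_phase_noise(wave, noise_std=0.3, seed=0):
    """Hypothetical phase-noise augmentation (not the paper's exact method).

    Adds Gaussian noise to the phase of the real FFT of `wave` and
    resynthesizes the signal, keeping the magnitude spectrum unchanged.
    """
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(wave)
    magnitude = np.abs(spec)
    phase = np.angle(spec)

    noise = rng.normal(0.0, noise_std, size=phase.shape)
    # The DC bin (and the Nyquist bin for even lengths) must stay real,
    # so their phase is left unperturbed.
    noise[0] = 0.0
    if len(wave) % 2 == 0:
        noise[-1] = 0.0

    noisy_spec = magnitude * np.exp(1j * (phase + noise))
    return np.fft.irfft(noisy_spec, n=len(wave))


# Usage: perturb a short 220 Hz tone sampled at 16 kHz.
t = np.arange(1024) / 16000.0
tone = np.sin(2.0 * np.pi * 220.0 * t)
augmented = add_phase_noise(tone)
```

Under this assumed formulation, the augmented signal has the same magnitude spectrum as the input but a different waveform, which is the kind of variation a discriminator could be exposed to during training.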

Comments:	Accepted by ISCSLP 2022
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2210.17349 [cs.SD]
	(or arXiv:2210.17349v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2210.17349

Submission history

From: Kun Song
[v1] Mon, 31 Oct 2022 14:24:10 UTC (4,016 KB)
[v2] Tue, 1 Nov 2022 03:30:50 UTC (4,016 KB)
[v3] Wed, 2 Nov 2022 13:05:46 UTC (4,016 KB)