Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Chen, Mingjie; Hain, Thomas

arXiv:2008.06892 (eess)

[Submitted on 16 Aug 2020]

Abstract: Unsupervised representation learning of speech has been of keen interest in recent years, as is evident, for example, in the wide interest in the ZeroSpeech challenges. This work presents a new method for learning frame-level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variables, such as the Vector Quantized Variational Auto-Encoder (VQVAE). However, these models generate speech of relatively poor quality. In this work we aim to address this with two approaches: first, WaveNet is used as the decoder to generate waveform data directly from the latent representation; second, the low complexity of the latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech Challenge 2020. The system output submitted to the challenge obtained the top position for naturalness (Mean Opinion Score 4.06), the top position for intelligibility (Character Error Rate 0.15), and the third position for the quality of the representation (ABX test score 12.5). These results and the further analysis in this paper illustrate that the quality of the converted speech and the quality of the acoustic unit representation can be well balanced.
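The abstract names two disentanglement methods without describing them. The sketch below illustrates the general ideas only, in plain Python on toy data, and is not the authors' implementation. It assumes that instance normalization standardizes each latent channel over the frames of a single utterance, and that sliced vector quantization cuts each frame's latent vector into equal slices, each replaced by the nearest entry of its own codebook. The function names, the per-slice codebooks, and the toy values are all hypothetical; the paper may define both methods differently.

```python
import math


def instance_norm(frames, eps=1e-5):
    """Standardize each channel of one utterance over time.

    frames: list of T frames, each a list of C floats.
    Assumption: statistics are taken per utterance and per channel.
    """
    n_frames = len(frames)
    n_channels = len(frames[0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for c in range(n_channels):
        values = [frame[c] for frame in frames]
        mean = sum(values) / n_frames
        var = sum((v - mean) ** 2 for v in values) / n_frames
        scale = 1.0 / math.sqrt(var + eps)
        for t in range(n_frames):
            out[t][c] = (frames[t][c] - mean) * scale
    return out


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared L2)."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def sliced_vq(frame, codebooks):
    """Quantize one frame slice by slice.

    Assumption: the frame is cut into len(codebooks) equal slices and
    slice k is replaced by its nearest entry in codebooks[k].
    Returns (indices, quantized_frame).
    """
    n_slices = len(codebooks)
    if len(frame) % n_slices != 0:
        raise ValueError("frame length must be divisible by the number of slices")
    width = len(frame) // n_slices
    indices, quantized = [], []
    for k, codebook in enumerate(codebooks):
        piece = frame[k * width:(k + 1) * width]
        idx = nearest_code(piece, codebook)
        indices.append(idx)
        quantized.extend(codebook[idx])
    return indices, quantized


# Toy data: two frames with two channels, then one 4-dimensional frame
# quantized with two hypothetical 2-dimensional codebooks.
normalized = instance_norm([[1.0, 2.0], [3.0, 6.0]])
toy_codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[-1.0, 0.0], [1.0, 1.0]],
]
indices, quantized = sliced_vq([0.9, 0.1, -0.8, 0.2], toy_codebooks)
print(indices)    # [1, 0]
print(quantized)  # [1.0, 0.0, -1.0, 0.0]
```

In this sketch, normalizing over time removes each channel's utterance-level mean and scale, which is one common way of suppressing static information such as speaker identity. Quantizing slices independently lets a small set of codebooks express many more combinations than a single codebook of the same total size.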

Comments:	To be presented at Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2008.06892 [eess.AS]
	(or arXiv:2008.06892v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2008.06892

Submission history

From: Mingjie Chen
[v1] Sun, 16 Aug 2020 12:16:29 UTC (258 KB)
