Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

Wu, Yi-Chiao; Tobing, Patrick Lumban; Kobayashi, Kazuhiro; Hayashi, Tomoki; Toda, Tomoki

doi:10.1109/ACCESS.2020.2984007

arXiv:2003.11750 (eess)

[Submitted on 26 Mar 2020 (v1), last revised 7 Apr 2020 (this version, v2)]

Abstract: In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.
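The abstract describes a detect-then-repair flow: find the collapsed segments by comparison with conventional-vocoder reference speech, then apply the constraint only to those segments. The sketch below illustrates that control flow only, under strong assumptions. The frame-power comparison, the function names, the frame length, and the 6 dB threshold are all hypothetical and are not taken from the paper, and the "constrained" waveform is simply passed in as a list rather than generated with an LPCDC.

```python
import math


def frame_power_db(signal, frame_len):
    """Mean power of each non-overlapping frame, in dB (small floor avoids log(0))."""
    powers = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / len(frame)
        powers.append(10.0 * math.log10(mean_sq + 1e-10))
    return powers


def detect_collapsed_frames(generated, reference, frame_len=4, threshold_db=6.0):
    """Flag frames whose power deviates from the reference by more than threshold_db.

    Hypothetical stand-in for a collapsed speech segment detector: the criterion
    and the threshold are illustrative assumptions, not the paper's method.
    """
    gen_db = frame_power_db(generated, frame_len)
    ref_db = frame_power_db(reference, frame_len)
    return [abs(g - r) > threshold_db for g, r in zip(gen_db, ref_db)]


def patch_flagged_frames(generated, constrained, flags, frame_len=4):
    """Replace only the flagged frames with samples from the constrained output."""
    out = list(generated)
    for i, flagged in enumerate(flags):
        if flagged:
            start = i * frame_len
            out[start:start + frame_len] = constrained[start:start + frame_len]
    return out


# Toy example: the middle frame of the generated signal is far too loud.
reference = [0.1, -0.1] * 6
generated = list(reference)
generated[4:8] = [0.9, -0.9, 0.9, -0.9]

flags = detect_collapsed_frames(generated, reference)
patched = patch_flagged_frames(generated, reference, flags)
print(flags)
```

Restricting the replacement to flagged frames mirrors the stated motivation for the CSSD: any quality loss from the constrained output is confined to short periods, and unflagged frames keep the unconstrained output.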

Comments:	13 pages, 13 figures, 1 table, accepted for publication in IEEE Access
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2003.11750 [eess.AS]
	(or arXiv:2003.11750v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2003.11750
Related DOI:	https://doi.org/10.1109/ACCESS.2020.2984007

Submission history

From: Yi-Chiao Wu
[v1] Thu, 26 Mar 2020 05:37:09 UTC (1,513 KB)
[v2] Tue, 7 Apr 2020 00:27:13 UTC (1,513 KB)