AdaptVC: High Quality Voice Conversion with Adaptive Learning

Kim, Jaehun; Kim, Ji-Hoon; Choi, Yeunju; Nguyen, Tan Dat; Mun, Seongkyu; Chung, Joon Son

arXiv:2501.01347 (cs)

[Submitted on 2 Jan 2025 (v1), last revised 14 Jan 2025 (this version, v4)]

Abstract: The goal of voice conversion is to transform the speech of a source speaker to sound like that of a reference speaker while preserving the original content. A key challenge is to extract disentangled linguistic content from the source and voice style from the reference. While existing approaches leverage various methods to isolate the two, generalization still requires further attention, especially for robustness in zero-shot scenarios. In this paper, we achieve successful disentanglement of content and speaker features by tuning self-supervised speech features with adapters. The adapters are trained to dynamically encode nuanced features from rich self-supervised features, and the decoder fuses them to produce speech that accurately resembles the reference with minimal loss of content. Moreover, we leverage a conditional flow matching decoder with cross-attention speaker conditioning to further boost the synthesis quality and efficiency. Subjective and objective evaluations in a zero-shot scenario demonstrate that the proposed method outperforms existing models in speech quality and similarity to the reference speech.
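
The abstract names a conditional flow matching decoder but gives no formulas. As a rough illustration only, the sketch below shows the generic optimal-transport flow matching training pair (a noisy interpolant and its target velocity) that such decoders are commonly trained on. It is not the authors' implementation: the function names, the `sigma_min` value, and the use of plain Python lists in place of mel-spectrogram tensors are all assumptions, and the speaker and content conditioning is omitted.

```python
import random

SIGMA_MIN = 1e-4  # assumed small constant; not taken from the paper


def cfm_training_pair(x1, t, sigma_min=SIGMA_MIN, rng=random):
    """Build one flow matching training pair for a data sample x1.

    x1: list of floats standing in for a target feature vector (hypothetical).
    t:  time in [0, 1]; t=0 is pure noise, t=1 is (almost) the data sample.
    Returns (x_t, target_velocity).
    """
    # Draw Gaussian noise x0 with the same length as the data sample.
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    # Straight-line path: x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * n + t * d for n, d in zip(x0, x1)]
    # Time derivative of that path, which the network is trained to predict.
    target = [d - (1.0 - sigma_min) * n for n, d in zip(x0, x1)]
    return x_t, target


def cfm_loss(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)
```

In a real decoder, a neural network would receive `x_t`, `t`, and the conditioning features and output `predicted`; at inference time an ODE solver would integrate the predicted velocity from noise to the output features.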

Comments:	ICASSP 2025; demo available this https URL
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2501.01347 [cs.SD]
	(or arXiv:2501.01347v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2501.01347

Submission history

From: Jaehun Kim
[v1] Thu, 2 Jan 2025 16:54:08 UTC (6,055 KB)
[v2] Fri, 3 Jan 2025 04:37:03 UTC (1 KB) (withdrawn)
[v3] Tue, 7 Jan 2025 05:03:55 UTC (1 KB) (withdrawn)
[v4] Tue, 14 Jan 2025 11:36:42 UTC (6,055 KB)
