Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Li, Xiang; Song, Changhe; Wei, Xianhao; Wu, Zhiyong; Jia, Jia; Meng, Helen

arXiv:2208.05359 (cs)

[Submitted on 10 Aug 2022 (v1), last revised 19 Aug 2022 (this version, v2)]

Abstract: Cross-speaker style transfer aims to extract the speech style of a given reference speech so that it can be reproduced in the timbre of arbitrary target speakers. Existing methods have explored using utterance-level style labels to perform style transfer via either global-scale or local-scale style representations. However, audiobook datasets are typically characterized by both local prosody and global genre, and are rarely accompanied by utterance-level style labels. Thus, properly transferring the reading style across different speakers remains a challenging task. This paper introduces a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speech. Moreover, by disentangling speaker timbre and style with the proposed switchable adversarial classifiers, the extracted reading style is made adaptable to the timbre of different speakers. Experimental results confirm that the model manages to transfer a given reading style to new target speakers. With the support of the local prosody and global genre type predictor, the potential of the proposed method for multi-speaker audiobook generation is further revealed.

Comments:	5 pages, 3 figures, accepted to INTERSPEECH 2022, demo page at this https URL
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2208.05359 [cs.SD]
	(or arXiv:2208.05359v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2208.05359

Submission history

From: Xiang Li
[v1] Wed, 10 Aug 2022 14:08:35 UTC (1,500 KB)
[v2] Fri, 19 Aug 2022 10:31:00 UTC (1,497 KB)
