Search | arXiv e-print repository

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

Authors: Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari

Abstract: We propose a novel training algorithm for a multi-speaker neural text-to-speech (TTS) model based on multi-task adversarial training. A conventional generative adversarial network (GAN)-based training algorithm significantly improves the quality of synthetic speech by reducing the statistical difference between natural and synthetic speech. However, the algorithm does not guarantee the generalization performance of the trained TTS model in synthesizing voices of unseen speakers who are not included in the training data. Our algorithm alternatively trains two deep neural networks: multi-task discriminator and multi-speaker neural TTS model (i.e., generator of GANs). The discriminator is trained not only to distinguish between natural and synthetic speech but also to verify the speaker of input speech is existent or non-existent (i.e., newly generated by interpolating seen speakers' embedding vectors). Meanwhile, the generator is trained to minimize the weighted sum of the speech reconstruction loss and adversarial loss for fooling the discriminator, which achieves high-quality multi-speaker TTS even if the target speaker is unseen. Experimental evaluation shows that our algorithm improves the quality of synthetic speech better than a conventional GANSpeech algorithm.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 6 pages, 1 figure, Accepted for APSIPA ASC 2022
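
The two quantities named in the abstract above, a non-existent speaker obtained by interpolating seen speakers' embedding vectors and a generator objective formed as a weighted sum of reconstruction and adversarial losses, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear interpolation scheme, the placement of the weight on the adversarial term, and the names `interpolate_embeddings`, `generator_loss`, and `adv_weight` are all assumptions introduced here.

```python
from typing import List, Sequence


def interpolate_embeddings(
    e_a: Sequence[float], e_b: Sequence[float], alpha: float
) -> List[float]:
    """Linearly interpolate between two seen speakers' embeddings.

    Assumption: plain linear interpolation; the abstract does not
    specify the exact interpolation scheme.
    """
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


def generator_loss(
    reconstruction_loss: float, adversarial_loss: float, adv_weight: float
) -> float:
    """Weighted sum of the reconstruction and adversarial losses.

    Assumption: the weight multiplies only the adversarial term.
    """
    return reconstruction_loss + adv_weight * adversarial_loss
```

With `alpha = 0.5`, the interpolated vector lies midway between the two seen speakers, which is one simple way to obtain an embedding that belongs to no training speaker.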

arXiv:2206.10256 [pdf, other]

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS

Authors: Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari

Abstract: This paper proposes a human-in-the-loop speaker-adaptation method for multi-speaker text-to-speech. With a conventional speaker-adaptation method, a target speaker's embedding vector is extracted from his/her reference speech using a speaker encoder trained on a speaker-discriminative task. However, this method cannot obtain an embedding vector for the target speaker when the reference speech is unavailable. Our method is based on a human-in-the-loop optimization framework, which incorporates a user to explore the speaker-embedding space to find the target speaker's embedding. The proposed method uses a sequential line search algorithm that repeatedly asks a user to select a point on a line segment in the embedding space. To efficiently choose the best speech sample from multiple stimuli, we also developed a system in which a user can switch between multiple speakers' voices for each phoneme while looping an utterance. Experimental results indicate that the proposed method can achieve comparable performance to the conventional one in objective and subjective evaluations even if reference speech is not used as the input of a speaker encoder directly.

Submitted 21 June, 2022; originally announced June 2022.

Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
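
The loop described in the abstract above, in which a user is repeatedly asked to select a point on a line segment in the embedding space, might look roughly like the sketch below. It is a deliberate simplification under stated assumptions: each segment is centered on the current choice and points in a random direction, and the human listener is replaced by a simulated user who picks the candidate closest to a known target. How the paper actually chooses each segment is not stated in the abstract, and every function name here is hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def line_candidates(
    center: Vector, direction: Vector, radius: float, half_steps: int
) -> List[Vector]:
    """Evenly spaced points on the segment center +/- radius * direction.

    The middle candidate (offset 0) is the current point itself.
    """
    offsets = [
        radius * (k - half_steps) / half_steps for k in range(2 * half_steps + 1)
    ]
    return [[c + s * d for c, d in zip(center, direction)] for s in offsets]


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Random direction (assumption: the segment direction is random)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sequential_line_search(
    start: Vector,
    choose: Callable[[List[Vector]], int],
    iterations: int,
    radius: float = 1.0,
    half_steps: int = 4,
    seed: int = 0,
) -> Vector:
    """Repeatedly present a line segment and move to the chosen point."""
    rng = random.Random(seed)
    current = list(start)
    for _ in range(iterations):
        direction = random_unit_vector(len(current), rng)
        candidates = line_candidates(current, direction, radius, half_steps)
        current = candidates[choose(candidates)]
    return current


def make_simulated_user(target: Vector) -> Callable[[List[Vector]], int]:
    """Stand-in for a human listener: picks the candidate nearest the target."""

    def choose(candidates: List[Vector]) -> int:
        dists = [math.dist(c, target) for c in candidates]
        return dists.index(min(dists))

    return choose
```

Because the current point is always among the candidates, the simulated user's choice never moves farther from the target. A real listener gives noisier feedback, so this property holds only for the idealized stand-in.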

arXiv:2206.07214 [pdf, other]

doi 10.1103/PhysRevResearch.5.043005

Continuous-variable quantum approximate optimization on a programmable photonic quantum processor

Authors: Yutaro Enomoto, Keitaro Anai, Kenta Udagawa, Shuntaro Takeda

Abstract: Variational quantum algorithms (VQAs) provide a promising approach to achieving quantum advantage for practical problems on near-term noisy intermediate-scale quantum (NISQ) devices. Thus far, most studies on VQAs have focused on qubit-based systems, but the power of VQAs can be potentially boosted by exploiting infinite-dimensional continuous-variable (CV) systems. Here, we implement the CV version of one VQA, a quantum approximate optimization algorithm by developing an automated collaborative computing system between a programmable photonic quantum computer and a classical computer. We experimentally demonstrate that this algorithm solves the minimization problem of simple continuous functions by implementing the quantum version of gradient descent to localize an initially broadly-distributed wavefunction to the minimum. This method allows the execution of a practical CV quantum algorithm on a physical platform. Our work can be extended to the minimization of more general functions, providing an alternative to achieve the quantum advantage in practical problems.

Submitted 5 October, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Journal ref: Phys. Rev. Research 5, 043005 (2023)

Showing 1–3 of 3 results for author: Udagawa, K