Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
Authors:
Jan Silovsky,
Liuhui Deng,
Arturo Argueta,
Tresi Arvizo,
Roger Hsiao,
Sasha Kuznietsov,
Yiu-Chang Lin,
Xiaoqiang Xiao,
Yuanyuan Zhang
Abstract:
Voice technology has become ubiquitous recently. However, the accuracy, and hence experience, in different languages varies significantly, which makes the technology not equally inclusive. The availability of data for different languages is one of the key factors affecting accuracy, especially in training of all-neural end-to-end automatic speech recognition systems.
Cross-lingual knowledge transfer and iterative pseudo-labeling are two techniques that have been shown to improve the accuracy of ASR systems, in particular for low-resource languages such as Ukrainian.
Our goal is to train, without any manually annotated training data, an all-neural Transducer-based ASR system to replace a DNN-HMM hybrid system. We show that the Transducer system trained using transcripts produced by the hybrid system achieves an 18% reduction in word error rate. By combining cross-lingual knowledge transfer from related languages with iterative pseudo-labeling, we achieve a 35% reduction in word error rate.
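The iterative pseudo-labeling idea mentioned in the abstract can be sketched roughly as follows. This is a generic illustration, not the authors' pipeline: the names `iterative_pseudo_labeling`, `teacher`, `train`, and `rounds` are hypothetical, and cross-lingual initialization, data filtering, and stopping criteria are left out.

```python
from typing import Callable, List, Sequence, Tuple

# A "transcriber" maps an utterance identifier to a transcript (assumption:
# utterances are represented here by plain string identifiers).
Transcriber = Callable[[str], str]
# A "trainer" maps (utterance, transcript) pairs to a new transcriber.
Trainer = Callable[[List[Tuple[str, str]]], Transcriber]


def iterative_pseudo_labeling(
    audio: Sequence[str],
    teacher: Transcriber,
    train: Trainer,
    rounds: int,
) -> Transcriber:
    """Generic pseudo-labeling loop (hypothetical sketch).

    Round 1 labels the unannotated audio with the existing system
    (for example, a hybrid model); each later round relabels the same
    audio with the model trained in the previous round.
    """
    labeler = teacher
    for _ in range(rounds):
        # Transcribe the unannotated audio with the current best system.
        pseudo_labeled = [(utt, labeler(utt)) for utt in audio]
        # Train a new model on those machine-generated transcripts.
        labeler = train(pseudo_labeled)
    return labeler
```

With `rounds=1` this reduces to training once on the teacher's transcripts; larger values let each new model relabel the data for the next one.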
Submitted 22 May, 2023;
originally announced May 2023.
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Authors:
Thien Nguyen,
Nathalie Tran,
Liuhui Deng,
Thiago Fraga da Silva,
Matthew Radzihovsky,
Roger Hsiao,
Henry Mason,
Stefan Braun,
Erik McDermott,
Dogan Can,
Pawel Swietojanski,
Lyan Verwimp,
Sibel Oyman,
Tresi Arvizo,
Honza Silovsky,
Arnab Ghoshal,
Mathieu Martel,
Bharat Ram Ambati,
Mohamed Ali
Abstract:
Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural-transducer-based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech. We analyze how each of the neural transducer's encoders contributes to code-switching performance by measuring encoder-specific recall values, and evaluate our English/Mandarin system on the ASCEND data set. Our final system achieves a 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set, reducing the MER by 2.1% absolute compared to the previous literature, while maintaining good accuracy on the monolingual test sets.
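The abstract does not define how MER is computed. A minimal sketch, assuming the common convention of counting English at the word level and Mandarin at the character level, might look like the following; the helper names `mixed_tokens`, `edit_distance`, and `mixed_error_rate` are hypothetical, and the scoring used in the paper may differ.

```python
import re
from typing import List, Sequence


def mixed_tokens(text: str) -> List[str]:
    """Split text into single CJK characters and whole Latin-script words."""
    # Assumption: the basic CJK Unified Ideographs range is enough here.
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9']+", text.lower())


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over mixed tokens, normalized by reference length."""
    ref = mixed_tokens(reference)
    hyp = mixed_tokens(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, the reference "我想 play 音乐" yields five tokens (four characters and one word), so a single substituted word gives an MER of 0.2.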
Submitted 21 October, 2022;
originally announced October 2022.