FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

Leng, Yichong; Tan, Xu; Wang, Rui; Zhu, Linchen; Xu, Jin; Liu, Wenjie; Liu, Linquan; Qin, Tao; Li, Xiang-Yang; Lin, Edward; Liu, Tie-Yan

arXiv:2109.14420 (cs)

[Submitted on 29 Sep 2021 (v1), last revised 29 Nov 2022 (this version, v4)]

Abstract: Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence and can further reduce the word error rate (WER). Although an ASR system generates multiple candidates through beam search, current error correction approaches correct only one sentence at a time and therefore fail to leverage the voting effect across candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. For fast inference, FastCorrect 2 adopts non-autoregressive generation: an encoder processes multiple source sentences, and a decoder generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. Handling multiple source sentences raises two issues. First, it is non-trivial to leverage the voting effect from multiple source sentences, since they usually vary in length. We therefore propose a novel alignment algorithm that maximizes the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can take only one adjusted source sentence as input, while there are multiple source sentences. We therefore develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our in-house dataset and AISHELL-1 show that FastCorrect 2 further reduces the WER over the previous single-candidate correction model by 3.2% and 2.6%, respectively, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR.
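
Two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a predicted duration of 0 drops a source token and a duration of n repeats it n times, and that the candidates have already been aligned to equal length with a placeholder token. The names `adjust_by_duration`, `disagreement_positions`, and `PAD` are hypothetical.

```python
from typing import List, Sequence

PAD = "<pad>"  # hypothetical placeholder for an empty slot in an aligned candidate


def adjust_by_duration(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Build the adjusted source sentence from per-token durations.

    Assumption: duration 0 drops the token, duration n repeats it n times.
    """
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    adjusted: List[str] = []
    for token, duration in zip(tokens, durations):
        adjusted.extend([token] * duration)
    return adjusted


def disagreement_positions(aligned: Sequence[Sequence[str]]) -> List[int]:
    """Return the positions at which equal-length aligned candidates differ.

    Assumption: positions where candidates disagree are the ones most
    likely to contain recognition errors (one reading of the voting effect).
    """
    if len({len(candidate) for candidate in aligned}) > 1:
        raise ValueError("candidates must be aligned to the same length")
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]
```

For example, `adjust_by_duration(["a", "b", "c"], [1, 0, 2])` yields `["a", "c", "c"]`, and three aligned candidates that differ only in their second token give `[1]` from `disagreement_positions`. The alignment step itself, which the abstract says uses token and pronunciation similarity, is not reproduced here.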

Comments:	Findings of EMNLP 2021
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2109.14420 [cs.CL]
	(or arXiv:2109.14420v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.14420

Submission history

From: Yichong Leng
[v1] Wed, 29 Sep 2021 13:48:03 UTC (317 KB)
[v2] Fri, 1 Oct 2021 06:57:39 UTC (318 KB)
[v3] Mon, 18 Oct 2021 05:45:26 UTC (317 KB)
[v4] Tue, 29 Nov 2022 09:27:08 UTC (317 KB)
