Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Cui, Mingyu; Deng, Jiajun; Hu, Shoukang; Xie, Xurong; Wang, Tianzi; Hu, Shujie; Geng, Mengzhe; Xue, Boyang; Liu, Xunying; Meng, Helen

doi:10.21437/Interspeech.2022-696

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:2206.11596 (eess)

[Submitted on 23 Jun 2022]

Title:Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Authors:Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

View PDF

Abstract:Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs before being rescored by the speaker adapted Conformer system using a 2-way cross system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system obtained using multi-pass rescoring produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the stand alone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.

Comments:	It' s accepted to ISCA 2022
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2206.11596 [eess.AS]
	(or arXiv:2206.11596v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2206.11596
Related DOI:	https://doi.org/10.21437/Interspeech.2022-696

Submission history

From: Mingyu Cui [view email]
[v1] Thu, 23 Jun 2022 10:17:13 UTC (621 KB)

Electrical Engineering and Systems Science > Audio and Speech Processing

Title:Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Electrical Engineering and Systems Science > Audio and Speech Processing

Title:Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators