Non-Autoregressive Transformer Automatic Speech Recognition

Chen, Nanxin; Watanabe, Shinji; Villalba, Jesús; Dehak, Najim

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:1911.04908v1 (eess)

[Submitted on 10 Nov 2019 (this version), latest version 6 Apr 2020 (v2)]

Abstract: Recently, very deep transformers have begun to outperform traditional bi-directional long short-term memory networks by a large margin. However, for production use, inference computation cost and latency remain serious concerns in real scenarios. In this paper, we study a novel non-autoregressive transformer structure for speech recognition, which was originally introduced in machine translation. During training, input tokens fed to the decoder are randomly replaced by a special mask token. The network is required to predict the masked tokens by taking both the context and the input speech into consideration. During inference, we start from all mask tokens, and the network gradually predicts all tokens based on partial results. We show that this framework can support different decoding strategies, including traditional left-to-right decoding. As an example, we propose a new decoding strategy that proceeds from the easiest predictions to the most difficult ones. Preliminary results on the Aishell and CSJ benchmarks show that it is possible to train such a non-autoregressive network for ASR. On Aishell in particular, the proposed method outperforms the Kaldi nnet3 and chain model setups and comes quite close to the performance of the state-of-the-art end-to-end model.
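The masking and decoding procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `MASK` symbol, the independent per-token masking probability, the even fill schedule across iterations, and the `toy_predict` scorer are all assumptions made for illustration. A real system would replace `toy_predict` with a transformer decoder conditioned on the input speech.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask symbol; the abstract does not name it


def random_mask(tokens, mask_prob, rng):
    """Training-time corruption: replace each decoder input token with MASK
    independently with probability mask_prob (an assumed masking scheme)."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def easy_first_decode(predict, length, num_iterations):
    """Start from all MASK tokens and fill the most confident positions first.

    predict(tokens) must return one (token, confidence) pair per position.
    Returns the final tokens and the order in which positions were filled.
    """
    tokens = [MASK] * length
    fill_order = []
    for step in range(num_iterations):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        predictions = predict(tokens)
        # Spread the remaining masked positions evenly over the remaining steps
        # (an assumed schedule; the last step always fills everything left).
        quota = math.ceil(len(masked) / (num_iterations - step))
        # Highest-confidence ("easiest") masked positions are committed first.
        masked.sort(key=lambda i: predictions[i][1], reverse=True)
        for i in masked[:quota]:
            tokens[i] = predictions[i][0]
            fill_order.append(i)
    return tokens, fill_order


def toy_predict(tokens):
    """Hypothetical stand-in for the network: fixed tokens and confidences.
    It ignores its input; a real model would use the partial result."""
    target = ["a", "b", "c", "d"]
    confidence = [0.9, 0.2, 0.7, 0.4]
    return list(zip(target, confidence))


decoded, order = easy_first_decode(toy_predict, length=4, num_iterations=2)
print(decoded)  # ['a', 'b', 'c', 'd']
print(order)    # [0, 2, 3, 1]
```

In this sketch the two most confident positions (0 and 2) are filled in the first iteration and the remaining two in the second. Under the same assumptions, a left-to-right strategy would correspond to sorting the masked positions by index instead of by confidence.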

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Cite as:	arXiv:1911.04908 [eess.AS]
	(or arXiv:1911.04908v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1911.04908

Submission history

From: Nanxin Chen
[v1] Sun, 10 Nov 2019 06:05:14 UTC (208 KB)
[v2] Mon, 6 Apr 2020 14:45:19 UTC (368 KB)
