Multi-Channel Transformer Transducer for Speech Recognition

Chang, Feng-Ju; Radfar, Martin; Mouchtaris, Athanasios; Omologo, Maurizio

arXiv:2108.12953 (eess)

[Submitted on 30 Aug 2021]

Abstract: Multi-channel inputs offer several advantages over single-channel inputs for improving the robustness of on-device speech recognition systems. Recent work on the multi-channel transformer has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach has high computational complexity, which prevents it from being deployed in on-device systems. In this paper, we present a novel speech recognition model, the Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency, making it suitable for streaming decoding in on-device speech recognition. On a far-field in-house dataset, MCTT outperforms stagewise multi-channel models with a transformer transducer by up to 6.01% relative WER improvement (WERR). In addition, MCTT outperforms the multi-channel transformer by up to 11.62% WERR and is 15.8 times faster in inference speed. We further show that the computational cost of MCTT can be reduced by constraining the future and previous context in attention computations.
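
The last sentence of the abstract refers to limiting how many past (left) and future (right) frames each frame may attend to. The exact masking scheme used in MCTT is not described on this page, so the following is only a generic, single-head sketch of limited-context self-attention, assuming a banded boolean mask and no learned projections. The function names limited_context_mask and masked_attention and the parameters left and right are hypothetical illustrations, not names from the paper.

```python
import numpy as np


def limited_context_mask(num_frames: int, left: int, right: int) -> np.ndarray:
    """Hypothetical helper: boolean mask where True means query frame i may
    attend to key frame j, i.e. i - left <= j <= i + right."""
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # offset[i, j] = j - i
    return (offset >= -left) & (offset <= right)


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention restricted by `mask`.
    q, k: (T, d); v: (T, d_v); mask: (T, T) boolean."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed positions get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; the diagonal is
    # always allowed, so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    # Each frame sees at most 2 past frames and 1 future frame.
    print(limited_context_mask(5, left=2, right=1).astype(int))
```

With left=2 and right=1 the printed mask is a band: frame 0 sees frames 0 and 1, and frame 3 sees frames 1 through 4. Because each row has at most left + right + 1 allowed keys, a streaming implementation only needs to buffer a bounded window of frames, and a finite right context bounds the look-ahead latency. This sketch still computes the full T x T score matrix; an efficient implementation would compute only the band.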

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2108.12953 [eess.AS]
	(or arXiv:2108.12953v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2108.12953
Journal reference:	Published in INTERSPEECH 2021

Submission history

From: Feng-Ju Chang
[v1] Mon, 30 Aug 2021 01:50:51 UTC (182 KB)