End-to-End Multi-Channel Transformer for Speech Recognition

Chang, Feng-Ju; Radfar, Martin; Mouchtaris, Athanasios; King, Brian; Kunzmann, Siegfried

arXiv:2102.03951 (eess)

[Submitted on 8 Feb 2021]

Abstract: Transformers are powerful neural architectures that can integrate different modalities using attention mechanisms. In this paper, we apply transformer architectures to multi-channel speech recognition, where the spectral and spatial information collected from different microphones is integrated using attention layers. Our multi-channel transformer network consists of three main parts: channel-wise self-attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA). The CSA and CCA layers encode the contextual relationships across time within each channel and between channels, respectively. The channel-attended outputs from CSA and CCA are then fed into the EDA layers to help decode the next token given the preceding ones. Experiments on a far-field in-house dataset show that our method outperforms the baseline single-channel transformer, as well as super-directive and neural beamformers cascaded with transformers.
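
The three attention stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, and the way channels are combined (concatenating the other channels' frames in CCA, averaging over channels in EDA) is a hypothetical choice made only for illustration. All function names and tensor shapes are likewise assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes (positions, features).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def channel_wise_self_attention(x):
    # x: (channels, time, dim). Each channel attends over its own frames only.
    return attention(x, x, x)

def cross_channel_attention(x):
    # Each channel queries the frames of all the other channels.
    # Assumption: the other channels' frames are simply concatenated along time.
    num_channels = x.shape[0]
    out = []
    for c in range(num_channels):
        others = np.concatenate(
            [x[j] for j in range(num_channels) if j != c], axis=0
        )
        out.append(attention(x[c], others, others))
    return np.stack(out)

def encoder_decoder_attention(dec, enc):
    # dec: (tokens, dim) decoder states; enc: (channels, time, dim).
    # Assumption: attend to each channel separately, then average over channels.
    per_channel = attention(dec[None], enc, enc)  # (channels, tokens, dim)
    return per_channel.mean(axis=0)

# Toy input: 3 microphones, 5 frames, 8 features; 4 decoder positions.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 5, 8))
decoder_states = rng.standard_normal((4, 8))

encoded = cross_channel_attention(channel_wise_self_attention(features))
context = encoder_decoder_attention(decoder_states, encoded)
print(context.shape)  # (4, 8)
```

In this sketch the CSA output keeps one sequence per channel, so the CCA step can still tell the microphones apart; only the EDA step collapses the channel axis into a single context vector per decoder position.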

Comments:	Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2102.03951 [eess.AS]
	(or arXiv:2102.03951v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2102.03951

Submission history

From: Feng-Ju Chang
[v1] Mon, 8 Feb 2021 00:12:44 UTC (464 KB)
