Local Monotonic Attention Mechanism for End-to-End Speech Recognition

Tjandra, Andros; Sakti, Sakriani; Nakamura, Satoshi


arXiv:1705.08091v1 (cs)

[Submitted on 23 May 2017 (this version), latest version 3 Nov 2017 (v2)]



Abstract: Recently, sequence-to-sequence models based on encoder-decoder neural networks have gained popularity for automatic speech recognition (ASR). The architecture commonly uses an attentional mechanism that allows the model to learn alignments between the source speech sequence and the target text sequence. Most attentional mechanisms used today are based on a global attention property, which requires computing a weighted summarization of the whole input sequence generated by the encoder states. However, this is computationally expensive and often produces misalignment on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of the speech recognition task. In this paper, we propose a novel attention mechanism that has local and monotonic properties. Various ways to control those properties are also explored. Experimental results demonstrate that encoder-decoder based ASR with local monotonic attention can achieve significant performance improvements and reduce computational complexity compared with a model that uses the standard global attention architecture.
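
To make the contrast in the abstract concrete, the following is a minimal sketch of global attention next to a windowed, forward-only variant. It is an illustration under stated assumptions, not the formulation proposed in the paper: the function names and the `prev_center`, `step`, and `half_width` parameters are hypothetical, and the alignment scores are taken as given rather than computed from encoder and decoder states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(scores):
    """Global attention: weights span every encoder step,
    so the cost per decoder step grows with the input length."""
    return softmax(scores)


def local_monotonic_attention(scores, prev_center, step, half_width):
    """Illustrative local, monotonic attention (assumed design).

    The window center only moves forward (step >= 0), and weights are
    normalized inside the window alone; steps outside it get weight 0.
    """
    if step < 0:
        raise ValueError("step must be non-negative to stay monotonic")
    center = min(prev_center + step, len(scores) - 1)
    lo = max(0, center - half_width)
    hi = min(len(scores), center + half_width + 1)
    weights = [0.0] * len(scores)
    for i, w in zip(range(lo, hi), softmax(scores[lo:hi])):
        weights[i] = w
    return center, weights
```

In this sketch the local variant touches at most `2 * half_width + 1` encoder steps per decoder step, regardless of how long the input is, which is the kind of saving the abstract attributes to locality.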

Comments:	12 pages, 2 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1705.08091 [cs.CL]
	(or arXiv:1705.08091v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1705.08091

Submission history

From: Andros Tjandra
[v1] Tue, 23 May 2017 06:32:36 UTC (483 KB)
[v2] Fri, 3 Nov 2017 15:34:00 UTC (1,041 KB)
