Guided Alignment Training for Topic-Aware Neural Machine Translation

Chen, Wenhu; Matusov, Evgeny; Khadivi, Shahram; Peter, Jan-Thorsten

arXiv:1607.01628 (cs)

[Submitted on 6 Jul 2016]

Abstract: In this paper, we propose an effective way of biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.
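The abstract does not spell out how the attention is biased towards statistical word alignments. As a rough illustration only, the sketch below assumes one plausible form: a cross-entropy penalty between the attention weights and a reference alignment matrix, added to the usual translation loss. The function names, the `weight` parameter, and the row normalization of the alignment matrix are all assumptions made for this example, not details taken from the paper.

```python
import math


def guided_alignment_loss(attention, alignment, eps=1e-9):
    """Cross-entropy between reference alignments and attention weights.

    attention[t][s]: attention weight of target position t on source position s.
    alignment[t][s]: 1 if a statistical aligner linked t and s, else 0.
    Both are hypothetical inputs chosen for this sketch.
    """
    total = 0.0
    for att_row, ali_row in zip(attention, alignment):
        links = sum(ali_row)
        if links == 0:
            continue  # unaligned target word: no guidance for this row
        for a, r in zip(att_row, ali_row):
            # Normalize each alignment row so it sums to 1, like attention.
            total -= (r / links) * math.log(a + eps)
    # Average over target positions; guard against an empty sentence.
    return total / max(len(attention), 1)


def combined_loss(translation_loss, attention, alignment, weight=0.5):
    """Translation loss plus a weighted alignment penalty (weight is assumed)."""
    return translation_loss + weight * guided_alignment_loss(attention, alignment)


if __name__ == "__main__":
    reference = [[1, 0], [0, 1]]          # target word i aligned to source word i
    close = [[0.9, 0.1], [0.2, 0.8]]      # attention that mostly agrees
    far = [[0.1, 0.9], [0.9, 0.1]]        # attention that mostly disagrees
    print(guided_alignment_loss(close, reference))
    print(guided_alignment_loss(far, reference))
```

Under these assumptions, attention that agrees with the reference alignment incurs a smaller penalty than attention that disagrees, so minimizing the combined loss pulls the attention towards the aligner's output.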

Comments:	11 pages
Subjects:	Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1607.01628 [cs.CL]
	(or arXiv:1607.01628v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1607.01628

Submission history

From: Evgeny Matusov
[v1] Wed, 6 Jul 2016 14:13:12 UTC (328 KB)
