Universal Neural Machine Translation for Extremely Low Resource Languages

Gu, Jiatao; Hassan, Hany; Devlin, Jacob; Li, Victor O. K.

arXiv:1802.05368 (cs)

[Submitted on 15 Feb 2018 (v1), last revised 17 Apr 2018 (this version, v2)]

Abstract: In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. The approach uses transfer learning to share lexical and sentence-level representations across multiple source languages translating into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher-resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of a strong baseline system which uses multilingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multilingual system in a zero-shot setting.
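The abstract does not spell out how word-level sharing is computed. As a rough illustration only, one way such sharing could work is to represent every source word as a softmax-weighted mixture over a single table of embeddings shared by all languages. The function names, the dot-product scoring, the temperature parameter, and the toy vectors below are all assumptions made for this sketch and are not taken from the paper.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def shared_lexical_embedding(query, universal_keys, universal_values,
                             temperature=1.0):
    """Hypothetical helper: mix the shared value vectors, weighting each one
    by how similar the word's query vector is to the matching shared key."""
    weights = softmax([dot(query, k) for k in universal_keys], temperature)
    dim = len(universal_values[0])
    return [sum(w * v[d] for w, v in zip(weights, universal_values))
            for d in range(dim)]


# Toy, made-up numbers: two shared tokens, 2-d keys, 3-d values.
universal_keys = [[1.0, 0.0], [0.0, 1.0]]
universal_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Assumed query vector for a word from a low-resource source language.
query_ro = [2.0, 0.0]

embedding = shared_lexical_embedding(query_ro, universal_keys,
                                     universal_values, temperature=0.5)
print(embedding)
```

Because the mixture weights come from a table shared across languages, a word from a low-resource language that resembles a shared key reuses the value vector trained mostly on higher-resource data, which is the intuition behind the sharing described in the abstract.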

Comments:	NAACL-HLT 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1802.05368 [cs.CL]
	(or arXiv:1802.05368v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1802.05368

Submission history

From: Hany Hassan Awadalla
[v1] Thu, 15 Feb 2018 00:35:08 UTC (1,258 KB)
[v2] Tue, 17 Apr 2018 02:15:20 UTC (1,285 KB)