Bilingual Lexicon Induction through Unsupervised Machine Translation

Artetxe, Mikel; Labaka, Gorka; Agirre, Eneko

doi:10.18653/v1/P19-1494

arXiv:1907.10761 (cs)

[Submitted on 24 Jul 2019]

Abstract: A recent line of research has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on recent work on unsupervised machine translation. Instead of inducing a bilingual lexicon directly from cross-lingual embeddings, we use them to build a phrase table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method works with any word embedding and cross-lingual mapping technique, and it requires no additional resources besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state of the art on the standard MUSE dataset.
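
The two retrieval baselines named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`cosine`, `mean_top_k`, `retrieve`) and the toy two-dimensional vectors are assumptions made up for this example, and CSLS is written as it is commonly defined, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity between x and its k nearest target-language neighbors and r_S(y) is the same quantity for y over source-language words. The toy data contains one target "hub" word that plain nearest neighbor picks for every source word, which is the failure case CSLS is meant to reduce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_top_k(values, k):
    """Mean of the k largest values (all values if fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def retrieve(src, tgt, k=2, method="csls"):
    """Map each source word to its best-scoring target word.

    src, tgt: dicts from word to vector, assumed to live in a shared
    cross-lingual space. method is "nn" (plain cosine) or "csls".
    """
    sim = {s: {t: cosine(sv, tv) for t, tv in tgt.items()}
           for s, sv in src.items()}
    if method == "nn":
        score = sim
    else:
        # Average similarity of each word to its k nearest neighbors
        # in the other language; large values indicate hubs.
        r_src = {s: mean_top_k(sim[s].values(), k) for s in src}
        r_tgt = {t: mean_top_k([sim[s][t] for s in src], k) for t in tgt}
        score = {s: {t: 2 * sim[s][t] - r_src[s] - r_tgt[t] for t in tgt}
                 for s in src}
    return {s: max(score[s], key=score[s].get) for s in src}

# Hypothetical toy embeddings: "animal" is close to both source words.
SRC = {"gato": (1.0, 0.0), "perro": (0.0, 1.0)}
TGT = {"cat": (1.0, -1.2), "dog": (-1.2, 1.0), "animal": (1.0, 1.0)}

print(retrieve(SRC, TGT, method="nn"))         # both words map to "animal"
print(retrieve(SRC, TGT, k=2, method="csls"))  # gato -> cat, perro -> dog
```

The method proposed in the abstract replaces this direct retrieval step with a phrase table, a language model, and word alignment over a synthetic parallel corpus; those stages are not reproduced in this sketch.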

Comments:	ACL 2019
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1907.10761 [cs.CL]
	(or arXiv:1907.10761v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1907.10761
Related DOI:	https://doi.org/10.18653/v1/P19-1494

Submission history

From: Mikel Artetxe
[v1] Wed, 24 Jul 2019 22:30:04 UTC (29 KB)
