Word Sense Disambiguation for 158 Languages using Word Embeddings Only

Logacheva, Varvara; Teslenko, Denis; Shelmanov, Artem; Remus, Steffen; Ustalov, Dmitry; Kutuzov, Andrey; Artemova, Ekaterina; Biemann, Chris; Ponzetto, Simone Paolo; Panchenko, Alexander

arXiv:2003.06651 (cs)

[Submitted on 14 Mar 2020]

Abstract: Disambiguation of word senses in context is easy for humans but remains a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models have been developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). Such approaches are particularly useful for under-resourced languages, which lack the resources needed to build either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. The models and the system are available online.
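
The abstract describes the pipeline only at a high level: embeddings in, a sense inventory out, then disambiguation in context. The following is a minimal sketch of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: the toy 2-D vectors, the function names (`induce_senses`, `disambiguate`), the similarity threshold, and the use of connected components over the neighbour similarity graph as a stand-in clustering step.

```python
import math

# Toy 2-D "embeddings" with hypothetical values, chosen only for illustration.
VECTORS = {
    "bank": (0.7, 0.7),
    "money": (1.0, 0.1),
    "loan": (0.9, 0.2),
    "cash": (0.95, 0.0),
    "river": (0.1, 1.0),
    "shore": (0.2, 0.9),
    "water": (0.0, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def mean(vectors):
    """Component-wise mean of a collection of vectors."""
    vectors = list(vectors)
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def neighbours(word, k):
    """The k vocabulary words most similar to `word`."""
    others = [w for w in VECTORS if w != word]
    others.sort(key=lambda w: cosine(VECTORS[word], VECTORS[w]), reverse=True)
    return others[:k]


def induce_senses(word, k=6, threshold=0.8):
    """Group the neighbours of `word` into senses.

    Assumption: two neighbours belong to the same sense if they are linked
    by a chain of pairwise similarities >= threshold (connected components).
    """
    nodes = neighbours(word, k)
    unseen = set(nodes)
    senses = []
    for start in nodes:
        if start not in unseen:
            continue
        unseen.discard(start)
        component, stack = [], [start]
        while stack:
            current = stack.pop()
            component.append(current)
            linked = [
                n for n in nodes
                if n in unseen
                and cosine(VECTORS[current], VECTORS[n]) >= threshold
            ]
            for n in linked:
                unseen.discard(n)
                stack.append(n)
        senses.append(sorted(component))
    return sorted(senses)  # sorted for a deterministic sense order


def disambiguate(word, context, senses):
    """Index of the sense whose centroid is closest to the mean context vector."""
    known = [VECTORS[w] for w in context if w in VECTORS and w != word]
    if not known:
        return 0  # no usable context: fall back to the first sense
    ctx = mean(known)
    scores = [cosine(ctx, mean(VECTORS[w] for w in sense)) for sense in senses]
    return max(range(len(senses)), key=scores.__getitem__)


senses = induce_senses("bank")
print(senses)
print(disambiguate("bank", ["loan", "money"], senses))
print(disambiguate("bank", ["river", "water"], senses))
```

In this toy setting the neighbours of "bank" split into a finance-like group and a river-like group, and the context words select between them. A real system would load pre-trained vectors for a full vocabulary and would likely need a more robust clustering step than connected components.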

Comments:	10 pages, 5 figures, 4 tables, accepted at LREC 2020
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2003.06651 [cs.CL]
	(or arXiv:2003.06651v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2003.06651

Submission history

From: Dmitry Ustalov
[v1] Sat, 14 Mar 2020 14:50:04 UTC (223 KB)