Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia

Yamada, Ikuya; Asai, Akari; Sakuma, Jin; Shindo, Hiroyuki; Takeda, Hideaki; Takefuji, Yoshiyasu; Matsumoto, Yuji

arXiv:1812.06280 (cs)

[Submitted on 15 Dec 2018 (v1), last revised 26 Sep 2020 (this version, v4)]

Abstract: The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real-world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and the pretrained embeddings for 12 languages at this https URL.

Comments:	EMNLP 2020 (system demonstration)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1812.06280 [cs.CL]
	(or arXiv:1812.06280v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1812.06280

Submission history

From: Ikuya Yamada
[v1] Sat, 15 Dec 2018 12:51:39 UTC (383 KB)
[v2] Wed, 26 Dec 2018 14:25:27 UTC (383 KB)
[v3] Thu, 30 Jan 2020 10:58:05 UTC (678 KB)
[v4] Sat, 26 Sep 2020 14:28:42 UTC (7,752 KB)
