Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Conia, Simone; Li, Min; Lee, Daniel; Minhas, Umar Farooq; Ilyas, Ihab; Li, Yunyao

arXiv:2311.15781 (cs)

[Submitted on 27 Nov 2023]

Abstract: Recent work in Natural Language Processing and Computer Vision has been using textual information -- e.g., entity names and descriptions -- available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with this task; iii) present M-NTA, a novel unsupervised approach that combines MT, WS, and LLMs to generate high-quality textual information; and iv) study the impact of increasing multilingual coverage and precision of non-English textual information in Entity Linking, Knowledge Graph Completion, and Question Answering. As part of our effort towards better multilingual knowledge graphs, we also introduce WikiKGE-10, the first human-curated benchmark to evaluate KGE approaches in 10 languages across 7 language families.

Comments:	Camera ready for EMNLP 2023
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2311.15781 [cs.AI]
	(or arXiv:2311.15781v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2311.15781

Submission history

From: Simone Conia
[v1] Mon, 27 Nov 2023 12:54:47 UTC (4,398 KB)
