XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source

Van Nguyen, Kiet; Do, Phong Nguyen-Thuan; Nguyen, Nhat Duy; Van Huynh, Tin; Nguyen, Anh Gia-Tuan; Nguyen, Ngan Luu-Thuy

arXiv:2204.07002 (cs)

[Submitted on 14 Apr 2022 (v1), last revised 13 Aug 2022 (this version, v2)]

Authors: Kiet Van Nguyen, Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of machine reading comprehension (MRC) models. A reader-based QA system is a high-level search engine that uses MRC techniques to find correct answers to queries or questions in open-domain or domain-specific texts. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. For a low-resource language like Vietnamese, research on QA systems remains scarce. This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus). It outperforms two robust QA systems based on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. Using the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.

Comments:	Accepted by ACIIDS 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2204.07002 [cs.CL]
	(or arXiv:2204.07002v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.07002

Submission history

From: Kiet Nguyen
[v1] Thu, 14 Apr 2022 14:54:33 UTC (915 KB)
[v2] Sat, 13 Aug 2022 10:07:47 UTC (915 KB)
