Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

Tran, Son Quoc; Do, Phong Nguyen-Thuan; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy

Computer Science > Computation and Language

arXiv:2303.13355 (cs)

[Submitted on 16 Mar 2023]

Title:Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

Authors:Son Quoc Tran, Phong Nguyen-Thuan Do, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

View PDF

Abstract:Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers now still have to rely on multilingual models to develop state-of-the-art systems in Vietnamese Machine Reading Comprehension. This difficulty in researching is because of the limited number of high-quality works in developing Vietnamese language models. In order to encourage more work in this research field, we present a comprehensive analysis of language weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also successfully reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and suggest an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we also introduced a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification helps improve the quality of unanswerable questions to a higher level of difficulty for Machine Reading Comprehension systems to solve.

Comments:	Accepted at The 2023 EACL Student Research Workshop
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.13355 [cs.CL]
	(or arXiv:2303.13355v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.13355

Submission history

From: Son Tran [view email]
[v1] Thu, 16 Mar 2023 20:32:58 UTC (7,159 KB)

Computer Science > Computation and Language

Title:Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators