Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Moore, Steven; Nguyen, Huy A.; Chen, Tianying; Stamper, John

arXiv:2307.08161 (cs)

[Submitted on 16 Jul 2023]

Abstract: Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, as compared to 79% by GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
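To make the idea of rule-based flaw detection concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not list the 19 flaws or define its rules, so the four checks here (an "all of the above" option, a "none of the above" option, a negatively worded stem, and a correct option that is much longer than the rest) are assumptions drawn from commonly cited item-writing guidance. The `MCQ` type, the flaw names, the 1.5 length ratio, and the `detection_rate` helper are all hypothetical; the helper shows one plausible reading of "share of human-identified flaws that a method also detects".

```python
import re
from dataclasses import dataclass


@dataclass
class MCQ:
    """Hypothetical container for one multiple-choice question."""
    stem: str
    options: list[str]
    answer_index: int  # position of the correct option in `options`


def find_flaws(q: MCQ) -> list[str]:
    """Return the names of the illustrative flaws found in one question."""
    flaws = []
    normalized = [o.strip().lower().rstrip(".") for o in q.options]

    # Catch-all options (assumed flaw names, not taken from the paper).
    if "all of the above" in normalized:
        flaws.append("all_of_the_above")
    if "none of the above" in normalized:
        flaws.append("none_of_the_above")

    # Negatively worded stem, approximated by a keyword match.
    if re.search(r"\b(not|except)\b", q.stem, flags=re.IGNORECASE):
        flaws.append("negative_stem")

    # Correct option much longer than every distractor (1.5x is an assumption).
    lengths = [len(o) for o in q.options]
    others = [n for i, n in enumerate(lengths) if i != q.answer_index]
    if others and lengths[q.answer_index] > 1.5 * max(others):
        flaws.append("longest_option_correct")

    return flaws


def detection_rate(human: set[str], predicted: set[str]) -> float:
    """Fraction of human-annotated flaws that the automatic method also flags."""
    if not human:
        return 1.0
    return len(human & predicted) / len(human)


question = MCQ(
    stem="Which of the following is NOT a mammal?",
    options=["Dog", "Cat", "Shark", "All of the above"],
    answer_index=2,
)
print(find_flaws(question))
```

Keyword rules like these are cheap to run and easy to audit, but they are only surface checks; flaws that depend on meaning (for example, implausible distractors) would need richer rules or a model.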

Comments:	Accepted as a Research Paper in 18th European Conference on Technology Enhanced Learning
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2307.08161 [cs.CL]
	(or arXiv:2307.08161v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2307.08161

Submission history

From: Steven Moore
[v1] Sun, 16 Jul 2023 22:12:10 UTC (312 KB)
