MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

Murugesan, Keerthiram; Swaminathan, Sarathkrishna; Dan, Soham; Chaudhury, Subhajit; Gunasekara, Chulaka; Crouse, Maxwell; Mahajan, Diwakar; Abdelaziz, Ibrahim; Fokoue, Achille; Kapanipathi, Pavan; Roukos, Salim; Gray, Alexander

arXiv:2306.10452 (cs)

[Submitted on 18 Jun 2023]

Abstract: With the growing interest in large language models, evaluating the quality of machine-generated text against reference (typically human-generated) text has become a focus of attention. Most recent work either focuses on task-specific evaluation metrics or studies the properties of machine-generated text captured by existing metrics. In this work, we propose a new evaluation scheme that models human judgments across 7 NLP tasks based on fine-grained mismatches between a pair of texts. Inspired by recent efforts toward fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better predictions of human judgments. We propose a neural framework for evaluating machine-generated text that uses these mismatch error types as auxiliary tasks and re-purposes existing single-number evaluation metrics as scalar features alongside textual features extracted from the machine and reference texts. Our experiments use the mismatch errors to reveal key insights about existing metrics. We show that the mismatch errors between sentence pairs in held-out datasets from the 7 NLP tasks align well with human evaluation.
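The abstract describes the framework only at a high level: textual features from the machine and reference texts, existing single-number metrics re-purposed as scalar features, and the 13 mismatch error types used as auxiliary tasks. The following is a minimal, hypothetical sketch of that kind of multi-task setup. It assumes linear output heads, a squared-error loss on the human judgment, binary cross-entropy for each mismatch type, and an illustrative weight `aux_weight`. None of these choices, and none of the function names (`build_features`, `predict`, `multitask_loss`), come from the paper.

```python
import math
from typing import List, Sequence, Tuple

# The abstract states there are 13 mismatch error types; only two
# (spatial/geographic, entity) are named there, so no type names are assumed.
NUM_MISMATCH_TYPES = 13


def build_features(text_features: Sequence[float],
                   metric_scores: Sequence[float]) -> List[float]:
    """Hypothetical: concatenate textual features (e.g., from an encoder over
    the machine and reference texts) with scalar scores from existing metrics."""
    return list(text_features) + list(metric_scores)


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Dot product plus bias: a single linear output head (an assumption)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(x: Sequence[float],
            score_head: Tuple[Sequence[float], float],
            mismatch_heads: Sequence[Tuple[Sequence[float], float]]
            ) -> Tuple[float, List[float]]:
    """Return the predicted human judgment and one logit per mismatch type."""
    score = linear(score_head[0], score_head[1], x)
    logits = [linear(w, b, x) for w, b in mismatch_heads]
    return score, logits


def bce_with_logits(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def multitask_loss(pred_score: float, true_score: float,
                   mismatch_logits: Sequence[float],
                   mismatch_labels: Sequence[int],
                   aux_weight: float = 0.5) -> float:
    """Squared error on the human judgment plus a weighted auxiliary loss
    averaged over the mismatch-type predictions."""
    if len(mismatch_logits) != len(mismatch_labels):
        raise ValueError("one label per mismatch logit is required")
    main = (pred_score - true_score) ** 2
    if not mismatch_logits:
        return main
    aux = sum(bce_with_logits(z, y)
              for z, y in zip(mismatch_logits, mismatch_labels))
    return main + aux_weight * aux / len(mismatch_logits)


if __name__ == "__main__":
    # Two made-up textual features plus two made-up metric scores.
    x = build_features([0.2, -0.1], [0.35, 0.41])
    zero_head = ([0.0] * len(x), 0.0)
    score, logits = predict(x, zero_head, [zero_head] * NUM_MISMATCH_TYPES)
    print(multitask_loss(score, 0.7, logits, [0] * NUM_MISMATCH_TYPES))
```

In this sketch the auxiliary mismatch-type losses share the same input features as the main regression head, so they act as extra supervision on the shared representation; in a real model the linear heads would sit on top of a trained text encoder rather than fixed weights.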

Comments:	Accepted at ACL 2023 (ACL Findings Long)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.10452 [cs.CL]
	(or arXiv:2306.10452v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.10452

Submission history

From: Keerthiram Murugesan
[v1] Sun, 18 Jun 2023 01:38:53 UTC (3,877 KB)