GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

Hsu, Ting-Yao; Huang, Chieh-Yang; Rossi, Ryan; Kim, Sungchul; Giles, C. Lee; Huang, Ting-Hao K.


arXiv:2310.15405 (cs)

[Submitted on 23 Oct 2023]


Abstract: There is growing interest in systems that generate captions for scientific figures. However, assessing these systems' output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students' rankings.
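To make the reported metric concrete, the following is a minimal sketch of how a Kendall correlation between evaluator scores and expert judgments could be computed. The abstract does not say which Kendall variant was used; this sketch assumes tau-b, which tolerates ties such as those produced by a 1-6 scale. The function name `kendall_tau_b` and the score lists are hypothetical and are not taken from the paper or from SCICAP-EVAL.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (ties allowed)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    # Compare every pair of items and classify how the two raters order them.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied under both raters: excluded from tau-b entirely
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both raters order the pair the same way
        else:
            discordant += 1  # the raters disagree on the order
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical 1-6 scores for six captions (illustrative values only).
llm_scores = [5, 3, 6, 2, 4, 4]
expert_scores = [6, 2, 5, 1, 4, 3]
print(f"tau-b = {kendall_tau_b(llm_scores, expert_scores):.3f}")
```

A value of 1 means the two raters order every untied pair identically, and -1 means they order every pair in reverse, so a score of 0.401 indicates moderate positive agreement in ranking.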

Comments:	To Appear in EMNLP 2023 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.15405 [cs.CL]
	(or arXiv:2310.15405v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.15405

Submission history

From: Ting-Yao Hsu
[v1] Mon, 23 Oct 2023 23:24:57 UTC (9,399 KB)
