Language models should be subject to repeatable, open, domain-contextualized hallucination benchmarking

Norman, Justin D.; Rivera, Michael U.; Hughes, D. Alex

arXiv:2505.17345 (cs)

[Submitted on 22 May 2025]

Abstract: Plausible but inaccurate tokens in model-generated text are widely believed to be pervasive and problematic for the responsible adoption of language models. Despite this concern, little scientific work attempts to measure the prevalence of language model hallucination comprehensively. In this paper, we argue that language models should be evaluated using repeatable, open, and domain-contextualized hallucination benchmarking. We present a taxonomy of hallucinations alongside a case study demonstrating that when experts are absent from the early stages of data creation, the resulting hallucination metrics lack validity and practical utility.

Comments:	9 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2505.17345 [cs.CL]
	(or arXiv:2505.17345v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.17345

Submission history

From: D. Alex Hughes
[v1] Thu, 22 May 2025 23:36:28 UTC (40 KB)
