Showing 1–1 of 1 results for author: Rivera, M U

Search v0.5.6 released 2020-02-24

arXiv:2505.17345

cs.CL

Language models should be subject to repeatable, open, domain-contextualized hallucination benchmarking

Authors: Justin D. Norman, Michael U. Rivera, D. Alex Hughes

Abstract: Plausible, but inaccurate, tokens in model-generated text are widely believed to be pervasive and problematic for the responsible adoption of language models. Despite this concern, there is little scientific work that attempts to measure the prevalence of language model hallucination in a comprehensive way. In this paper, we argue that language models should be evaluated using repeatable, open, and domain-contextualized hallucination benchmarking. We present a taxonomy of hallucinations alongside a case study that demonstrates that when experts are absent from the early stages of data creation, the resulting hallucination metrics lack validity and practical utility.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 9 pages
