Showing 1–1 of 1 results for author: Coxon, E

  1. arXiv:2506.13776  [pdf, ps, other]

    cs.AI cs.CY cs.HC

    Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations

    Authors: Kevin L. Wei, Patricia Paskov, Sunishchal Dev, Michael J. Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, Chinmay Deshpande

    Abstract: In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist towards this end. Human performance baselines are vital for the machine learning community, downstream users, and policymakers to interpret AI evaluatio…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: A version of this paper has been accepted to ICML 2025 as a position paper (spotlight), with the title: "Position: Human Baselines in Model Evaluations Need Rigor and Transparency (With Recommendations & Reporting Checklist)."