Showing 1–1 of 1 results for author: Flintoft, W

  1. arXiv:2505.11774  [pdf, ps, other]

    cs.LG cs.AI

    HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class

    Authors: James V. Roggeveen, Erik Y. Wang, Will Flintoft, Peter Donets, Lucy S. Nathwani, Nickholas Gutierrez, David Ettel, Anton Marius Graf, Siddharth Dandavate, Arjun Nageswaran, Raglan Ward, Ava Williamson, Anne Mykland, Kacper K. Migacz, Yijun Wang, Egemen Bostan, Duy Thuc Nguyen, Zhe He, Marc L. Descoteaux, Felix Yeung, Shida Liu, Jorge García Ponce, Luke Zhu, Yuyang Chen, Ekaterina S. Ivshina, et al. (20 additional authors not shown)

    Abstract: Large language models (LLMs) have shown remarkable progress in mathematical problem-solving, but evaluation has largely focused on problems that have exact analytical solutions or involve formal proofs, often overlooking approximation-based problems ubiquitous in applied science and engineering. To fill this gap, we build on prior work and present HARDMath2, a dataset of 211 original problems cove…

    Submitted 16 May, 2025; originally announced May 2025.