Showing 1–1 of 1 results for author: Woffinden-Luey, J

  1. arXiv:2409.10566

    cs.LG cs.AI cs.CL cs.CV

    Eureka: Evaluating and Understanding Large Foundation Models

    Authors: Vidhisha Balachandran, Jingya Chen, Neel Joshi, Besmira Nushi, Hamid Palangi, Eduardo Salinas, Vibhav Vineet, James Woffinden-Luey, Safoora Yousefi

    Abstract: Rigorous and reproducible evaluation is critical for assessing the state of the art and for guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due to several reasons, including benchmark saturation, lack of transparency in methods used for measurement, development challenges in extracting measurements for generative tasks, and, more generally, the extensi…

    Submitted 13 September, 2024; originally announced September 2024.

    ACM Class: I.2