Showing 1–1 of 1 results for author: Hnyk, D

  1. arXiv:2506.06287  [pdf, other]

    cs.AI

    Deep Research Bench: Evaluating AI Web Research Agents

    Authors: FutureSearch: Nikos I. Bosse, Jon Evans, Robert G. Gambee, Daniel Hnyk, Peter Mühlbacher, Lawrence Phillips, Dan Schwarz, Jack Wildman

    Abstract: Amongst the most common use cases of modern AI is LLM chat with web search enabled. However, no direct evaluations of the quality of web research agents exist that control for the continually-changing web. We introduce Deep Research Bench, consisting of 89 multi-step web research task instances of varying difficulty across 8 diverse task categories, with the answers carefully worked out by skilled…

    Submitted 6 May, 2025; originally announced June 2025.