
Showing 1–1 of 1 result for author: Ihli, E

Searching in archive cs.
  1. arXiv:2503.17332 [pdf, other]

    cs.CR cs.AI

    CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

    Authors: Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, Adarsh Danda, Richard Fang, Conner Jensen, Eric Ihli, Jason Benn, Jet Geronimo, Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang

    Abstract: Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications. This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities. However, existing benchmarks fall short as they are limited to abstracted Capture the Flag co…

    Submitted 10 April, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

    Comments: 15 pages, 4 figures, 5 tables

    ACM Class: I.2.1; I.2.7