Showing 1–2 of 2 results for author: Kellermann, A

  1. arXiv:2503.17332  [pdf, other]

    cs.CR cs.AI

    CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

    Authors: Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, Adarsh Danda, Richard Fang, Conner Jensen, Eric Ihli, Jason Benn, Jet Geronimo, Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang

    Abstract: Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications. This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities. However, existing benchmarks fall short as they are limited to abstracted Capture the Flag co…

    Submitted 10 April, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

    Comments: 15 pages, 4 figures, 5 tables

    ACM Class: I.2.1; I.2.7

  2. arXiv:2406.01637  [pdf, other]

    cs.MA cs.AI

    Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

    Authors: Yuxuan Zhu, Antony Kellermann, Akul Gupta, Philip Li, Richard Fang, Rohan Bindu, Daniel Kang

    Abstract: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). I…

    Submitted 29 March, 2025; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures

    ACM Class: I.2.7; D.4.6