
Showing 1–1 of 1 results for author: Turkkan, B O

  1. arXiv:2502.05352 [pdf, other]

    cs.AI cs.DC cs.MA

    ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

    Authors: Saurabh Jha, Rohan Arora, Yuji Watanabe, Takumi Yanagawa, Yinfang Chen, Jackson Clark, Bhavya Bhavya, Mudit Verma, Harshit Kumar, Hirokuni Kitahara, Noah Zheutlin, Saki Takano, Divya Pathak, Felix George, Xinbo Wu, Bekir O. Turkkan, Gerard Vanloo, Michael Nidd, Ting Dai, Oishik Chatterjee, Pranjal Gupta, Suranjana Samanta, Pooja Aggarwal, Rong Lee, Pavankumar Murali, et al. (18 additional authors not shown)

    Abstract: Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Securit…

    Submitted 7 February, 2025; originally announced February 2025.