ConDABench: Interactive Evaluation of Language Models for Data Analysis

Dutta, Avik; Gupta, Priyanshu; Hasanbeig, Hosein; Singh, Rahul Pratap; Nigam, Harshit; Gulwani, Sumit; Radhakrishna, Arjun; Soares, Gustavo; Tiwari, Ashish

arXiv:2510.13835 (cs)

[Submitted on 10 Oct 2025]

Abstract: Real-world data analysis tasks often come with under-specified goals and unclean data. User interaction is necessary to understand and disambiguate a user's intent, and hence essential to solving these complex tasks. Existing benchmarks for evaluating LLMs on data analysis tasks do not capture these complexities or provide first-class support for interactivity. We introduce ConDABench, a framework for generating conversational data analysis (ConDA) benchmarks and evaluating external tools on the generated benchmarks. ConDABench consists of (a) a multi-agent workflow for generating realistic benchmarks from articles describing insights gained from public datasets, (b) 1,420 ConDA problems generated using this workflow, and (c) an evaluation harness that, for the first time, makes it possible to systematically evaluate conversational data analysis tools on the generated ConDA problems. Evaluation of state-of-the-art LLMs on the benchmarks reveals that while the new generation of models solves more instances, it is not necessarily better at solving tasks that require sustained, long-form engagement. ConDABench is an avenue for model builders to measure progress towards truly collaborative models that can complete complex interactive tasks.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2510.13835 [cs.CL]
	(or arXiv:2510.13835v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2510.13835

Submission history

From: Hosein Hasanbeig
[v1] Fri, 10 Oct 2025 15:54:51 UTC (8,215 KB)
