WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts

Foroutan, Negar; Romanou, Angelika; Ansaripour, Matin; Eisenschlos, Julian Martin; Aberer, Karl; Lebret, Rémi

arXiv:2506.15594 (cs)

[Submitted on 18 Jun 2025]

Abstract: Documents are fundamental to preserving and disseminating information, often incorporating complex layouts, tables, and charts that pose significant challenges for automatic document understanding (DU). While vision-language large models (VLLMs) have demonstrated improvements across various tasks, their effectiveness in processing long-context vision inputs remains unclear. This paper introduces WikiMixQA, a benchmark comprising 1,000 multiple-choice questions (MCQs) designed to evaluate cross-modal reasoning over tables and charts extracted from 4,000 Wikipedia pages spanning seven distinct topics. Unlike existing benchmarks, WikiMixQA emphasizes complex reasoning by requiring models to synthesize information from multiple modalities. We evaluate 12 state-of-the-art vision-language models, revealing that while proprietary models achieve ~70% accuracy when provided with direct context, their performance deteriorates significantly when retrieval from long documents is required. Among these, GPT-4o is the only model exceeding 50% accuracy in this setting, whereas open-source models perform considerably worse, with a maximum accuracy of 27%. These findings underscore the challenges of long-context, multimodal reasoning and establish WikiMixQA as a crucial benchmark for advancing document understanding research.

Comments:	ACL 2025 (Findings)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2506.15594 [cs.CL]
	(or arXiv:2506.15594v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2506.15594

Submission history

From: Negar Foroutan
[v1] Wed, 18 Jun 2025 16:09:18 UTC (5,583 KB)
