-
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
Authors:
Ahmed Masry,
Mohammed Saidul Islam,
Mahir Ahmed,
Aayush Bajaj,
Firoz Kabir,
Aaryaman Kartha,
Md Tahmid Rahman Laskar,
Mizanur Rahman,
Shadikur Rahman,
Mehrad Shahmohammadi,
Megh Thakkar,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex analytical tasks with charts requires significant perceptual and cognitive effort. Chart Question Answering (CQA) systems automate this process by enabling models to interpret and reason with visual representations of data. However, existing benchmarks like ChartQA lack real-world diversity and have recently shown performance saturation with modern large vision-language models (LVLMs). To address these limitations, we introduce ChartQAPro, a new benchmark that includes 1,341 charts from 157 diverse sources, spanning various chart types, including infographics and dashboards, and featuring 1,948 questions of various types, such as multiple-choice, conversational, hypothetical, and unanswerable questions, to better reflect real-world challenges. Our evaluations with 21 models show a substantial performance drop for LVLMs on ChartQAPro; e.g., Claude Sonnet 3.5 scores 90.5% on ChartQA but only 55.81% on ChartQAPro, underscoring the complexity of chart reasoning. We complement our findings with detailed error analyses and ablation studies, identifying key challenges and opportunities for advancing LVLMs in chart understanding and reasoning. We release ChartQAPro at https://github.com/vis-nlp/ChartQAPro.
Submitted 10 April, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
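To make the size of the reported drop concrete, the short sketch below turns the two Claude Sonnet 3.5 accuracies quoted in the ChartQAPro abstract into an absolute drop (in percentage points) and a relative drop (as a fraction of the ChartQA score). The helper name `performance_drop` is hypothetical; only the two input numbers come from the abstract.

```python
def performance_drop(old_acc: float, new_acc: float) -> tuple[float, float]:
    """Return (absolute drop in points, relative drop as a fraction of old_acc)."""
    absolute = old_acc - new_acc
    relative = absolute / old_acc
    return absolute, relative


# Claude Sonnet 3.5 accuracies quoted in the abstract, in percent.
chartqa, chartqapro = 90.5, 55.81
abs_drop, rel_drop = performance_drop(chartqa, chartqapro)
print(f"absolute drop: {abs_drop:.2f} points")  # absolute drop: 34.69 points
print(f"relative drop: {rel_drop:.1%}")         # relative drop: 38.3%
```

In other words, moving from ChartQA to ChartQAPro costs this model roughly 35 points, or a bit over a third of its ChartQA accuracy.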
-
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
Authors:
Ahmed Masry,
Mehrad Shahmohammadi,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question answering and summarization. A common strategy to solve these tasks is to fine-tune various models originally trained on vision-language tasks. However, such task-specific models are not capable of solving a wide range of chart-related tasks, constraining their real-world applicability. To overcome these challenges, we introduce ChartInstruct: a novel chart-specific vision-language instruction-following dataset comprising 191K instructions generated from 71K charts. We then present two distinct systems for instruction tuning on such datasets: (1) an end-to-end model that connects a vision encoder for chart understanding with an LLM; and (2) a pipeline model that employs a two-step approach to extract chart data tables and input them into the LLM. In experiments on four downstream tasks, we first show the effectiveness of our model, achieving a new set of state-of-the-art results. Further evaluation shows that our instruction-tuning approach supports a wide array of real-world chart comprehension and reasoning scenarios, thereby expanding the scope and applicability of our models to new kinds of tasks.
Submitted 13 March, 2024;
originally announced March 2024.
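The second ChartInstruct system is described only at a high level: first extract the chart's data table, then feed it to the LLM. The sketch below illustrates the second step under explicit assumptions. The `ChartTable` structure, the pipe-separated linearization, the prompt template, and the function names are all hypothetical and are not the authors' format. Step one (chart-to-table extraction) would be a separate model and is replaced here by a hand-written table.

```python
from dataclasses import dataclass


@dataclass
class ChartTable:
    """Hypothetical container for a data table extracted from a chart."""
    title: str
    header: list[str]
    rows: list[list[str]]


def linearize_table(table: ChartTable) -> str:
    """Flatten the table into text, one row per line, cells separated by ' | '."""
    lines = [f"Title: {table.title}", " | ".join(table.header)]
    lines += [" | ".join(row) for row in table.rows]
    return "\n".join(lines)


def build_prompt(table: ChartTable, instruction: str) -> str:
    """Combine the linearized table with the user's instruction for a text-only LLM."""
    return f"Data table:\n{linearize_table(table)}\n\nInstruction: {instruction}\nAnswer:"


# Stand-in for the output of the (hypothetical) extraction step.
table = ChartTable("Annual sales", ["Year", "Sales"], [["2022", "10"], ["2023", "12"]])
print(build_prompt(table, "Which year had higher sales?"))
```

The design trade-off this illustrates is that the LLM never sees pixels: any information the extraction step drops (colors, layout, annotations) is unavailable downstream, which is one reason to compare it against the end-to-end vision-encoder model.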
-
Fingerprinting with Minimum Distance Decoding
Authors:
Shih-Chun Lin,
Mohammad Shahmohammadi,
Hesham El Gamal
Abstract:
This work adopts an information-theoretic framework for the design of collusion-resistant coding/decoding schemes for digital fingerprinting. More specifically, the minimum distance decision rule is used to identify 1 out of $t$ pirates. Achievable rates under this detection rule are characterized in two distinct scenarios. First, we consider the averaging attack, where a random coding argument is used to show that the rate 1/2 is achievable with $t=2$ pirates. Our study is then extended to the general case of arbitrary $t$, highlighting the underlying complexity-performance tradeoff. Overall, these results establish the significant performance gains offered by minimum distance decoding as compared to other approaches based on orthogonal codes and correlation detectors. In the second scenario, we characterize the achievable rates, with minimum distance decoding, under any collusion attack that satisfies the marking assumption. For $t=2$ pirates, we show that the rate $1-H(0.25)\approx 0.188$ is achievable using an ensemble of random linear codes. For $t\geq 3$, the existence of a non-resolvable collusion attack, with minimum distance decoding, for any non-zero rate is established. Inspired by our theoretical analysis, we then construct coding/decoding schemes for fingerprinting based on the celebrated belief-propagation framework. Using an explicit repeat-accumulate code, we obtain a vanishingly small probability of misidentification at rate 1/3 under the averaging attack with $t=2$. For collusion attacks that satisfy the marking assumption, we use a more sophisticated accumulate-repeat-accumulate code to obtain a vanishingly small misidentification probability at rate 1/9 with $t=2$. These results represent a marked improvement over the best available designs in the literature.
Submitted 14 October, 2007;
originally announced October 2007.
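As a sanity check on the rate quoted for the marking-assumption case, the sketch below evaluates $1-H(0.25)$, assuming $H$ is the binary entropy function in bits (the abstract does not define it), and runs a toy minimum distance decoder against a two-pirate averaging attack. The random ±1 codebook, the squared Euclidean distance, and the helper names (`binary_entropy`, `min_distance_decode`, `averaging_attack_demo`) are illustrative assumptions rather than the authors' construction; the sketch reproduces neither the paper's rate analysis nor its belief-propagation decoder.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Rate quoted for t = 2 pirates under the marking assumption: about 0.1887.
rate_marking = 1 - binary_entropy(0.25)


def min_distance_decode(codebook: list[list[float]], observed: list[float]) -> int:
    """Accuse the user whose codeword is closest (squared Euclidean) to observed.

    min() returns the first index among ties.
    """
    def dist2(codeword: list[float]) -> float:
        return sum((c - o) ** 2 for c, o in zip(codeword, observed))

    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


def averaging_attack_demo(num_users: int = 16, length: int = 64, seed: int = 0):
    """Toy experiment: two pirates average their +/-1 fingerprints; decode the forgery."""
    rng = random.Random(seed)
    # Assumed codebook: i.i.d. uniform +/-1 symbols (not the paper's codes).
    codebook = [[rng.choice((-1.0, 1.0)) for _ in range(length)] for _ in range(num_users)]
    pirates = rng.sample(range(num_users), 2)
    # Averaging attack: the colluders release the symbol-wise mean of their codewords.
    forged = [(a + b) / 2 for a, b in zip(codebook[pirates[0]], codebook[pirates[1]])]
    accused = min_distance_decode(codebook, forged)
    return pirates, accused


print(f"1 - H(0.25) = {rate_marking:.4f}")
print(averaging_attack_demo())
```

Why the toy decoder tends to succeed: each pirate's codeword lies at squared distance equal to the pirates' disagreement count from the averaged vector, while an innocent codeword adds 4 for every position where the pirates agree and it does not, so it can only tie by matching the pirates on all of their agreement positions, which is unlikely for random codewords of moderate length.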