NegVQA: Can Vision Language Models Understand Negation?

Zhang, Yuhui; Su, Yuchang; Liu, Yiming; Yeung-Levy, Serena

arXiv:2505.22946 (cs)

[Submitted on 28 May 2025]

Abstract: Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially degrades performance on NegVQA before leading to improvements. Our benchmark reveals critical gaps in VLMs' negation understanding and offers insights into future VLM development. Project page available at this https URL.
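
The abstract does not spell out the scoring procedure. As a minimal sketch, assuming each item is a two-choice question whose correct option flips to the other choice when the question is negated (an assumption for illustration, not a detail confirmed by the abstract), the gap between accuracy on original and negated questions could be computed as below. `TwoChoiceItem`, `negate`, `accuracy`, `negation_blind`, and the toy items are hypothetical and do not reflect the released NegVQA format or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TwoChoiceItem:
    # Hypothetical item schema; not the released NegVQA format.
    image_id: str
    question: str
    choices: tuple[str, str]
    answer_idx: int  # index (0 or 1) of the correct choice


def negate(item: TwoChoiceItem, negated_question: str) -> TwoChoiceItem:
    # Assumption: with exactly two options, negating the question makes
    # the other option correct, so the gold index flips.
    return TwoChoiceItem(item.image_id, negated_question, item.choices,
                         1 - item.answer_idx)


def accuracy(items: Sequence[TwoChoiceItem],
             predict: Callable[[TwoChoiceItem], int]) -> float:
    # Fraction of items whose predicted index matches the gold index.
    if not items:
        raise ValueError("no items to score")
    return sum(predict(it) == it.answer_idx for it in items) / len(items)


# Toy items for illustration only (not drawn from NegVQA).
originals = [
    TwoChoiceItem("img1", "What color is the bus?", ("red", "blue"), 0),
    TwoChoiceItem("img2", "What animal is on the sofa?", ("dog", "cat"), 1),
]
negated = [
    negate(originals[0], "Which color is not the color of the bus?"),
    negate(originals[1], "Which animal is not on the sofa?"),
]

# Hypothetical "negation-blind" predictor: it returns the answer to the
# un-negated question for the same image, ignoring the word "not".
gold_by_image = {it.image_id: it.answer_idx for it in originals}


def negation_blind(item: TwoChoiceItem) -> int:
    return gold_by_image[item.image_id]


orig_acc = accuracy(originals, negation_blind)
neg_acc = accuracy(negated, negation_blind)
print(orig_acc, neg_acc, orig_acc - neg_acc)  # 1.0 0.0 1.0
```

Under this assumption, a model that answers as if the negation were absent falls below the 50% chance level on negated items, which is one way a drop relative to the original questions can arise; real models would sit somewhere between this extreme and full negation understanding.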

Comments:	Published at ACL 2025 Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2505.22946 [cs.CL]
	(or arXiv:2505.22946v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.22946

Submission history

From: Yuhui Zhang
[v1] Wed, 28 May 2025 23:58:37 UTC (5,763 KB)
