VLDBench Evaluating Multimodal Disinformation with Regulatory Alignment

Raza, Shaina; Vayani, Ashmal; Jain, Aditya; Narayanan, Aravind; Khazaie, Vahid Reza; Bashir, Syed Raza; Dolatabadi, Elham; Uddin, Gias; Emmanouilidis, Christos; Qureshi, Rizwan; Shah, Mubarak

arXiv:2502.11361 (cs)

[Submitted on 17 Feb 2025 (v1), last revised 30 May 2025 (this version, v3)]

Abstract: Detecting disinformation that blends manipulated text and images has become increasingly challenging, as AI tools make synthetic content easy to generate and disseminate. While most existing AI safety benchmarks focus on single-modality misinformation (i.e., false content shared without intent to deceive), intentional multimodal disinformation, such as propaganda or conspiracy theories that imitate credible news, remains largely unaddressed. We introduce the Vision-Language Disinformation Detection Benchmark (VLDBench), the first large-scale resource supporting both unimodal (text-only) and multimodal (text + image) disinformation detection. VLDBench comprises approximately 62,000 labeled text-image pairs across 13 categories, curated from 58 news outlets. Using a semi-automated pipeline followed by expert review, 22 domain experts invested over 500 hours to produce high-quality annotations with substantial inter-annotator agreement. Evaluations of state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs) on VLDBench show that incorporating visual cues improves detection accuracy by 5 to 35 percentage points over text-only models. VLDBench provides data and code for evaluation, fine-tuning, and robustness testing to support disinformation analysis. Developed in alignment with AI governance frameworks (e.g., the MIT AI Risk Repository), VLDBench offers a principled foundation for advancing trustworthy disinformation detection in multimodal media.

Comments:	under review
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.11361 [cs.CL]
	(or arXiv:2502.11361v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.11361

Submission history

From: Shaina Raza Dr.
[v1] Mon, 17 Feb 2025 02:18:47 UTC (25,239 KB)
[v2] Sun, 23 Feb 2025 02:58:01 UTC (25,459 KB)
[v3] Fri, 30 May 2025 17:17:11 UTC (17,417 KB)
