Reasoning Models Are More Easily Gaslighted Than You Think

Zhu, Bin; Yin, Hailong; Chen, Jingjing; Jiang, Yu-Gang

arXiv:2506.09677 (cs)

[Submitted on 11 Jun 2025]

Abstract: Recent advances in reasoning-centric models promise improved robustness through mechanisms such as chain-of-thought prompting and test-time scaling. However, their ability to withstand misleading user input remains underexplored. In this paper, we conduct a systematic evaluation of three state-of-the-art reasoning models (OpenAI's o4-mini, Claude-3.7-Sonnet, and Gemini-2.5-Flash) across three multimodal benchmarks: MMMU, MathVista, and CharXiv. Our evaluation reveals significant accuracy drops (25-29% on average) following gaslighting negation prompts, indicating that even top-tier reasoning models struggle to preserve correct answers under manipulative user feedback. Building on the insights of this evaluation, and to further probe this vulnerability, we introduce GaslightingBench-R, a new diagnostic benchmark specifically designed to evaluate how well reasoning models defend their beliefs under gaslighting negation prompts. Constructed by filtering and curating 1,025 challenging samples from the existing benchmarks, GaslightingBench-R induces even more dramatic failures, with accuracy drops exceeding 53% on average. Our findings reveal fundamental limitations in the robustness of reasoning models, highlighting the gap between step-by-step reasoning and belief persistence.
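The accuracy drop described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Trial` record, its field names, and the toy data are hypothetical, and the sketch assumes that answers are compared by exact string match and that the drop is reported in percentage points, neither of which the abstract specifies.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One benchmark sample (hypothetical record layout, not from the paper)."""
    gold: str                    # reference answer
    initial_answer: str          # model's answer before any pushback
    answer_after_negation: str   # model's answer after the user wrongly denies it


def accuracy(trials, field):
    """Fraction of trials whose answer in `field` exactly matches the gold answer."""
    if not trials:
        raise ValueError("no trials to score")
    return sum(getattr(t, field) == t.gold for t in trials) / len(trials)


def gaslighting_drop(trials):
    """Return (accuracy before, accuracy after, drop in percentage points)."""
    before = accuracy(trials, "initial_answer")
    after = accuracy(trials, "answer_after_negation")
    return before, after, (before - after) * 100


# Toy data for illustration only; these are not results from the paper.
trials = [
    Trial(gold="B", initial_answer="B", answer_after_negation="C"),  # flipped away
    Trial(gold="A", initial_answer="A", answer_after_negation="A"),  # held firm
    Trial(gold="D", initial_answer="D", answer_after_negation="B"),  # flipped away
    Trial(gold="C", initial_answer="A", answer_after_negation="A"),  # wrong throughout
]

before, after, drop = gaslighting_drop(trials)
print(f"before={before:.2f} after={after:.2f} drop={drop:.1f} points")
```

In this toy set, two of the three initially correct answers are abandoned after the negation turn, which is the kind of failure the benchmark is designed to surface.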

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.09677 [cs.CV]
	(or arXiv:2506.09677v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.09677

Submission history

From: Hailong Yin
[v1] Wed, 11 Jun 2025 12:52:25 UTC (6,076 KB)
