BLINK-Twice: You see, but do you observe? A Reasoning Benchmark on Visual Perception

Ye, Junyan; Jiang, Dongzhi; He, Jun; Zhou, Baichuan; Huang, Zilong; Yan, Zhiyuan; Li, Hongsheng; He, Conghui; Li, Weijia

arXiv:2510.09361 (cs)

[Submitted on 10 Oct 2025]

Abstract: Recently, Multimodal Large Language Models (MLLMs) have made rapid progress, particularly in enhancing their reasoning capabilities. However, existing reasoning benchmarks still primarily assess language-based reasoning, often treating visual input as replaceable context. To address this gap, we introduce BLINK-Twice, a vision-centric reasoning benchmark grounded in challenging perceptual tasks. Instead of relying on external knowledge, our tasks require models to reason from visual content alone, shifting the focus from language-based to image-grounded reasoning. Compared to prior perception benchmarks, it moves beyond shallow perception ("see") and requires fine-grained observation and analytical reasoning ("observe"). BLINK-Twice integrates three core components: seven types of visual challenges for testing visual reasoning, natural adversarial image pairs that enforce reliance on visual content, and annotated reasoning chains for fine-grained evaluation of the reasoning process rather than final answers alone. We evaluate 20 leading MLLMs, including 12 foundation models and 8 reasoning-enhanced models. BLINK-Twice poses a significant challenge to current models. While existing reasoning strategies in the language space, such as chain-of-thought or self-criticism, can improve performance, they often result in unstable and redundant reasoning. We observe that repeated image observation improves performance across models, and active visual interaction, as demonstrated by models like o3, highlights the need for a new paradigm for vision reasoning. The dataset is publicly available at this https URL

Comments:	Accepted to 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Track on Datasets and Benchmarks
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2510.09361 [cs.CV]
	(or arXiv:2510.09361v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2510.09361

Submission history

From: Weijia Li
[v1] Fri, 10 Oct 2025 13:14:13 UTC (6,550 KB)
