CircuitProbe: Dissecting Spatiotemporal Visual Semantics with Circuit Tracing

Zhang, Yiming; Yu, Chengzhang; Zhao, Zhuokai; Wang, Kun; Li, Qiankun; Chen, Zihan; Liu, Yang; Ding, Zenghui; Sun, Yining

arXiv:2507.19420 (cs)

[Submitted on 25 Jul 2025]

Abstract: The processing mechanisms underlying language and image understanding in large vision-language models (LVLMs) have been extensively studied. However, the internal reasoning mechanisms of LVLMs for spatiotemporal understanding remain poorly understood. In this work, we introduce a systematic, circuit-based framework designed to investigate how spatiotemporal visual semantics are represented and processed within LVLMs. Specifically, our framework comprises three circuits: a visual auditing circuit, a semantic tracing circuit, and an attention flow circuit. Through the lens of these circuits, we discover that visual semantics are highly localized to specific object tokens: removing these tokens can degrade model performance by up to 92.6%. Furthermore, we find that interpretable concepts of objects and actions emerge and become progressively refined in the middle-to-late layers of LVLMs. In contrast to existing work that focuses solely on objects in a single image, we reveal that the middle-to-late layers of LVLMs exhibit specialized functional localization for spatiotemporal semantics. Our findings offer significant mechanistic insights into the analysis of spatiotemporal semantics in LVLMs, laying a foundation for designing more robust and interpretable models.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2507.19420 [cs.CV]
	(or arXiv:2507.19420v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.19420

Submission history

From: Yiming Zhang
[v1] Fri, 25 Jul 2025 16:38:18 UTC (4,620 KB)
