
Showing 1–1 of 1 results for author: Vahedi, M M

  1. arXiv:2506.22146 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs

    Authors: Amirmohammad Izadi, Mohammad Ali Banayeeanzade, Fatemeh Askari, Ali Rahimiakbar, Mohammad Mahdi Vahedi, Hosein Hasani, Mahdieh Soleymani Baghshah

    Abstract: Despite progress in Vision-Language Models (VLMs), their capacity for visual reasoning is often limited by the binding problem: the failure to reliably associate perceptual features with their correct visual referents. This limitation underlies persistent errors in tasks such as counting, visual search, scene description, and spatial relationship understanding. A key factor is that curren…

    Submitted 2 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.