Showing 1–1 of 1 results for author: Ghods, K

  1. arXiv:2411.00238  [pdf, other]

    cs.AI cs.CV cs.LG q-bio.NC

    Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem

    Authors: Declan Campbell, Sunayana Rane, Tyler Giallanza, Nicolò De Sabbata, Kia Ghods, Amogh Joshi, Alexander Ku, Steven M. Frankland, Thomas L. Griffiths, Jonathan D. Cohen, Taylor W. Webb

    Abstract: Recent work has documented striking heterogeneity in the performance of state-of-the-art vision language models (VLMs), including both multimodal language models and text-to-image models. These models are able to describe and generate a diverse array of complex, naturalistic images, yet they exhibit surprising failures on basic multi-object reasoning tasks -- such as counting, localization, and si…

    Submitted 16 April, 2025; v1 submitted 31 October, 2024; originally announced November 2024.