Search | arXiv e-print repository

Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem

Authors: Declan Campbell, Sunayana Rane, Tyler Giallanza, Nicolò De Sabbata, Kia Ghods, Amogh Joshi, Alexander Ku, Steven M. Frankland, Thomas L. Griffiths, Jonathan D. Cohen, Taylor W. Webb

Abstract: Recent work has documented striking heterogeneity in the performance of state-of-the-art vision language models (VLMs), including both multimodal language models and text-to-image models. These models are able to describe and generate a diverse array of complex, naturalistic images, yet they exhibit surprising failures on basic multi-object reasoning tasks -- such as counting, localization, and simple forms of visual analogy -- that humans perform with near-perfect accuracy. To better understand this puzzling pattern of successes and failures, we turn to theoretical accounts of the binding problem in cognitive science and neuroscience, a fundamental problem that arises when a shared set of representational resources must be used to represent distinct entities (e.g., to represent multiple objects in an image), necessitating the use of serial processing to avoid interference. We find that many of the puzzling failures of state-of-the-art VLMs can be explained as arising due to the binding problem, and that these failure modes are strikingly similar to the limitations exhibited by rapid, feedforward processing in the human brain.

Submitted 16 April, 2025; v1 submitted 31 October, 2024; originally announced November 2024.
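The binding problem described in this abstract arises when a shared set of representational resources loses track of which features belong to which object. A toy encoding makes this concrete. The sketch below uses a hypothetical "bag of features" code in which a scene is represented by pooling the colour and shape features of all its objects. A `Counter` stands in for a sum of one-hot vectors. This is a generic illustration of the binding problem, not the authors' analysis of VLMs.

```python
from collections import Counter

def encode_parallel(scene):
    """Pool every object's features into one shared code.

    The Counter plays the role of a sum of one-hot colour and shape vectors:
    it records how many times each feature occurs, but not which colour
    belonged to which shape.
    """
    pooled = Counter()
    for color, shape in scene:
        pooled[color] += 1
        pooled[shape] += 1
    return pooled

def encode_serial(scene):
    """Encode one object at a time, so each colour stays bound to its shape.

    Sorting only makes the result independent of the order objects are listed in.
    """
    return sorted(scene)

# Two scenes with the same features but different bindings (hypothetical stimuli).
scene_a = [("red", "square"), ("blue", "circle")]
scene_b = [("red", "circle"), ("blue", "square")]

print(encode_parallel(scene_a) == encode_parallel(scene_b))  # True: bindings lost
print(encode_serial(scene_a) == encode_serial(scene_b))      # False: bindings kept
```

The pooled code cannot distinguish the two scenes even though they contain different objects. The object-by-object code can. In miniature, this is the interference that the abstract attributes to shared representational resources, and the reason it links avoiding such interference to serial processing.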

arXiv:2405.19420 [pdf, other]

Learning Human-Aligned Representations with Contrastive Learning and Generative Similarity

Authors: Raja Marjieh, Sreejan Kumar, Declan Campbell, Liyi Zhang, Gianluca Bencomo, Jake Snell, Thomas L. Griffiths

Abstract: Humans rely on effective representations to learn from few examples and abstract useful information from sensory data. Inducing such representations in machine learning models has been shown to improve their performance on various benchmarks such as few-shot learning and robustness. However, finding effective training procedures to achieve that goal can be challenging as psychologically rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by leveraging a Bayesian notion of generative similarity whereby two data points are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We incorporate generative similarity into a contrastive learning objective to enable learning of embeddings that express human cognitive representations. We demonstrate the utility of our approach by showing that it can be used to capture human-like representations of shape regularity, abstract Euclidean geometric concepts, and semantic hierarchies for natural images.

Submitted 31 January, 2025; v1 submitted 29 May, 2024; originally announced May 2024.
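This abstract defines generative similarity: two data points are similar if they are likely to have been sampled from the same distribution. That definition can be made concrete for a simple generative process. The sketch below assumes a hypothetical mixture of equally likely one-dimensional Gaussians with known means, a shared standard deviation, and a prior of 0.5 that a pair shares a component. It computes the posterior probability that two points came from the same component. This is one possible instantiation of the idea, not the paper's exact formulation or its contrastive training objective.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def generative_similarity(x, y, means, sd=1.0, p_same=0.5):
    """Posterior probability that x and y were sampled from the same component.

    Assumed generative process (hypothetical): K >= 2 equally likely Gaussian
    components with the given means and a shared standard deviation. Under
    "same", one component generates both points; under "different", an ordered
    pair of distinct components is chosen uniformly at random.
    """
    k = len(means)
    if k < 2:
        raise ValueError("need at least two components")
    px = [gaussian_pdf(x, m, sd) for m in means]
    py = [gaussian_pdf(y, m, sd) for m in means]
    matched = sum(a * b for a, b in zip(px, py))
    # p(x, y | same) = (1/K) * sum_k p(x|k) p(y|k)
    like_same = matched / k
    # p(x, y | different) = sum_{j != k} p(x|j) p(y|k) / (K (K - 1))
    like_diff = (sum(px) * sum(py) - matched) / (k * (k - 1))
    numerator = p_same * like_same
    return numerator / (numerator + (1 - p_same) * like_diff)

means = [-3.0, 3.0]
print(generative_similarity(2.8, 3.2, means))   # close to 1
print(generative_similarity(-3.0, 3.0, means))  # close to 0
```

In a contrastive objective, scores like this could plausibly serve as targets that indicate which pairs to pull together and which to push apart. The paper's actual loss and its use of probabilistic programs are not reproduced here.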

arXiv:2402.04203 [pdf, other]

Human-Like Geometric Abstraction in Large Pre-trained Neural Networks

Authors: Declan Campbell, Sreejan Kumar, Tyler Giallanza, Thomas L. Griffiths, Jonathan D. Cohen

Abstract: Humans possess a remarkable capacity to recognize and manipulate abstract structure, which is especially apparent in the domain of geometry. Recent research in cognitive science suggests neural networks do not share this capacity, concluding that human geometric abilities come from discrete symbolic structure in human mental representations. However, progress in artificial intelligence (AI) suggests that neural networks begin to demonstrate more human-like reasoning after scaling up standard architectures in both model size and amount of training data. In this study, we revisit empirical results in cognitive science on geometric visual processing and identify three key biases in geometric visual processing: a sensitivity towards complexity, regularity, and the perception of parts and relations. We test tasks from the literature that probe these biases in humans and find that large pre-trained neural network models used in AI demonstrate more human-like abstract geometric processing.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03618 [pdf, other]

Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction

Authors: Sreejan Kumar, Raja Marjieh, Byron Zhang, Declan Campbell, Michael Y. Hu, Umang Bhatt, Brenden Lake, Thomas L. Griffiths

Abstract: Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect of language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and found that adding language as a modality has a larger effect on human reproductions than on GPT-4's. This suggests human visual and linguistic representations are more dissociable than those of GPT-4.

Submitted 5 February, 2024; originally announced February 2024.
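In the serial reproduction paradigm described above, each participant reproduces what the previous one produced. The sketch below simulates such a chain with hypothetical agents. A "visual" step reproduces a continuous value (for example, a line length) with Gaussian noise and a pull toward a prior mean. A "linguistic" step can pass on only one of a few category words. Alternating the two steps gives a toy multimodal chain in the spirit of the paper's design. The prior, noise level, and vocabulary are all assumptions, and no human or GPT-4 behaviour is modelled.

```python
import random

PRIOR_MEAN = 5.0    # assumed prior expectation of the stimulus value
PRIOR_WEIGHT = 0.3  # assumed strength of the pull toward the prior
NOISE_SD = 0.5      # assumed reproduction noise
# Hypothetical vocabulary: each word maps to the prototypical value it evokes.
WORDS = {"short": 2.0, "medium": 5.0, "long": 8.0}

def visual_reproduction(value, rng):
    """Reproduce a continuous stimulus, shrunk toward the prior and perturbed by noise."""
    shrunk = (1 - PRIOR_WEIGHT) * value + PRIOR_WEIGHT * PRIOR_MEAN
    return shrunk + rng.gauss(0.0, NOISE_SD)

def describe(value):
    """Linguistic reproduction: choose the word whose prototype is closest."""
    return min(WORDS, key=lambda word: abs(WORDS[word] - value))

def run_chain(seed_value, generations, multimodal, seed=0):
    """Return the sequence of continuous values produced along the chain."""
    rng = random.Random(seed)
    values = [seed_value]
    value = seed_value
    for _ in range(generations):
        if multimodal:
            # Visual -> linguistic -> visual: only the chosen word is passed on.
            value = visual_reproduction(WORDS[describe(value)], rng)
        else:
            # Unimodal: the next participant sees the previous reproduction directly.
            value = visual_reproduction(value, rng)
        values.append(value)
    return values

print(run_chain(9.0, 5, multimodal=False))
print(run_chain(9.0, 5, multimodal=True))
```

In this toy setup, the category words act as a bottleneck that snaps each reproduction back toward a word's prototype. Comparing where unimodal and multimodal chains settle after many generations is the kind of contrast the abstract describes.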

arXiv:2309.17363 [pdf, other]

Relational Constraints On Neural Networks Reproduce Human Biases towards Abstract Geometric Regularity

Authors: Declan Campbell, Sreejan Kumar, Tyler Giallanza, Jonathan D. Cohen, Thomas L. Griffiths

Abstract: Uniquely among primates, humans possess a remarkable capacity to recognize and manipulate abstract structure in the service of task goals across a broad range of behaviors. One illustration of this is in the visual perception of geometric forms. Studies have shown a uniquely human bias toward geometric regularity, with task performance enhanced for more regular and symmetric forms compared to their geometrically irregular counterparts. Such studies conclude that this behavior implies the existence of discrete symbolic structure in human mental representations, and that replicating such behavior in neural network architectures will require mechanisms for symbolic processing. In this study, we argue that human biases towards geometric regularity can be reproduced in neural networks, without explicitly providing them with symbolic machinery, by augmenting them with an architectural constraint that enables the system to discover and manipulate relational structure. When trained with the appropriate curriculum, this model exhibits human-like biases towards symmetry and regularity in two distinct tasks involving abstract geometric reasoning. Our findings indicate that neural networks, when equipped with the necessary training objectives and architectural elements, can exhibit human-like regularity biases and generalization. This approach provides insights into the neural mechanisms underlying geometric reasoning and offers an alternative to prevailing symbolic "Language of Thought" models in this domain.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2003.13221 [pdf, other]

doi 10.3389/frai.2021.550603

Planning as Inference in Epidemiological Models

Authors: Frank Wood, Andrew Warrington, Saeid Naderiparizi, Christian Weilbach, Vaden Masrani, William Harvey, Adam Scibior, Boyan Beronov, John Grefenstette, Duncan Campbell, Ali Nasseri

Abstract: In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kinds of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among other things, we illustrate the use of a probabilistic programming language that automates inference in existing simulators. Neither the full capabilities of this tool for automating inference nor its utility for planning is widely disseminated at the current time. Timely gains in understanding about how such simulation-based models and inference automation tools applied in support of policymaking could lead to less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.

Submitted 15 September, 2021; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: Revisions

Journal ref: Front Artif Intell. 2021; 4: 550603
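This abstract frames policy-making as computing a posterior over controllable simulator parameters, conditioned on acceptable outcomes. The sketch below illustrates that idea with plain rejection sampling over a toy discrete-time SIR model. It is written in ordinary Python, not the probabilistic programming language the authors use. The model, the uniform prior over a hypothetical "contact reduction" parameter, the transmission and recovery rates, and the 10% peak-infection threshold are all assumptions for illustration.

```python
import random

def simulate_sir(contact_reduction, beta=0.3, gamma=0.1, days=300, i0=0.001):
    """Toy deterministic discrete-time SIR model; returns the peak infected fraction.

    contact_reduction in [0, 1] scales down the transmission rate beta
    (a hypothetical policy lever, not a parameter from the paper).
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    effective_beta = beta * (1.0 - contact_reduction)
    for _ in range(days):
        new_infections = effective_beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

def posterior_over_policy(n_samples=5000, max_peak=0.10, seed=0):
    """Rejection sampling: keep prior draws whose simulated outcome is acceptable."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        u = rng.uniform(0.0, 1.0)        # draw from a uniform prior over the policy lever
        if simulate_sir(u) <= max_peak:  # condition on an acceptable outcome
            accepted.append(u)
    return accepted

accepted = posterior_over_policy()
print(len(accepted), min(accepted))
```

The accepted draws approximate the posterior over the contact-reduction parameter, given the constraint. Here they concentrate on values that keep the effective reproduction number, `beta * (1 - u) / gamma`, low. Rejection sampling is the simplest possible choice and becomes wasteful when acceptable outcomes are rare. For such cases, automated inference in a probabilistic programming system is the more practical route.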

arXiv:q-bio/0507015 [pdf, ps, other]

Differential gene expression in Bacillus subtilis

Authors: Dagmar Iber, Joanna Clarkson, Michael D Yudkin, Iain D Campbell

Abstract: Sporulation in Bacillus subtilis serves as a paradigm for the development of two different cell types (mother cell and prespore) from a single cell. The mechanism by which the two different developmental programs are initiated has been much studied but is not well understood. With the help of existing and new experimental results, a mathematical model has been developed that reproduces all published in vitro experiments and makes new predictions about the properties of the system in vivo.

Submitted 10 July, 2005; originally announced July 2005.

arXiv:q-bio/0507010 [pdf, ps, other]

doi 10.1007/s11538-005-9049-5

Integrin activation - the importance of a positive feedback

Authors: Dagmar Iber, Iain D Campbell

Abstract: Integrins mediate cell adhesion and are essential receptors for the development and functioning of multicellular organisms. Integrin activation is known to require both ligand and talin binding and to correlate with cluster formation but the activation mechanism and precise roles of these processes are not yet resolved. Here mathematical modeling, with known experimental parameters, is used to show that the binding of a stabilizing factor, such as talin, is alone insufficient to enable ligand-dependent integrin activation for all observed conditions; an additional positive feedback is required.

Submitted 31 March, 2006; v1 submitted 7 July, 2005; originally announced July 2005.

Comments: in press in Bulletin of Mathematical Biology

Showing 1–8 of 8 results for author: Campbell, D