-
Centaur: a foundation model of human cognition
Authors:
Marcel Binz,
Elif Akata,
Matthias Bethge,
Franziska Brändle,
Fred Callaway,
Julian Coda-Forno,
Peter Dayan,
Can Demircan,
Maria K. Eckstein,
Noémi Éltető,
Thomas L. Griffiths,
Susanne Haridi,
Akshay K. Jagadish,
Li Ji-An,
Alexander Kipnis,
Sreejan Kumar,
Tobias Ludwig,
Marvin Mathony,
Marcelo Mattar,
Alireza Modirshanechi,
Surabhi S. Nath,
Joshua C. Peterson,
Milena Rmus,
Evan M. Russek,
Tankred Saanum
, et al. (15 additional authors not shown)
Abstract:
Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. A first step in this direction is to create a model that can predict human behavior in a wide range of settings. Here we introduce Centa…
▽ More
Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. A first step in this direction is to create a model that can predict human behavior in a wide range of settings. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, our results demonstrate that it is possible to discover computational models that capture human behavior across a wide range of domains. We believe that such models provide tremendous potential for guiding the development of cognitive theories and present a case study to demonstrate this.
△ Less
Submitted 28 April, 2025; v1 submitted 26 October, 2024;
originally announced October 2024.
-
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Authors:
Can Demircan,
Tankred Saanum,
Akshay K. Jagadish,
Marcel Binz,
Eric Schulz
Abstract:
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs). However, as LLMs' in-context learning abilities continue to improve, understanding this phenomenon mechanistically becomes increasingly important. In particular, it is not well-understood how LLMs learn to solve specific classes of problems, such as reinfo…
▽ More
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs). However, as LLMs' in-context learning abilities continue to improve, understanding this phenomenon mechanistically becomes increasingly important. In particular, it is not well-understood how LLMs learn to solve specific classes of problems, such as reinforcement learning (RL) problems, in-context. Through three different tasks, we first show that Llama $3$ $70$B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors. Notably, these representations emerge despite the model only being trained to predict the next token. We verify that these representations are indeed causally involved in the computation of TD errors and $Q$-values by performing carefully designed interventions on them. Taken together, our work establishes a methodology for studying and manipulating in-context learning with SAEs, paving the way for a more mechanistic understanding.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Evaluating alignment between humans and neural network representations in image-based learning tasks
Authors:
Can Demircan,
Tankred Saanum,
Leonardo Pettini,
Marcel Binz,
Blazej M Baczkowski,
Christian F Doeller,
Mona M Garvert,
Eric Schulz
Abstract:
Humans represent scenes and objects in rich feature spaces, carrying information that allows us to generalise about category memberships and abstract functions with few examples. What determines whether a neural network model generalises like a human? We tested how well the representations of $86$ pretrained neural network models mapped to human learning trajectories across two tasks where humans…
▽ More
Humans represent scenes and objects in rich feature spaces, carrying information that allows us to generalise about category memberships and abstract functions with few examples. What determines whether a neural network model generalises like a human? We tested how well the representations of $86$ pretrained neural network models mapped to human learning trajectories across two tasks where humans had to learn continuous relationships and categories of natural images. In these tasks, both human participants and neural networks successfully identified the relevant stimulus features within a few trials, demonstrating effective generalisation. We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation. Intrinsic dimensionality of representations had different effects on alignment for different model types. Lastly, we tested three sets of human-aligned representations and found no consistent improvements in predictive accuracy compared to the baselines. In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks. Both our paradigms and modelling approach offer a novel way to quantify alignment between neural networks and humans and extend cognitive science into more naturalistic domains.
△ Less
Submitted 16 January, 2025; v1 submitted 15 June, 2023;
originally announced June 2023.