-
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Authors:
Jacob Russin,
Sam Whitman McGrath,
Danielle J. Williams,
Lotem Elber-Dorozko
Abstract:
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to di…
▽ More
Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Multiple Realizability and the Rise of Deep Learning
Authors:
Sam Whitman McGrath,
Jacob Russin
Abstract:
The multiple realizability thesis holds that psychological states may be implemented in a diversity of physical systems. The deep learning revolution seems to be bringing this possibility to life, offering the most plausible examples of man-made realizations of sophisticated cognitive functions to date. This paper explores the implications of deep learning models for the multiple realizability the…
▽ More
The multiple realizability thesis holds that psychological states may be implemented in a diversity of physical systems. The deep learning revolution seems to be bringing this possibility to life, offering the most plausible examples of man-made realizations of sophisticated cognitive functions to date. This paper explores the implications of deep learning models for the multiple realizability thesis. Among other things, it challenges the widely held view that multiple realizability entails that the study of the mind can and must be pursued independently of the study of its implementation in the brain or in artificial analogues. Although its central contribution is philosophical, the paper has substantial methodological upshots for contemporary cognitive science, suggesting that deep neural networks may play a crucial role in formulating and evaluating hypotheses about cognition, even if they are interpreted as implementation-level models. In the age of deep learning, multiple realizability possesses a renewed significance.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
The dynamic interplay between in-context and in-weight learning in humans and neural networks
Authors:
Jacob Russin,
Ellie Pavlick,
Michael J. Frank
Abstract:
Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evi…
▽ More
Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems -- one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that metalearning neural networks and large language models are capable of "in-context learning" (ICL) -- the ability to flexibly grasp the structure of a new task from a few examples. Here, we show that the dynamic interplay between ICL and default in-weight learning (IWL) naturally captures a broad range of learning phenomena observed in humans, reproducing curriculum effects on category-learning and compositional tasks, and recapitulating a tradeoff between flexibility and retention. Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties that can coexist with their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.
△ Less
Submitted 25 April, 2025; v1 submitted 13 February, 2024;
originally announced February 2024.
-
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Authors:
Taylor W. Webb,
Steven M. Frankland,
Awni Altabaa,
Simon Segert,
Kamesh Krishnamurthy,
Declan Campbell,
Jacob Russin,
Tyler Giallanza,
Zack Dulberg,
Randall O'Reilly,
John Lafferty,
Jonathan D. Cohen
Abstract:
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck…
▽ More
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. In that approach, neural networks are constrained via their architecture to focus on relations between perceptual inputs, rather than the attributes of individual inputs. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
△ Less
Submitted 1 May, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
A Neural Network Model of Continual Learning with Cognitive Control
Authors:
Jacob Russin,
Maryam Zolfaghar,
Seongmin A. Park,
Erie Boorman,
Randall C. O'Reilly
Abstract:
Neural networks struggle in continual learning settings from catastrophic forgetting: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks e…
▽ More
Neural networks struggle in continual learning settings from catastrophic forgetting: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks equipped with a mechanism for cognitive control do not exhibit catastrophic forgetting when trials are blocked. We further show an advantage of blocking over interleaving when there is a bias for active maintenance in the control signal, implying a tradeoff between maintenance and the strength of control. Analyses of map-like representations learned by the networks provided additional insights into these mechanisms. Our work highlights the potential of cognitive control to aid continual learning in neural networks, and offers an explanation for the advantage of blocking that has been observed in humans.
△ Less
Submitted 3 November, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Compositional Processing Emerges in Neural Networks Solving Math Problems
Authors:
Jacob Russin,
Roland Fernandez,
Hamid Palangi,
Eric Rosen,
Nebojsa Jojic,
Paul Smolensky,
Jianfeng Gao
Abstract:
A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has…
▽ More
A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
△ Less
Submitted 19 May, 2021;
originally announced May 2021.
-
Complementary Structure-Learning Neural Networks for Relational Reasoning
Authors:
Jacob Russin,
Maryam Zolfaghar,
Seongmin A. Park,
Erie Boorman,
Randall C. O'Reilly
Abstract:
The neural mechanisms supporting flexible relational inferences, especially in novel situations, are a major focus of current research. In the complementary learning systems framework, pattern separation in the hippocampus allows rapid learning in novel environments, while slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments.…
▽ More
The neural mechanisms supporting flexible relational inferences, especially in novel situations, are a major focus of current research. In the complementary learning systems framework, pattern separation in the hippocampus allows rapid learning in novel environments, while slower learning in neocortex accumulates small weight changes to extract systematic structure from well-learned environments. In this work, we adapt this framework to a task from a recent fMRI experiment where novel transitive inferences must be made according to implicit relational structure. We show that computational models capturing the basic cognitive properties of these two systems can explain relational transitive inferences in both familiar and novel environments, and reproduce key phenomena observed in the fMRI experiment.
△ Less
Submitted 19 May, 2021;
originally announced May 2021.
-
Compositional generalization in a deep seq2seq model by separating syntax and semantics
Authors:
Jake Russin,
Jason Jo,
Randall C. O'Reilly,
Yoshua Bengio
Abstract:
Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntacti…
▽ More
Standard methods in deep learning for natural language processing fail to capture the compositional structure of human language that allows for systematic generalization outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. Inspired by work in neuroscience suggesting separate brain systems for syntactic and semantic processing, we implement a modification to standard approaches in neural machine translation, imposing an analogous separation. The novel model, which we call Syntactic Attention, substantially outperforms standard methods in deep learning on the SCAN dataset, a compositional generalization task, without any hand-engineered features or additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure.
△ Less
Submitted 23 May, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.