-
$μ$PC: Scaling Predictive Coding to 100+ Layer Networks
Authors:
Francesco Innocenti,
El Mehdi Achour,
Christopher L. Buckley
Abstract:
The biological implausibility of backpropagation (BP) has motivated many alternative, brain-inspired algorithms that attempt to rely only on local information, such as predictive coding (PC) and equilibrium propagation. However, these algorithms have notoriously struggled to train very deep networks, preventing them from competing with BP in large-scale settings. Indeed, scaling PC networks (PCNs) has recently been posed as a challenge for the community (Pinchetti et al., 2024). Here, we show that 100+ layer PCNs can be trained reliably using a Depth-$μ$P parameterisation (Yang et al., 2023; Bordelon et al., 2023) which we call "$μ$PC". Through an extensive analysis of the scaling behaviour of PCNs, we reveal several pathologies that make standard PCNs difficult to train at large depths. We then show that, despite addressing only some of these instabilities, $μ$PC allows stable training of very deep (up to 128-layer) residual networks on simple classification tasks with competitive performance and little tuning compared to current benchmarks. Moreover, $μ$PC enables zero-shot transfer of both weight and activity learning rates across widths and depths. Our results have implications for other local algorithms and could be extended to convolutional and transformer architectures. Code for $μ$PC is made available as part of a JAX library for PCNs at https://github.com/thebuckleylab/jpc (Innocenti et al., 2024).
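Depth-$μ$P multiplies each residual branch by $1/\sqrt{L}$ for a network of depth $L$. As a rough illustration of why such depth-dependent scaling matters, the sketch below compares the forward-pass activation norm of a random linear residual network with and without that branch multiplier. It is an assumption-laden toy, not the $μ$PC parameterisation from the paper or the jpc library: the width, depth, $1/\text{width}$ initialisation variance, absence of nonlinearities and PC inference, and the helper names (`random_weights`, `residual_forward`) are all hypothetical.

```python
import math
import random


def random_weights(width, depth, seed=0):
    """Gaussian weights with variance 1/width (a hypothetical init choice)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(width)
    return [[[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
            for _ in range(depth)]


def matvec(w, h):
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]


def norm(h):
    return math.sqrt(sum(v * v for v in h))


def residual_forward(h, weights, branch_scale):
    """h_{l+1} = h_l + branch_scale * W_l h_l at every layer (linear, no activation)."""
    for w in weights:
        update = matvec(w, h)
        h = [h_i + branch_scale * u_i for h_i, u_i in zip(h, update)]
    return h


width, depth = 32, 128
weights = random_weights(width, depth)
h0 = [1.0] * width

unscaled = residual_forward(h0, weights, branch_scale=1.0)
scaled = residual_forward(h0, weights, branch_scale=1.0 / math.sqrt(depth))

# Without the 1/sqrt(L) factor the squared norm roughly doubles per layer;
# with it, the squared norm grows by about (1 + 1/L) per layer, i.e. O(1) overall.
print(f"unscaled norm ratio: {norm(unscaled) / norm(h0):.3e}")
print(f"scaled norm ratio:   {norm(scaled) / norm(h0):.3f}")
```

The unscaled network's activations explode exponentially with depth, while the scaled one stays of order one, which is the kind of depth-stable signal propagation a Depth-$μ$P-style parameterisation is designed to provide.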
Submitted 19 May, 2025;
originally announced May 2025.
-
JPC: Flexible Inference for Predictive Coding Networks in JAX
Authors:
Francesco Innocenti,
Paul Kinghorn,
Will Yun-Farmbrough,
Miguel De Llanza Varona,
Ryan Singh,
Christopher L. Buckley
Abstract:
We introduce JPC, a JAX library for training neural networks with Predictive Coding. JPC provides a simple, fast and flexible interface to train a variety of PC networks (PCNs) including discriminative, generative and hybrid models. Unlike existing libraries, JPC leverages ordinary differential equation solvers to integrate the gradient flow inference dynamics of PCNs. We find that a second-order solver achieves significantly faster runtimes compared to standard Euler integration, with comparable performance on a range of tasks and network depths. JPC also provides some theoretical tools that can be used to study PCNs. We hope that JPC will facilitate future research of PC. The code is available at https://github.com/thebuckleylab/jpc.
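To make the Euler versus second-order comparison concrete, here is a minimal sketch that integrates the gradient-flow inference dynamics $\dot z = -\partial E/\partial z$ of a toy PCN with a single scalar hidden unit and energy $E(z) = \tfrac12 (z - w_1 x)^2 + \tfrac12 (y - w_2 z)^2$, once with explicit Euler and once with Heun's method (a second-order Runge-Kutta scheme). It is plain Python rather than JPC, whose API is not reproduced here; the energy, parameter values, step size, and helper names are illustrative assumptions. Because this ODE is linear, its closed-form solution serves as the reference.

```python
import math

# Toy two-layer PCN with input x and target y clamped; z is the only free activity.
w1, w2, x, y = 1.0, 1.0, 1.0, 2.0


def grad_energy(z):
    """dE/dz for E(z) = 0.5*(z - w1*x)**2 + 0.5*(y - w2*z)**2."""
    return (z - w1 * x) - w2 * (y - w2 * z)


def euler(z, dt, steps):
    for _ in range(steps):
        z = z - dt * grad_energy(z)
    return z


def heun(z, dt, steps):
    for _ in range(steps):
        k1 = -grad_energy(z)
        k2 = -grad_energy(z + dt * k1)
        z = z + 0.5 * dt * (k1 + k2)
    return z


def exact(z0, t):
    """Closed form of dz/dt = -a (z - z_star) with a = 1 + w2**2."""
    a = 1.0 + w2 ** 2
    z_star = (w1 * x + w2 * y) / a
    return z_star + (z0 - z_star) * math.exp(-a * t)


z0, dt, steps = 0.0, 0.1, 10
target = exact(z0, dt * steps)
euler_err = abs(euler(z0, dt, steps) - target)
heun_err = abs(heun(z0, dt, steps) - target)
print(f"Euler error: {euler_err:.4f}, Heun error: {heun_err:.4f}")
```

A Heun step costs two gradient evaluations instead of one, so any runtime advantage has to come from reaching a given accuracy with larger or fewer steps.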
Submitted 4 December, 2024;
originally announced December 2024.
-
Variational Bayes Gaussian Splatting
Authors:
Toon Van de Maele,
Ozan Catal,
Alexander Tschantz,
Christopher L. Buckley,
Tim Verbelen
Abstract:
Recently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.
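The property that makes replay buffers unnecessary is conjugacy: with a conjugate prior, the posterior after each batch can serve as the prior for the next, and the final result matches a single update over all the data. The sketch below shows this for the simplest possible case, the mean of a 1D Gaussian with known observation variance, which is far simpler than the multivariate Gaussian mixture treated in VBGS. The data, prior values, and the helper `update_gaussian_mean` are made up for illustration.

```python
def update_gaussian_mean(prior_mean, prior_var, data, noise_var):
    """Closed-form Normal-Normal posterior over an unknown mean (noise variance known)."""
    precision = 1.0 / prior_var + len(data) / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


data = [2.1, 1.7, 2.4, 1.9, 2.6, 2.0]
noise_var = 0.5

# Batch: all observations at once.
batch_mean, batch_var = update_gaussian_mean(0.0, 10.0, data, noise_var)

# Streaming: chunks arrive one at a time; only the current posterior is kept.
mean, var = 0.0, 10.0
for chunk in (data[:2], data[2:5], data[5:]):
    mean, var = update_gaussian_mean(mean, var, chunk, noise_var)

print((batch_mean, batch_var), (mean, var))
```

Only the running posterior parameters are stored between chunks, yet the streamed and batch answers agree up to floating-point rounding.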
Submitted 4 October, 2024;
originally announced October 2024.
-
R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models
Authors:
Viet Dung Nguyen,
Zhizhuo Yang,
Christopher L. Buckley,
Alexander Ororbia
Abstract:
Although research has produced promising results demonstrating the utility of active inference (AIF) in Markov decision processes (MDPs), there is relatively less work that builds AIF models in the context of environments and problems that take the form of partially observable Markov decision processes (POMDPs). In POMDP scenarios, the agent must infer the unobserved environmental state from raw sensory observations, e.g., pixels in an image. Additionally, less work exists in examining the most difficult form of POMDP-centered control: continuous action space POMDPs under sparse reward signals. In this work, we address issues facing the AIF modeling paradigm by introducing novel prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous action, goal-based robotic control POMDP environments. Empirically, we show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate. The code in support of this work can be found at https://github.com/NACLab/robust-active-inference.
Submitted 21 September, 2024;
originally announced September 2024.
-
Exploring Action-Centric Representations Through the Lens of Rate-Distortion Theory
Authors:
Miguel de Llanza Varona,
Christopher L. Buckley,
Beren Millidge
Abstract:
Organisms have to keep track of the information in the environment that is relevant for adaptive behaviour. Transmitting information in an economical and efficient way becomes crucial for resource-limited agents living in high-dimensional environments. The efficient coding hypothesis claims that organisms seek to maximize the information about the sensory input in an efficient manner. Under Bayesian inference, this means that the role of the brain is to efficiently allocate resources in order to make predictions about the hidden states that cause sensory data. However, neither of those frameworks accounts for how that information is exploited downstream, leaving aside the action-oriented role of the perceptual system. Rate-distortion theory, which defines optimal lossy compression under constraints, has gained attention as a formal framework to explore goal-oriented efficient coding. In this work, we explore action-centric representations in the context of rate-distortion theory. We also provide a mathematical definition of abstractions and we argue that, as a summary of the relevant details, they can be used to fix the content of action-centric representations. We model action-centric representations using VAEs and we find that such representations i) are efficient lossy compressions of the data; ii) capture the task-dependent invariances necessary to achieve successful behaviour; and iii) are not in service of reconstructing the data. Thus, we conclude that full reconstruction of the data is rarely needed to achieve optimal behaviour, consistent with a teleological approach to perception.
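For readers new to rate-distortion theory, the classic Blahut-Arimoto algorithm computes points on the rate-distortion curve $R(D)$ of a discrete source by trading rate against expected distortion through a parameter $\beta$. The paper itself models action-centric representations with VAEs, not with this algorithm; the sketch below, with a hypothetical `blahut_arimoto` helper and an arbitrary `beta`, only illustrates the trade-off the abstract refers to. For a uniform binary source under Hamming distortion the answer is known in closed form, $R(D) = 1 - H_b(D)$ bits, which provides a check.

```python
import math


def blahut_arimoto(p_x, dist, beta, iters=200):
    """Return (rate in bits, expected distortion) for trade-off parameter beta."""
    n_x, n_xh = len(p_x), len(dist[0])
    q = [1.0 / n_xh] * n_xh  # marginal distribution over reconstructions
    for _ in range(iters):
        # Conditional Q(xh | x) proportional to q(xh) * exp(-beta * d(x, xh)).
        cond = []
        for i in range(n_x):
            row = [q[j] * math.exp(-beta * dist[i][j]) for j in range(n_xh)]
            z = sum(row)
            cond.append([r / z for r in row])
        # Re-estimate the marginal q(xh) = sum_x p(x) Q(xh | x).
        q = [sum(p_x[i] * cond[i][j] for i in range(n_x)) for j in range(n_xh)]
    rate = sum(p_x[i] * cond[i][j] * math.log2(cond[i][j] / q[j])
               for i in range(n_x) for j in range(n_xh) if cond[i][j] > 0)
    distortion = sum(p_x[i] * cond[i][j] * dist[i][j]
                     for i in range(n_x) for j in range(n_xh))
    return rate, distortion


def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


p_x = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
rate, d = blahut_arimoto(p_x, hamming, beta=2.0)
print(f"D = {d:.4f}, R = {rate:.4f} bits, closed form = {1 - binary_entropy(d):.4f}")
```

Larger `beta` penalises distortion more heavily and yields points with lower distortion and higher rate.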
Submitted 13 September, 2024;
originally announced September 2024.
-
Learning in Hybrid Active Inference Models
Authors:
Poppy Collis,
Ryan Singh,
Paul F Kinghorn,
Christopher L Buckley
Abstract:
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) 'cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.
Submitted 2 September, 2024;
originally announced September 2024.
-
Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?
Authors:
Francesco Innocenti,
El Mehdi Achour,
Ryan Singh,
Christopher L. Buckley
Abstract:
Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is not theoretically well understood. Here, we study the geometry of the PC energy landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated energy is simply a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss including the origin become much easier to escape (strict) in the equilibrated energy. Our theory is validated by experiments on both linear and non-linear networks. Based on these and other results, we conjecture that all the saddles of the equilibrated energy are strict. Overall, this work suggests that PC inference makes the loss landscape more benign and robust to vanishing gradients, while also highlighting the fundamental challenge of scaling PC to deeper models.
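In the width-one case with unit precisions, minimising the PC energy over the hidden activities of a linear chain gives $\mathcal{F}^* = \frac{r^2}{2s}$, with residual $r = y - w_L \cdots w_1 x$ and $s = 1 + \sum_{\ell=2}^{L} (w_L \cdots w_\ell)^2 \geq 1$, which is one concrete instance of a squared error rescaled by a weight-dependent factor. The sketch below checks this numerically for a single example by running gradient-descent inference on the activities. The scalar setting, the weights, and the helper names are our own assumptions, not the paper's code or notation.

```python
def prediction_energy(z, w):
    """E = 0.5 * sum_l (z_l - w_l z_{l-1})^2 with z_0 = x and z_L = y clamped."""
    return 0.5 * sum((z[l] - w[l - 1] * z[l - 1]) ** 2 for l in range(1, len(z)))


def equilibrate(x, y, w, lr=0.1, steps=2000):
    """Gradient descent on the hidden activities z_1..z_{L-1}; weights stay fixed."""
    num_layers = len(w)
    z = [x] + [0.0] * (num_layers - 1) + [y]
    for _ in range(steps):
        # Prediction errors eps_l = z_l - w_l z_{l-1} for l = 1..L (eps[0] is unused).
        eps = [0.0] + [z[l] - w[l - 1] * z[l - 1] for l in range(1, num_layers + 1)]
        for l in range(1, num_layers):
            # dE/dz_l = eps_l - w_{l+1} eps_{l+1}
            z[l] -= lr * (eps[l] - w[l] * eps[l + 1])
    return z


def equilibrated_energy_closed_form(x, y, w):
    """0.5 * r**2 / s with r = y - (w_L...w_1) x and s = 1 + sum_{l=2}^{L} (w_L...w_l)**2."""
    prod_all = 1.0
    for wl in w:
        prod_all *= wl
    r = y - prod_all * x
    s, tail = 1.0, 1.0
    for l in range(len(w) - 1, 0, -1):
        tail *= w[l]
        s += tail ** 2
    return 0.5 * r ** 2 / s


w = [0.8, 1.2, -0.5]  # a 3-layer scalar chain (two hidden activities)
x, y = 1.0, 0.3
numeric = prediction_energy(equilibrate(x, y, w), w)
closed = equilibrated_energy_closed_form(x, y, w)
mse = 0.5 * (y - w[0] * w[1] * w[2] * x) ** 2
print(f"numeric={numeric:.6f}  closed form={closed:.6f}  plain MSE={mse:.6f}")
```

Because $s \geq 1$, the equilibrated energy is never larger than the plain squared error, and the rescaling depends only on the weights downstream of the first layer.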
Submitted 8 November, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control
Authors:
Poppy Collis,
Ryan Singh,
Paul F Kinghorn,
Christopher L Buckley
Abstract:
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work has demonstrated that a class of hybrid state-space model known as recurrent switching linear dynamical systems (rSLDS) discover meaningful behavioural units via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). Furthermore, they model how the underlying continuous states drive these discrete mode switches. We propose that the rich representations formed by an rSLDS can provide useful abstractions for planning and control. We present a novel hierarchical model-based algorithm inspired by Active Inference in which a discrete MDP sits above a low-level linear-quadratic controller. The recurrent transition dynamics learned by the rSLDS allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) 'cache' the approximate solutions to low-level problems in the discrete planner. We successfully apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and non-trivial planning through the delineation of abstract sub-goals.
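As a rough sketch of what "continuous states drive discrete mode switches" means, the toy below picks the discrete mode at each step from a linear function of the previous continuous state, and each mode applies its own linear dynamics. This is a heavily simplified, hypothetical 1D example: a real rSLDS samples the mode from a softmax, includes noise, and is learned from data (Linderman et al., 2016), whereas here the mode is chosen by argmax and all parameters are invented.

```python
# Toy 1D recurrent switching linear dynamical system (deterministic, no noise).
# Mode k has dynamics x_t = a[k] * x_{t-1} + b[k].
# Mode selection uses argmax of R[k] * x_{t-1} + r[k] (a real rSLDS samples from a softmax).
a = [0.9, 0.9]
b = [0.5, -0.5]
R = [-1.0, 1.0]
r = [0.0, 0.0]


def select_mode(x):
    logits = [R[k] * x + r[k] for k in range(len(R))]
    return max(range(len(logits)), key=logits.__getitem__)


def simulate(x0, steps):
    xs, modes = [x0], []
    for _ in range(steps):
        k = select_mode(xs[-1])
        modes.append(k)
        xs.append(a[k] * xs[-1] + b[k])
    return xs, modes


xs, modes = simulate(x0=-3.0, steps=30)
print(modes)
```

Mode 0 pushes negative states upward and mode 1 pushes positive states downward, so the continuous trajectory itself determines when the discrete switch happens; these switching regions are the kind of discrete units a planner could treat as abstract sub-goals.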
Submitted 20 August, 2024;
originally announced August 2024.
-
Active Inference and Intentional Behaviour
Authors:
Karl J. Friston,
Tommaso Salvatori,
Takuya Isomura,
Alexander Tschantz,
Alex Kiefer,
Tim Verbelen,
Magnus Koudahl,
Aswin Paul,
Thomas Parr,
Adeel Razi,
Brett Kagan,
Christopher L. Buckley,
Maxwell J. D. Ramstead
Abstract:
Recent advances in theoretical biology suggest that basal cognition and sentient behaviour are emergent properties of in vitro cell cultures and neuronal networks, respectively. Such neuronal networks spontaneously learn structured behaviours in the absence of reward or reinforcement. In this paper, we characterise this kind of self-organisation through the lens of the free energy principle, i.e., as self-evidencing. We do this by first discussing the definitions of reactive and sentient behaviour in the setting of active inference, which describes the behaviour of agents that model the consequences of their actions. We then introduce a formal account of intentional behaviour, that describes agents as driven by a preferred endpoint or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behaviour using simulations. First, we simulate the aforementioned in vitro experiments, in which neuronal cultures spontaneously learn to play Pong, by implementing nested, free energy minimising processes. The simulations are then used to deconstruct the ensuing predictive behaviour, leading to the distinction between merely reactive, sentient, and intentional behaviour, with the latter formalised in terms of inductive planning. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem), that show how quickly and efficiently adaptive behaviour emerges under an inductive form of active inference.
Submitted 16 December, 2023; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Supervised structure learning
Authors:
Karl J. Friston,
Lancelot Da Costa,
Alexander Tschantz,
Alex Kiefer,
Tommaso Salvatori,
Victorita Neacsu,
Magnus Koudahl,
Conor Heins,
Noor Sajid,
Dimitrije Markovic,
Thomas Parr,
Tim Verbelen,
Christopher L Buckley
Abstract:
This paper concerns structure learning or discovery of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move in the ensuing schemes is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on a more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf., blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states, together with their characteristic paths or dynamics.
Submitted 16 November, 2023;
originally announced November 2023.
-
Understanding Tool Discovery and Tool Innovation Using Active Inference
Authors:
Poppy Collis,
Paul F Kinghorn,
Christopher L Buckley
Abstract:
The ability to invent new tools has been identified as an important facet of our ability as a species to solve problems in dynamic and novel environments. While the use of tools by artificial agents presents a challenging task and has been widely identified as a key goal in the field of autonomous robotics, far less research has tackled the invention of new tools by agents. In this paper, (1) we articulate the distinction between tool discovery and tool innovation by providing a minimal description of the two concepts under the formalism of active inference. We then (2) apply this description to construct a toy model of tool innovation by introducing the notion of tool affordances into the hidden states of the agent's probabilistic generative model. This particular state factorisation facilitates the ability to not just discover tools but invent them through the offline induction of an appropriate tool property. We discuss the implications of these preliminary results and outline future directions of research.
Submitted 7 November, 2023;
originally announced November 2023.
-
Relative representations for cognitive graphs
Authors:
Alex B. Kiefer,
Christopher L. Buckley
Abstract:
Although the latent spaces learned by distinct neural networks are not generally directly comparable, recent work in machine learning has shown that it is possible to use the similarities and differences among latent space vectors to derive "relative representations" with comparable representational power to their "absolute" counterparts, and which are nearly identical across models trained on similar data distributions. Apart from their intrinsic interest in revealing the underlying structure of learned latent spaces, relative representations are useful to compare representations across networks as a generic proxy for convergence, and for zero-shot model stitching.
In this work we examine an extension of relative representations to discrete state-space models, using Clone-Structured Cognitive Graphs (CSCGs) for 2D spatial localization and navigation as a test case. Our work shows that the probability vectors computed during message passing can be used to define relative representations on CSCGs, enabling effective communication across agents trained using different random initializations and training sequences, and on only partially similar spaces. We introduce a technique for zero-shot model stitching that can be applied post hoc, without the need for using relative representations during training. This exploratory work is intended as a proof-of-concept for the application of relative representations to the study of cognitive maps in neuroscience and AI.
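For context, the machine-learning notion of a relative representation describes each latent vector by its similarities to a fixed set of anchor embeddings, which makes it invariant to transformations, such as rotations, that preserve those similarities. The sketch below uses cosine similarity, one common choice, on hypothetical 2D latents from two "models" that differ by a rotation; it does not reproduce the CSCG version in the paper, which builds relative representations from message-passing probability vectors.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def relative_representation(x, anchors):
    """Represent x by its cosine similarity to each anchor embedding."""
    return [cosine(x, a) for a in anchors]


def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


# Hypothetical 2D latents for the same inputs from two "models" whose
# latent spaces differ by a rotation (an idealised assumption).
anchors_a = [[1.0, 0.0], [0.3, 0.9], [-0.7, 0.2]]
x_a = [0.5, 0.4]
theta = 1.1
anchors_b = [rotate(a, theta) for a in anchors_a]
x_b = rotate(x_a, theta)

rel_a = relative_representation(x_a, anchors_a)
rel_b = relative_representation(x_b, anchors_b)
print(rel_a)
print(rel_b)
```

The absolute latents `x_a` and `x_b` differ, but their relative representations coincide, which is what allows components trained separately to be compared or stitched together.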
Submitted 8 September, 2023;
originally announced September 2023.
-
A Survey on Brain-Inspired Deep Learning via Predictive Coding
Authors:
Tommaso Salvatori,
Ankur Mali,
Christopher L. Buckley,
Thomas Lukasiewicz,
Rajesh P. N. Rao,
Karl Friston,
Alexander Ororbia
Abstract:
Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large.
Submitted 23 January, 2025; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Understanding Predictive Coding as an Adaptive Trust-Region Method
Authors:
Francesco Innocenti,
Ryan Singh,
Christopher L. Buckley
Abstract:
Predictive coding (PC) is a brain-inspired local learning algorithm that has recently been suggested to provide advantages over backpropagation (BP) in biologically relevant scenarios. While theoretical work has mainly focused on showing how PC can approximate BP in various limits, the putative benefits of "natural" PC are less understood. Here we develop a theory of PC as an adaptive trust-region (TR) algorithm that uses second-order information. We show that the learning dynamics of PC can be interpreted as interpolating between BP's loss gradient direction and a TR direction found by the PC inference dynamics. Our theory suggests that PC should escape saddle points faster than BP, a prediction which we prove in a shallow linear model and support with experiments on deeper networks. This work lays a foundation for understanding PC in deep and wide networks.
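The idea of interpolating between a gradient direction and a trust-region direction can be illustrated with the textbook damped-Newton (Levenberg-Marquardt-style) step $d(\lambda) = -(H + \lambda I)^{-1} g$. For large $\lambda$ the step aligns with $-g$; for $\lambda$ just above the magnitude of the most negative Hessian eigenvalue it strongly amplifies movement along negative curvature, which is how such steps escape saddles. This is a generic construction, not the PC update derived in the paper; the 2x2 Hessian, gradient, and helper names are made up.

```python
import math


def solve_2x2(a, rhs):
    """Solve a 2x2 linear system a @ sol = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * rhs[0] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


def damped_step(hess, grad, lam):
    """d = -(H + lam I)^{-1} g, a Levenberg-Marquardt-style trust-region step."""
    shifted = [[hess[0][0] + lam, hess[0][1]], [hess[1][0], hess[1][1] + lam]]
    return [-v for v in solve_2x2(shifted, grad)]


def cosine(u, v):
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))


# Near a saddle: one direction of negative curvature with a tiny gradient along it.
hess = [[1.0, 0.0], [0.0, -0.5]]
grad = [1.0, 0.01]
neg_grad = [-g for g in grad]

for lam in (0.6, 5.0, 1000.0):
    d = damped_step(hess, grad, lam)
    print(f"lambda={lam:7.1f}  step={d}  cos(step, -grad)={cosine(d, neg_grad):.4f}")
```

At `lam=0.6` the component along the negative-curvature direction is far larger, relative to the other component, than in the raw gradient, whereas at `lam=1000.0` the step is essentially a scaled gradient step.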
Submitted 29 May, 2023;
originally announced May 2023.
-
Attention: Marginal Probability is All You Need?
Authors:
Ryan Singh,
Christopher L. Buckley
Abstract:
Attention mechanisms are a central property of cognitive systems, allowing them to selectively deploy cognitive resources in a flexible manner. Attention has long been studied in the neurosciences and there are numerous phenomenological models that try to capture its core properties. Recently, attentional mechanisms have become a dominant architectural choice in machine learning and are the central innovation of Transformers. The dominant intuition and formalism underlying their development has drawn on ideas of keys and queries in database management systems. In this work, we propose an alternative Bayesian foundation for attentional mechanisms and show how this unifies different attentional architectures in machine learning. This formulation allows us to identify commonalities across different attention ML architectures as well as suggest a bridge to those developed in neuroscience. We hope this work will guide more sophisticated intuitions into the key properties of attention architectures and suggest new ones.
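One way to read a Bayesian foundation for attention: treat the index of the attended key as a latent categorical variable $z$ with posterior $p(z = j \mid q) \propto \exp(q \cdot k_j / \sqrt{d})$ under a flat prior; the output of scaled dot-product attention is then the posterior expectation of the value vector $v_z$. The sketch below is our illustration of that reading, not code from the paper, and the query, key, and value vectors are arbitrary.

```python
import math


def attention_as_expectation(query, keys, values):
    """Scaled dot-product attention written as E_{p(z|q)}[v_z] over a latent key index z."""
    d = len(query)
    logits = [sum(qi * ki for qi, ki in zip(query, k)) / math.sqrt(d) for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(l - m) for l in logits]
    total = sum(unnorm)
    posterior = [u / total for u in unnorm]  # p(z = j | q)
    out = [sum(p * v[i] for p, v in zip(posterior, values)) for i in range(len(values[0]))]
    return posterior, out


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
posterior, out = attention_as_expectation(query, keys, values)
print(posterior, out)
```

The softmax weights are exactly a normalised posterior over which key "explains" the query, so variants of attention can be compared by the prior and likelihood they implicitly assume.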
Submitted 7 April, 2023;
originally announced April 2023.
-
Pretraining Language Models with Human Preferences
Authors:
Tomasz Korbak,
Kejian Shi,
Angelica Chen,
Rasika Bhalerao,
Christopher L. Buckley,
Jason Phang,
Samuel R. Bowman,
Ethan Perez
Abstract:
Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning a distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
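A minimal sketch of how data for conditional training might be prepared: each training segment is prefixed with a control token chosen by thresholding its reward-model score, and at generation time the model is conditioned on the "good" token. The token strings, the threshold, the convention that higher scores are more preferred, and the toy corpus are all assumptions for illustration, not details taken from the paper.

```python
GOOD, BAD = "<|good|>", "<|bad|>"  # hypothetical control-token strings
THRESHOLD = 0.5  # hypothetical reward-model score cut-off (higher = more preferred)


def tag_segment(text, score, threshold=THRESHOLD):
    """Prefix a training segment with a control token derived from its preference score."""
    return (GOOD if score >= threshold else BAD) + text


corpus = [("Here is a helpful answer.", 0.9), ("Some offensive remark.", 0.1)]
training_data = [tag_segment(t, s) for t, s in corpus]
print(training_data)

# At generation time, one would condition on the preferred token,
# e.g. prompt = GOOD + user_prompt, and sample the continuation.
```

Because undesirable segments are kept but labelled, the model still learns from all of the text while learning to associate the "good" prefix with preferred content.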
Submitted 14 June, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Designing Ecosystems of Intelligence from First Principles
Authors:
Karl J Friston,
Maxwell J D Ramstead,
Alex B Kiefer,
Alexander Tschantz,
Christopher L Buckley,
Mahault Albarracin,
Riddhi J Pitliya,
Conor Heins,
Brennan Klein,
Beren Millidge,
Dalton A R Sakthivadivel,
Toby St Clere Smithe,
Magnus Koudahl,
Safae Essafi Tremblay,
Capm Petersen,
Kaiser Fung,
Jason G Fox,
Steven Swanson,
Dan Mapes,
Gabriel René
Abstract:
This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants: what we call "shared intelligence". This vision is premised on active inference, a formulation of adaptive behavior that can be read as a physics of intelligence, and which inherits from the physics of self-organization. In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world, also known as self-evidencing. Formally, this corresponds to maximizing (Bayesian) model evidence, via belief updating over several scales: i.e., inference, learning, and model selection. Operationally, this self-evidencing can be realized via (variational) message passing or belief propagation on a factor graph. Crucially, active inference foregrounds an existential imperative of intelligent systems; namely, curiosity or the resolution of uncertainty. This same imperative underwrites belief sharing in ensembles of agents, in which certain aspects (i.e., factors) of each agent's generative world model provide a common ground or frame of reference. Active inference plays a foundational role in this ecology of belief sharing, leading to a formal account of collective intelligence that rests on shared narratives and goals. We also consider the kinds of communication protocols that must be developed to enable such an ecosystem of intelligences and motivate the development of a shared hyper-spatial modeling language and transaction protocol, as a first, and key, step towards such an ecology.
Submitted 11 January, 2024; v1 submitted 2 December, 2022;
originally announced December 2022.
-
Capsule Networks as Generative Models
Authors:
Alex B. Kiefer,
Beren Millidge,
Alexander Tschantz,
Christopher L. Buckley
Abstract:
Capsule networks are a neural network architecture specialized for visual scene recognition. Features and pose information are extracted from a scene and then dynamically routed through a hierarchy of vector-valued nodes called 'capsules' to create an implicit scene graph, with the ultimate aim of learning vision directly as inverse graphics. Despite these intuitions, however, capsule networks are not formulated as explicit probabilistic generative models; moreover, the routing algorithms typically used are ad hoc and primarily motivated by algorithmic intuition. In this paper, we derive an alternative capsule routing algorithm utilizing iterative inference under sparsity constraints. We then introduce an explicit probabilistic generative model for capsule networks based on the self-attention operation in transformer networks and show how it is related to a variant of predictive coding networks using von Mises-Fisher (VMF) circular Gaussian distributions.
Submitted 6 October, 2022; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Preventing Deterioration of Classification Accuracy in Predictive Coding Networks
Authors:
Paul F Kinghorn,
Beren Millidge,
Christopher L Buckley
Abstract:
Predictive Coding Networks (PCNs) aim to learn a generative model of the world. Given observations, this generative model can then be inverted to infer the causes of those observations. However, when training PCNs, a noticeable pathology is often observed where inference accuracy peaks and then declines with further training. This cannot be explained by overfitting since both training and test accuracy decrease simultaneously. Here we provide a thorough investigation of this phenomenon and show that it is caused by an imbalance between the speeds at which the various layers of the PCN converge. We demonstrate that this can be prevented by regularising the weight matrices at each layer: by restricting the relative size of matrix singular values, we allow the weight matrix to change but restrict the overall impact which a layer can have on its neighbours. We also demonstrate that a similar effect can be achieved through a more biologically plausible and simple scheme of just capping the weights.
Submitted 1 September, 2022; v1 submitted 15 August, 2022;
originally announced August 2022.
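The abstract describes two ways of limiting how strongly one layer can drive its neighbours: restricting singular values and simply capping weights. A minimal NumPy sketch of both projections follows, assuming an absolute singular-value cap `s_max` (the paper restricts the *relative* size of singular values, so the exact rule here is an assumption) and a hypothetical elementwise bound `w_max`; in training, such a projection would be applied after each weight update.

```python
import numpy as np

def cap_singular_values(W, s_max):
    """Project W so that none of its singular values exceeds s_max (assumed rule)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(S, s_max)) @ Vt

def cap_weights(W, w_max):
    """Simpler scheme: clip every weight to [-w_max, w_max]."""
    return np.clip(W, -w_max, w_max)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))                 # hypothetical layer weights
W_sv = cap_singular_values(W, s_max=1.0)
W_clip = cap_weights(W, w_max=0.5)
print(np.linalg.svd(W_sv, compute_uv=False).max())  # at most 1.0 (up to float error)
print(np.abs(W_clip).max())                          # at most 0.5
```

The singular-value projection leaves the singular vectors, and hence the directions a layer maps onto, unchanged and only shrinks its gain; elementwise clipping is cruder but purely local to each synapse.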
-
Knitting a Markov blanket is hard when you are out-of-equilibrium: two examples in canonical nonequilibrium models
Authors:
Miguel Aguilera,
Ángel Poc-López,
Conor Heins,
Christopher L. Buckley
Abstract:
Bayesian theories of biological and brain function speculate that Markov blankets (a conditional independence separating a system from external states) play a key role in facilitating inference-like behaviour in living systems. Although it has been suggested that Markov blankets are commonplace in sparsely connected, nonequilibrium complex systems, this has not been studied in detail. Here, we show in two different examples (a pair of coupled Lorenz systems and a nonequilibrium Ising model) that sparse connectivity does not guarantee Markov blankets in the steady-state density of nonequilibrium systems. Moreover, in the nonequilibrium Ising model explored, the system's distance from equilibrium appears to be correlated with its distance from displaying a Markov blanket. These results suggest that further assumptions might be needed before the presence of Markov blankets can be assumed in the kind of nonequilibrium processes describing the activity of living systems.
Submitted 26 July, 2022;
originally announced July 2022.
-
Successor Representation Active Inference
Authors:
Beren Millidge,
Christopher L Buckley
Abstract:
Recent work has uncovered close links between classical reinforcement learning algorithms, Bayesian filtering, and Active Inference, which lets us understand value functions in terms of Bayesian posteriors. An alternative, but less explored, model-free RL algorithm is the successor representation, which expresses the value function in terms of a successor matrix of expected future state occupancies. In this paper, we derive the probabilistic interpretation of the successor representation in terms of Bayesian filtering and thus design a novel active inference agent architecture utilizing successor representations instead of model-based planning. We demonstrate that active inference successor representations have significant advantages over current active inference agents in terms of planning horizon and computational cost. Moreover, we demonstrate how the successor representation agent can generalize to changing reward functions such as variants of the expected free energy.
Submitted 20 July, 2022;
originally announced July 2022.
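For readers unfamiliar with the successor representation, the classical tabular version can be sketched as below: the successor matrix M collects expected discounted future state occupancies, and the value function is M applied to the reward vector. This is background only and does not reproduce the paper's Bayesian-filtering derivation or its agent; the 4-state ring, the fixed policy and the rewards are hypothetical.

```python
import numpy as np

def successor_matrix(P, gamma):
    """Successor representation M = sum_t gamma^t P^t = (I - gamma P)^{-1}.

    M[s, s2] is the expected discounted number of future visits to s2 starting from s.
    """
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

# Hypothetical 4-state ring under a fixed policy: stay or step right, 50/50.
P = 0.5 * np.eye(4) + 0.5 * np.roll(np.eye(4), 1, axis=1)
gamma = 0.9
M = successor_matrix(P, gamma)

r1 = np.array([0.0, 0.0, 0.0, 1.0])
V1 = M @ r1   # value function: V = M r
r2 = np.array([1.0, 0.0, 0.0, 0.0])
V2 = M @ r2   # a changed reward reuses the same M, with no re-planning
```

Because M depends only on the dynamics and policy, swapping the reward costs a single matrix-vector product, which is the property the abstract exploits when generalizing to changing reward functions.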
-
RL with KL penalties is better viewed as Bayesian inference
Authors:
Tomasz Korbak,
Ethan Perez,
Christopher L Buckley
Abstract:
Reinforcement learning (RL) is frequently employed in fine-tuning large language models (LMs), such as GPT-3, to penalize them for undesirable features of generated sequences, such as offensiveness, social bias, harmfulness or falsehood. The RL formulation involves treating the LM as a policy and updating it to maximise the expected value of a reward function which captures human preferences, such as non-offensiveness. In this paper, we analyze challenges associated with treating a language model as an RL policy and show how avoiding those challenges requires moving beyond the RL paradigm. We start by observing that the standard RL approach is flawed as an objective for fine-tuning LMs because it leads to distribution collapse: turning the LM into a degenerate distribution. Then, we analyze KL-regularised RL, a widely used recipe for fine-tuning LMs, which additionally constrains the fine-tuned LM to stay close to its original distribution in terms of Kullback-Leibler (KL) divergence. We show that KL-regularised RL is equivalent to variational inference: approximating a Bayesian posterior which specifies how to update a prior LM to conform with evidence provided by the reward function. We argue that this Bayesian inference view of KL-regularised RL is more insightful than the typically employed RL perspective. The Bayesian inference view explains how KL-regularised RL avoids the distribution collapse problem and offers a first-principles derivation for its objective. While this objective happens to be equivalent to RL (with a particular choice of parametric reward), there exist other objectives for fine-tuning LMs which are no longer equivalent to RL. That observation leads to a more general point: RL is not an adequate formal framework for problems such as fine-tuning language models. These problems are best viewed as Bayesian inference: approximating a pre-defined target distribution.
Submitted 21 October, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
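The central equivalence in this abstract can be checked numerically on a toy problem: the maximiser of E_pi[r] - beta * KL(pi || pi0) is the Bayesian posterior pi*(x) proportional to pi0(x) exp(r(x)/beta), and the optimal objective equals beta * log Z. A minimal NumPy sketch follows, assuming the language model is replaced by a categorical distribution over five hypothetical sequences with random rewards; it illustrates the identity and is not the paper's fine-tuning setup.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive categorical distributions."""
    return np.sum(p * (np.log(p) - np.log(q)))

def objective(pi, pi0, r, beta):
    """KL-regularised RL objective: E_pi[r] - beta * KL(pi || pi0)."""
    return pi @ r - beta * kl(pi, pi0)

def posterior(pi0, r, beta):
    """Bayesian posterior pi*(x) proportional to pi0(x) * exp(r(x) / beta)."""
    logits = np.log(pi0) + r / beta
    w = np.exp(logits - logits.max())   # subtract max for numerical stability
    return w / w.sum()

rng = np.random.default_rng(0)
pi0 = rng.dirichlet(np.ones(5))   # stand-in "prior LM" over 5 hypothetical sequences
r = rng.normal(size=5)            # hypothetical reward per sequence
beta = 0.5
pi_star = posterior(pi0, r, beta)
log_Z = np.log(np.sum(pi0 * np.exp(r / beta)))
print(objective(pi_star, pi0, r, beta), beta * log_Z)  # the two values coincide
```

The identity behind this is objective(pi) = beta * log Z - beta * KL(pi || pi*), so any other pi scores lower. As beta shrinks, pi* concentrates its mass on the highest-reward sequences, which is the distribution collapse the abstract attributes to unregularised RL.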
-
Hybrid Predictive Coding: Inferring, Fast and Slow
Authors:
Alexander Tschantz,
Beren Millidge,
Anil K Seth,
Christopher L Buckley
Abstract:
Predictive coding is an influential model of cortical neural activity. It proposes that perceptual beliefs are furnished by sequentially minimising "prediction errors" - the differences between predicted and observed data. Implicit in this proposal is the idea that perception requires multiple cycles of neural activity. This is at odds with evidence that several aspects of visual perception - including complex forms of object recognition - arise from an initial "feedforward sweep" that occurs on fast timescales which preclude substantial recurrent activity. Here, we propose that the feedforward sweep can be understood as performing amortized inference and recurrent processing can be understood as performing iterative inference. We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner by describing both in terms of a dual optimization of a single objective function. We show that the resulting scheme can be implemented in a biologically plausible neural architecture that approximates Bayesian inference utilising local Hebbian update rules. We demonstrate that our hybrid predictive coding model combines the benefits of both amortized and iterative inference -- obtaining rapid and computationally cheap perceptual inference for familiar data while maintaining the context-sensitivity, precision, and sample efficiency of iterative inference schemes. Moreover, we show how our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense. Hybrid predictive coding offers a new perspective on the functional relevance of the feedforward and recurrent activity observed during visual perception and offers novel insights into distinct aspects of visual phenomenology.
Submitted 6 April, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
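The amortised-plus-iterative idea in this abstract can be illustrated with a minimal NumPy sketch, assuming a linear-Gaussian generative model y = W x + noise with a unit Gaussian prior on x. An amortised map `A` (standing in for the learned feedforward network) produces an initial belief, and gradient descent on the energy refines it only while the gradient norm exceeds a tolerance. The stopping rule, the hypothetical maps `A_good`/`A_poor` and the "familiar"/"unfamiliar" framing are illustrative assumptions, not the paper's model.

```python
import numpy as np

def energy_grad(x, y, W):
    """Gradient of F(x) = 0.5*||y - W x||^2 + 0.5*||x||^2 (unit-variance likelihood and prior)."""
    return -W.T @ (y - W @ x) + x

def hybrid_infer(y, W, A, lr=0.1, tol=1e-6, max_iter=1000):
    """Amortised guess x = A y, then iterative refinement only while the gradient is large."""
    x = A @ y                              # fast amortised "feedforward sweep"
    for n_iter in range(max_iter):
        g = energy_grad(x, y, W)
        if np.linalg.norm(g) < tol:        # belief already good enough: stop early
            return x, n_iter
        x = x - lr * g                     # slower iterative (recurrent) refinement
    return x, max_iter

rng = np.random.default_rng(1)
W = rng.normal(scale=0.3, size=(6, 4))               # hypothetical generative weights
y = rng.normal(size=6)                               # an observation
A_good = np.linalg.solve(W.T @ W + np.eye(4), W.T)   # exact posterior-mean map ("familiar" data)
A_poor = np.zeros((4, 6))                            # uninformative map ("unfamiliar" data)

x_good, n_good = hybrid_infer(y, W, A_good)
x_poor, n_poor = hybrid_infer(y, W, A_poor)
print(n_good, n_poor)  # the good amortised guess needs far fewer refinement steps
```

Both runs end at the same belief; what differs is how much iterative computation is spent getting there, which mirrors the trade-off between cheap amortised and precise iterative inference described above.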
-
Active Inference in Robotics and Artificial Agents: Survey and Challenges
Authors:
Pablo Lanillos,
Cristian Meo,
Corrado Pezzato,
Ajith Anil Meera,
Mohamed Baioumy,
Wataru Ohata,
Alexander Tschantz,
Beren Millidge,
Martijn Wisse,
Christopher L. Buckley,
Jun Tani
Abstract:
Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning. Recently, it has been shown to be a promising approach to the problems of state-estimation and control under uncertainty, as well as a foundation for the construction of goal-driven behaviours in robotics and artificial agents in general. Here, we review the state-of-the-art theory and implementations of active inference for state-estimation, control, planning and learning; describing current achievements with a particular focus on robotics. We showcase relevant experiments that illustrate its potential in terms of adaptation, generalization and robustness. Furthermore, we connect this approach with other frameworks and discuss its expected benefits and challenges: a unified framework with functional biological plausibility using variational Bayesian inference.
Submitted 3 December, 2021;
originally announced December 2021.
-
Habitual and Reflective Control in Hierarchical Predictive Coding
Authors:
Paul F. Kinghorn,
Beren Millidge,
Christopher L. Buckley
Abstract:
In cognitive science, behaviour is often separated into two types. Reflexive control is habitual and immediate, whereas reflective control is deliberative and time-consuming. We examine the argument that Hierarchical Predictive Coding (HPC) can explain both types of behaviour as a continuum operating across a multi-layered network, removing the need for separate circuits in the brain. On this view, "fast" actions may be triggered using only the lower layers of the HPC schema, whereas more deliberative actions need higher layers. We demonstrate that HPC can distribute learning throughout its hierarchy, with higher layers called into use only as required.
Submitted 2 September, 2021;
originally announced September 2021.
-
A Mathematical Walkthrough and Discussion of the Free Energy Principle
Authors:
Beren Millidge,
Anil Seth,
Christopher L Buckley
Abstract:
The free energy principle (FEP) is an influential and controversial theory which postulates a deep and powerful connection between the stochastic thermodynamics of self-organization and learning through variational inference. Specifically, it claims that any self-organizing system which can be statistically separated from its environment, and which maintains itself at a non-equilibrium steady state, can be construed as minimizing an information-theoretic functional -- the variational free energy -- and thus performing variational Bayesian inference to infer the hidden state of its environment. This principle has also been applied extensively in neuroscience, and is beginning to make inroads in machine learning by spurring the construction of novel and powerful algorithms by which action, perception, and learning can all be unified under a single objective. While its expansive and often grandiose claims have spurred significant debates in both philosophy and theoretical neuroscience, the mathematical depth and lack of accessible introductions and tutorials for the core claims of the theory have often precluded a deep understanding within the literature. Here, we aim to provide a mathematically detailed, yet intuitive walk-through of the formulation and central claims of the FEP while also providing a discussion of the assumptions necessary and potential limitations of the theory. Additionally, since the FEP is still a living theory, subject to internal controversy, change, and revision, we also present a detailed appendix highlighting and condensing current perspectives as well as controversies about the nature, applicability, and the mathematical assumptions and formalisms underlying the FEP.
Submitted 1 October, 2021; v1 submitted 30 August, 2021;
originally announced August 2021.
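The variational free energy mentioned in this abstract is F[q] = E_q[log q(x) - log p(x, y)] = KL(q(x) || p(x | y)) - log p(y), so it upper-bounds surprise, -log p(y), with equality exactly when q is the true posterior. A minimal NumPy sketch with a discrete hidden state follows; the prior and likelihood values are hypothetical.

```python
import numpy as np

def free_energy(q, prior, lik):
    """Variational free energy F[q] = E_q[log q(x) - log p(x, y)] for discrete x."""
    joint = prior * lik                  # p(x, y) evaluated at the observed y
    return np.sum(q * (np.log(q) - np.log(joint)))

prior = np.array([0.5, 0.3, 0.2])        # p(x) over three hypothetical hidden states
lik = np.array([0.1, 0.7, 0.4])          # p(y | x) for the observed y
evidence = np.sum(prior * lik)           # p(y)
posterior = prior * lik / evidence       # p(x | y)

print(free_energy(posterior, prior, lik), -np.log(evidence))  # equal: the bound is tight
```

Minimising F over q therefore performs approximate Bayesian inference and, at the same time, tightens a bound on model evidence, which is the link between inference and self-organisation that the FEP builds on.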
-
Predictive Coding: a Theoretical and Experimental Review
Authors:
Beren Millidge,
Anil Seth,
Christopher L Buckley
Abstract:
Predictive coding offers a potentially unifying account of cortical function -- postulating that the core function of the brain is to minimize prediction errors with respect to a generative model of the world. The theory is closely related to the Bayesian brain framework and, over the last two decades, has gained substantial influence in the fields of theoretical and cognitive neuroscience. A large body of research has arisen that both empirically tests improved and extended theoretical and mathematical models of predictive coding and evaluates their potential biological plausibility for implementation in the brain, as well as the concrete neurophysiological and psychological predictions made by the theory. Despite this enduring popularity, however, no comprehensive review of predictive coding theory, and especially of recent developments in this field, exists. Here, we provide a comprehensive review of the core mathematical structure and logic of predictive coding, thus complementing recent tutorials in the literature. We also review a wide range of classic and recent work within the framework, ranging from the neurobiologically realistic microcircuits that could implement predictive coding, to the close relationship between predictive coding and the widely-used backpropagation of error algorithm, as well as surveying the close relationships between predictive coding and modern machine learning techniques.
Submitted 12 July, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
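The core computation reviewed here, minimising prediction errors under a hierarchical generative model, can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions: a top-down model in which each layer predicts the one below as W_l tanh(x_{l+1}), identity precisions, no prior on the top layer, and plain gradient descent on the activities with the bottom layer clamped to the data. It is a generic illustration, not any specific model from the review.

```python
import numpy as np

def pc_energy(xs, weights):
    """Sum of squared prediction errors, eps_l = x_l - W_l tanh(x_{l+1})."""
    return sum(0.5 * np.sum((xs[l] - W @ np.tanh(xs[l + 1])) ** 2)
               for l, W in enumerate(weights))

def pc_inference(y, weights, n_steps=200, lr=0.05, seed=0):
    """Clamp x_0 to the data y and descend the energy w.r.t. latents x_1..x_L."""
    rng = np.random.default_rng(seed)
    xs = [y] + [rng.normal(scale=0.1, size=W.shape[1]) for W in weights]
    energies = [pc_energy(xs, weights)]
    for _ in range(n_steps):
        errs = [xs[l] - W @ np.tanh(xs[l + 1]) for l, W in enumerate(weights)]
        for l in range(1, len(xs)):
            # Local update: error this layer causes below, back-projected through W_{l-1} ...
            grad = -(1 - np.tanh(xs[l]) ** 2) * (weights[l - 1].T @ errs[l - 1])
            if l < len(weights):            # ... plus its own error, if it is predicted from above
                grad = grad + errs[l]
            xs[l] = xs[l] - lr * grad
        energies.append(pc_energy(xs, weights))
    return xs, energies

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(8, 6)), rng.normal(scale=0.5, size=(6, 4))]
y = rng.normal(size=8)                      # a hypothetical observation
xs, energies = pc_inference(y, weights)
print(energies[0], energies[-1])            # total prediction error falls during inference
```

Each activity update uses only the prediction errors of its own layer and the layer directly below, which is the locality property that makes predictive coding attractive as a model of cortical computation.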
-
How particular is the physics of the free energy principle?
Authors:
Miguel Aguilera,
Beren Millidge,
Alexander Tschantz,
Christopher L. Buckley
Abstract:
The free energy principle (FEP) states that any dynamical system can be interpreted as performing Bayesian inference upon its surrounding environment. In this work, we examine in depth the assumptions required to derive the FEP in the simplest possible set of systems -- weakly-coupled non-equilibrium linear stochastic systems. Specifically, we explore (i) how general the requirements imposed on the statistical structure of a system are and (ii) how informative the FEP is about the behaviour of such systems. We discover that two requirements of the FEP -- the Markov blanket condition (i.e. a statistical boundary precluding direct coupling between internal and external states) and stringent restrictions on its solenoidal flows (i.e. tendencies driving a system out of equilibrium) -- are only valid for a very narrow space of parameters. Suitable systems require an absence of perception-action asymmetries that is highly unusual for living systems interacting with an environment. More importantly, we observe that a mathematically central step in the argument, connecting the behaviour of a system to variational inference, relies on an implicit equivalence between the dynamics of the average states of a system with the average of the dynamics of those states. This equivalence does not hold in general even for linear systems, since it requires an effective decoupling from the system's history of interactions. These observations are critical for evaluating the generality and applicability of the FEP and indicate the existence of significant problems with the theory in its current form. These issues make the FEP, as it stands, not straightforwardly applicable to the simple linear systems studied here and suggest that more development is needed before the theory could be applied to the kind of complex systems that describe living and cognitive processes.
Submitted 19 May, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
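The point that sparse coupling need not yield a Markov blanket once solenoidal flows are present can be seen in a three-variable linear stochastic system. In this minimal NumPy sketch (the matrices are hypothetical and chosen for illustration, not taken from the paper), external state eta and internal state mu are never coupled directly, and the Gaussian blanket condition is that the stationary precision entry P[eta, mu] vanishes. With symmetric coupling and isotropic noise (detailed balance) it does; adding an antisymmetric eta-b coupling breaks it.

```python
import numpy as np

def stationary_precision(A, D):
    """Stationary precision of the linear SDE dx = A x dt + sqrt(2 D) dW.

    The stationary covariance S solves the Lyapunov equation A S + S A^T = -2 D,
    solved here in vectorised (Kronecker) form, which is fine for small systems.
    """
    n = A.shape[0]
    K = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    S = np.linalg.solve(K, (-2.0 * D).ravel()).reshape(n, n)
    return np.linalg.inv(S)

# States ordered (external eta, blanket b, internal mu); eta and mu are never
# coupled directly: A[0, 2] = A[2, 0] = 0.
A_eq = np.array([[-1.0, 0.5, 0.0],
                 [0.5, -1.0, 0.5],
                 [0.0, 0.5, -1.0]])    # symmetric drift with D = I: detailed balance
A_neq = np.array([[-1.0, 0.5, 0.0],
                  [-0.5, -1.0, 0.5],
                  [0.0, 0.5, -1.0]])   # antisymmetric eta-b coupling: a solenoidal component
D = np.eye(3)

P_eq = stationary_precision(A_eq, D)
P_neq = stationary_precision(A_neq, D)
# For a Gaussian density, eta and mu are conditionally independent given b iff P[0, 2] == 0.
print(P_eq[0, 2], P_neq[0, 2])  # ~0 at equilibrium, clearly nonzero out of equilibrium
```

At equilibrium the precision is simply -A, so it inherits the sparsity of the couplings; out of equilibrium the stationary density mixes in indirect eta-mu dependence, even though no direct eta-mu coupling was added.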
-
Investigating the Scalability and Biological Plausibility of the Activation Relaxation Algorithm
Authors:
Beren Millidge,
Alexander Tschantz,
Anil Seth,
Christopher L Buckley
Abstract:
The recently proposed Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm using only local learning rules. Unlike competing schemes, it converges to the exact backpropagation gradients, and utilises only a single type of computational unit and a single backwards relaxation phase. We have previously shown that the algorithm can be further simplified and made more biologically plausible by (i) introducing a learnable set of backwards weights, which overcomes the weight-transport problem, and (ii) avoiding the computation of nonlinear derivatives at each neuron. However, the efficacy of these simplifications has, so far, only been tested on simple multi-layer-perceptron (MLP) networks. Here, we show that these simplifications still maintain performance using more complex CNN architectures and challenging datasets, which have proven difficult for other biologically-plausible schemes to scale to. We also investigate whether another biologically implausible assumption of the original AR algorithm -- the frozen feedforward pass -- can be relaxed without damaging performance.
Submitted 13 October, 2020;
originally announced October 2020.
-
Relaxing the Constraints on Predictive Coding Models
Authors:
Beren Millidge,
Alexander Tschantz,
Anil Seth,
Christopher L Buckley
Abstract:
Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs, which underlies both perception and learning, is the minimization of prediction errors. While motivated by high-level notions of variational inference, detailed neurophysiological models of cortical microcircuits which can implement its computations have been developed. Moreover, under certain conditions, predictive coding has been shown to approximate the backpropagation of error algorithm, and thus provides a relatively biologically plausible credit-assignment mechanism for training deep networks. However, standard implementations of the algorithm still involve potentially neurally implausible features such as identical forward and backward weights, backward nonlinear derivatives, and 1-1 error unit connectivity. In this paper, we show that these features are not integral to the algorithm and can be removed either directly or through learning additional sets of parameters with Hebbian update rules without noticeable harm to learning performance. Our work thus relaxes current constraints on potential microcircuit designs and hopefully opens up new regions of the design-space for neuromorphic implementations of predictive coding.
Submitted 10 October, 2020; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain
Authors:
Beren Millidge,
Alexander Tschantz,
Anil K Seth,
Christopher L Buckley
Abstract:
The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, a key question remains as to whether backprop can be formulated in a manner suitable for implementation in neural circuitry. The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global signals as in standard backprop. Recently several algorithms for approximating backprop using only local signals have been proposed. However, these algorithms typically impose other requirements which challenge biological plausibility: for example, requiring complex and precise connectivity schemes, or multiple sequential backwards phases with information being stored across phases. Here, we propose a novel algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, utilises only a single parallel backwards relaxation phase, and can operate on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem, and the use of nonlinear derivatives), while preserving performance.
Submitted 10 October, 2020; v1 submitted 11 September, 2020;
originally announced September 2020.
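The abstract frames the backpropagation gradient as the equilibrium point of a dynamical system. A minimal NumPy sketch of that idea follows, assuming a tanh MLP, a squared-error loss and simple Euler integration (these are our choices for illustration, not necessarily the paper's exact formulation): each hidden unit relaxes as dx_l/dt = -x_l + (dv_{l+1}/dv_l)^T x_{l+1}, the output unit is clamped to the loss gradient, and the fixed point gives the exact gradients.

```python
import numpy as np

def forward(weights, x0):
    """Forward pass v_{l+1} = tanh(W_l v_l); returns every layer's activity."""
    vs = [x0]
    for W in weights:
        vs.append(np.tanh(W @ vs[-1]))
    return vs

def activation_relaxation(weights, vs, target, dt=0.1, n_steps=500):
    """Relax units x_l towards dLoss/dv_l, then read off the weight gradients.

    Dynamics (Euler-integrated, all hidden layers updated in parallel):
        dx_l/dt = -x_l + (dv_{l+1}/dv_l)^T x_{l+1}
    with the output unit clamped to dLoss/dv_L for the loss 0.5*||v_L - target||^2.
    """
    L = len(weights)
    xs = [np.zeros_like(v) for v in vs]
    xs[L] = vs[L] - target                      # clamp the top layer
    for _ in range(n_steps):
        new_xs = list(xs)
        for l in range(1, L):                   # hidden layers only
            deriv = 1.0 - vs[l + 1] ** 2        # tanh'(W_l v_l), since v_{l+1} = tanh(W_l v_l)
            new_xs[l] = xs[l] + dt * (-xs[l] + weights[l].T @ (deriv * xs[l + 1]))
        xs = new_xs
    # Local weight gradient for W_l: outer(tanh' * x_{l+1}, v_l)
    return [np.outer((1.0 - vs[l + 1] ** 2) * xs[l + 1], vs[l]) for l in range(L)]

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                            # hypothetical layer widths
weights = [rng.normal(scale=0.5, size=(sizes[i + 1], sizes[i])) for i in range(3)]
x0, target = rng.normal(size=4), rng.normal(size=2)
grads = activation_relaxation(weights, forward(weights, x0), target)
```

Every unit uses only its own state and feedback from the layer above, and all layers relax simultaneously, so the gradient emerges as a fixed point of a single parallel phase rather than from a sequential backward sweep.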
-
Control as Hybrid Inference
Authors:
Alexander Tschantz,
Beren Millidge,
Anil K. Seth,
Christopher L. Buckley
Abstract:
The field of reinforcement learning can be split into model-based and model-free methods. Here, we unify these approaches by casting model-free policy optimisation as amortised variational inference, and model-based planning as iterative variational inference, within a 'control as hybrid inference' (CHI) framework. We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. Using a didactic experiment, we demonstrate that the proposed algorithm operates in a model-based manner at the onset of learning, before converging to a model-free algorithm once sufficient data have been collected. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines. CHI thus provides a principled framework for harnessing the sample efficiency of model-based planning while retaining the asymptotic performance of model-free policy optimisation.
Submitted 11 July, 2020;
originally announced July 2020.
-
On the Relationship Between Active Inference and Control as Inference
Authors:
Beren Millidge,
Alexander Tschantz,
Anil K Seth,
Christopher L Buckley
Abstract:
Active Inference (AIF) is an emerging framework in the brain sciences which suggests that biological agents act to minimise a variational bound on model evidence. Control-as-Inference (CAI) is a framework within reinforcement learning which casts decision making as a variational inference problem. While these frameworks both consider action selection through the lens of variational inference, their relationship remains unclear. Here, we provide a formal comparison between them and demonstrate that the primary difference arises from how value is incorporated into their respective generative models. In the context of this comparison, we highlight several ways in which these frameworks can inform one another.
Submitted 29 June, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Predictions in the eye of the beholder: an active inference account of Watt governors
Authors:
Manuel Baltieri,
Christopher L. Buckley,
Jelle Bruineberg
Abstract:
Active inference introduces a theory describing action-perception loops via the minimisation of variational (and expected) free energy or, under simplifying assumptions, (weighted) prediction error. Recently, active inference has been proposed as part of a new and unifying framework in the cognitive sciences: predictive processing. Predictive processing is often associated with traditional computational theories of the mind, strongly relying on internal representations presented in the form of generative models thought to explain different functions of living and cognitive systems. In this work, we introduce an active inference formulation of the Watt centrifugal governor, a system often portrayed as the canonical "anti-representational" metaphor for cognition. We identify a generative model of a steam engine for the governor, and derive a set of equations describing "perception" and "action" processes as a form of prediction error minimisation. In doing so, we firstly challenge the idea of generative models as explicit internal representations for cognitive systems, suggesting that such models serve only as implicit descriptions for an observer. Secondly, we consider current proposals of predictive processing as a theory of cognition, focusing on some of its potential shortcomings and in particular on the idea that virtually any system admits a description in terms of prediction error minimisation, suggesting that this theory may offer limited explanatory power for cognitive systems. Finally, as a silver lining we emphasise the instrumental role this framework can nonetheless play as a mathematical tool for modelling cognitive architectures interpreted in terms of Bayesian (active) inference.
Submitted 25 June, 2020; v1 submitted 20 June, 2020;
originally announced June 2020.
-
Reinforcement Learning as Iterative and Amortised Inference
Authors:
Beren Millidge,
Alexander Tschantz,
Anil K Seth,
Christopher L Buckley
Abstract:
There are several ways to categorise reinforcement learning (RL) algorithms, such as either model-based or model-free, policy-based or planning-based, on-policy or off-policy, and online or offline. Broad classification schemes such as these help provide a unified perspective on disparate techniques and can contextualise and guide the development of new algorithms. In this paper, we utilise the control as inference framework to outline a novel classification scheme based on amortised and iterative inference. We demonstrate that a wide range of algorithms can be classified in this manner, providing a fresh perspective and highlighting a range of existing similarities. Moreover, we show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored, suggesting new routes to innovative RL algorithms.
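The distinction between iterative and amortised inference can be illustrated on a toy problem. In the hypothetical sketch below, which is our own illustration and not an algorithm from the paper, iterative inference optimises the action separately for each state by gradient descent (as planning methods do), while amortised inference fits a parametric policy once across many states and then answers a new state with a single evaluation (as policy networks do). The quadratic cost and all function names are assumptions chosen so that the optimal action, 2s, is known in closed form.

```python
def grad_cost(a, s):
    """Gradient of the hypothetical per-state cost (a - 2 s)**2; the optimal action is 2 s."""
    return 2.0 * (a - 2.0 * s)


def iterative_inference(s, lr=0.1, steps=200):
    """Iterative inference: optimise the action from scratch for this one state."""
    a = 0.0
    for _ in range(steps):
        a -= lr * grad_cost(a, s)
    return a


def fit_amortised_policy(states, lr=0.05, epochs=500):
    """Amortised inference: learn a policy a = theta * s once, shared across states."""
    theta = 0.0
    for _ in range(epochs):
        for s in states:
            # chain rule: d cost / d theta = grad_cost(theta * s, s) * s
            theta -= lr * grad_cost(theta * s, s) * s
    return theta


if __name__ == "__main__":
    theta = fit_amortised_policy([-1.0, -0.5, 0.5, 1.0])
    s_new = 0.7
    print(iterative_inference(s_new))  # per-state optimisation loop
    print(theta * s_new)  # single evaluation of the learned policy
```

Both routes reach the same action here; the trade-off is that iterative inference pays an optimisation cost at every decision, whereas amortised inference pays it once during training and can be inaccurate for states unlike those it was fitted on.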
Submitted 5 July, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs
Authors:
Beren Millidge,
Alexander Tschantz,
Christopher L. Buckley
Abstract:
Backpropagation of error (backprop) is a powerful algorithm for training machine learning architectures through end-to-end differentiation. However, backprop is often criticised for lacking biological plausibility. Recently, it has been shown that backprop in multilayer-perceptrons (MLPs) can be approximated using predictive coding, a biologically-plausible process theory of cortical computation which relies only on local and Hebbian updates. The power of backprop, however, lies not in its instantiation in MLPs, but rather in the concept of automatic differentiation which allows for the optimisation of any differentiable program expressed as a computation graph. Here, we demonstrate that predictive coding converges asymptotically (and in practice rapidly) to exact backprop gradients on arbitrary computation graphs using only local learning rules. We apply this result to develop a straightforward strategy to translate core machine learning architectures into their predictive coding equivalents. We construct predictive coding CNNs, RNNs, and the more complex LSTMs, which include a non-layer-like branching internal graph structure and multiplicative interactions. Our models perform equivalently to backprop on challenging machine learning benchmarks, while utilising only local and (mostly) Hebbian plasticity. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry, and may also contribute to the development of completely distributed neuromorphic architectures.
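The central claim, that predictive coding inference settles on backprop gradients using only local rules, can be checked on a toy example. The sketch below is a minimal pure-Python illustration on a scalar tanh chain, assuming the fixed-prediction simplification (predictions held at their feedforward values while the hidden value nodes relax); the chain, function names and step sizes are hypothetical choices, not the paper's general construction for arbitrary computation graphs.

```python
import math


def f(a):
    """Activation function (tanh)."""
    return math.tanh(a)


def df(a):
    """Derivative of tanh."""
    return 1.0 - math.tanh(a) ** 2


def forward(ws, x):
    """Feedforward pass on a scalar chain.

    pre[l] = w_l * h[l-1] and h[l] = f(pre[l]) for l = 1..n, with h[0] = x.
    """
    pre, hs = [None], [x]
    for w in ws:
        a = w * hs[-1]
        pre.append(a)
        hs.append(f(a))
    return pre, hs


def backprop_grads(ws, x, y):
    """Exact dL/dw_l for L = 0.5 * (h_n - y)**2."""
    pre, hs = forward(ws, x)
    n = len(ws)
    grads = [0.0] * n
    dh = hs[n] - y  # dL/dh_n
    for l in range(n, 0, -1):
        grads[l - 1] = dh * df(pre[l]) * hs[l - 1]
        dh = dh * df(pre[l]) * ws[l - 1]  # dL/dh_{l-1}; ws[l-1] is w_l
    return grads


def pc_grads(ws, x, y, dt=0.1, steps=500):
    """Predictive coding weight gradients dF/dw_l after relaxing the hidden nodes.

    Energy F = 0.5 * sum_l e_l**2 with errors e_l = v_l - mu_l. The predictions
    mu_l are frozen at their feedforward values (fixed-prediction simplification).
    """
    pre, mu = forward(ws, x)
    n = len(ws)
    v = list(mu)  # value nodes start at the feedforward values
    v[n] = y  # clamp the output node to the target
    for _ in range(steps):
        e = [v[l] - mu[l] for l in range(n + 1)]
        for l in range(1, n):  # only hidden nodes relax; input and output stay clamped
            # dv_l/dt = -dF/dv_l: own error plus error fed back from layer l+1 (ws[l] is w_{l+1})
            v[l] += dt * (-e[l] + e[l + 1] * df(pre[l + 1]) * ws[l])
    e = [v[l] - mu[l] for l in range(n + 1)]
    # dF/dw_l = -e_l * f'(a_l) * h_{l-1}: uses only quantities local to layer l
    return [-e[l] * df(pre[l]) * mu[l - 1] for l in range(1, n + 1)]


if __name__ == "__main__":
    ws, x, y = [0.5, -1.2, 0.8], 0.7, 0.3
    print(backprop_grads(ws, x, y))
    print(pc_grads(ws, x, y))  # agrees with the line above to numerical precision
```

At equilibrium each hidden error satisfies e_l = e_{l+1} f'(a_{l+1}) w_{l+1}, which is exactly the backprop recursion for -dL/dh_l, so the local weight update reproduces the backprop gradient.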
Submitted 5 October, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
On Kalman-Bucy filters, linear quadratic control and active inference
Authors:
Manuel Baltieri,
Christopher L. Buckley
Abstract:
Linear Quadratic Gaussian (LQG) control is a framework first introduced in control theory that provides an optimal solution to linear problems of regulation in the presence of uncertainty. This framework combines Kalman-Bucy filters for the estimation of hidden states with Linear Quadratic Regulators for the control of their dynamics. Nowadays, LQG is also a common paradigm in neuroscience, where it is used to characterise different approaches to sensorimotor control based on state estimators, forward and inverse models. According to this paradigm, perception can be seen as a process of Bayesian inference and action as a process of optimal control. Recently, active inference has been introduced as a process theory derived from a variational approximation of Bayesian inference problems that describes, among others, perception and action in terms of (variational and expected) free energy minimisation. Active inference relies on a mathematical formalism similar to LQG, but offers a rather different perspective on problems of sensorimotor control in biological systems based on a process of biased perception. In this note we compare the mathematical treatments of these two frameworks for linear systems, focusing on their respective assumptions and highlighting their commonalities and technical differences.
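In the scalar linear-Gaussian case, the estimation half of LQG reduces to one Riccati equation for the error variance of the Kalman-Bucy filter, and the control half to a Riccati equation of the same form. The sketch below is a minimal illustration under standard textbook assumptions (scalar state, white process and measurement noise with intensities `q` and `r`, quadratic control costs); it integrates the filter Riccati equation to steady state, compares the result with the closed-form root, and exposes the well-known estimation/control duality. All function names are hypothetical.

```python
import math


def integrate_riccati(a, c, q, r, p0=0.0, dt=1e-3, t_end=20.0):
    """Euler-integrate the scalar Kalman-Bucy Riccati equation
    dP/dt = 2 a P + q - (c**2 / r) P**2
    for dx = a x dt + process noise (intensity q), y = c x + measurement noise (intensity r)."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (2.0 * a * p + q - (c ** 2 / r) * p ** 2)
    return p


def steady_state_variance(a, c, q, r):
    """Positive root of the algebraic Riccati equation (c**2 / r) P**2 - 2 a P - q = 0."""
    return (r / c ** 2) * (a + math.sqrt(a ** 2 + q * c ** 2 / r))


def kalman_gain(p, c, r):
    """Filter gain K = P c / r in d x_hat = a x_hat dt + K (y - c x_hat) dt."""
    return p * c / r


def lqr_steady_state(a, b, q_cost, r_cost):
    """Positive root of the scalar LQR Riccati equation (b**2 / r) S**2 - 2 a S - q = 0
    for dx = (a x + b u) dt with cost integral of (q x**2 + r u**2); feedback u = -(b S / r) x."""
    return (r_cost / b ** 2) * (a + math.sqrt(a ** 2 + q_cost * b ** 2 / r_cost))


if __name__ == "__main__":
    p = integrate_riccati(-1.0, 1.0, 1.0, 0.5)
    print(p, steady_state_variance(-1.0, 1.0, 1.0, 0.5), kalman_gain(p, 1.0, 0.5))
```

Swapping the observation gain `c` for the control gain `b`, and the noise intensities for the cost weights, maps one Riccati equation onto the other, which is the formal sense in which Kalman-Bucy estimation and LQR control are dual.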
Submitted 13 May, 2020;
originally announced May 2020.
-
Whence the Expected Free Energy?
Authors:
Beren Millidge,
Alexander Tschantz,
Christopher L Buckley
Abstract:
The Expected Free Energy (EFE) is a central quantity in the theory of active inference. It is the quantity that all active inference agents are mandated to minimize through action, and its decomposition into extrinsic and intrinsic value terms is key to the balance of exploration and exploitation that active inference agents evince. Despite its importance, the mathematical origins of this quantity and its relation to the Variational Free Energy (VFE) remain unclear. In this paper, we investigate the origins of the EFE in detail and show that it is not simply "the free energy in the future". We present a functional that we argue is the natural extension of the VFE, but which actively discourages exploratory behaviour, thus demonstrating that exploration does not directly follow from free energy minimization into the future. We then develop a novel objective, the Free-Energy of the Expected Future (FEEF), which possesses both the epistemic component of the EFE as well as an intuitive mathematical grounding as the divergence between predicted and desired futures.
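The extrinsic/intrinsic decomposition of the EFE can be made concrete for a one-step, discrete-state model. The sketch below assumes the commonly used form G(π) = -E_q(o|π)[ln p̃(o)] - E_q(o|π)[KL(q(s|o,π) || q(s|π))], that is, extrinsic value (expected log-preference, entered here as a cost) minus intrinsic, epistemic value (expected information gain about states). It is an illustration of that standard decomposition only, not of the FEEF objective proposed in the paper, and the argument names are hypothetical.

```python
import math


def expected_free_energy(q_s, likelihood, log_pref):
    """One-step expected free energy for a single policy.

    q_s: predicted state distribution q(s | pi), list of probabilities
    likelihood: likelihood[o][s] = p(o | s)
    log_pref: log p~(o), log-preferences over outcomes
    Returns (G, extrinsic, epistemic) with G = extrinsic - epistemic (lower is better).
    """
    n_o, n_s = len(likelihood), len(q_s)
    # predicted outcome distribution q(o | pi)
    q_o = [sum(likelihood[o][s] * q_s[s] for s in range(n_s)) for o in range(n_o)]
    # extrinsic term: expected negative log-preference of predicted outcomes
    extrinsic = -sum(q_o[o] * log_pref[o] for o in range(n_o))
    # epistemic term: expected KL between posterior and prior beliefs about states
    epistemic = 0.0
    for o in range(n_o):
        if q_o[o] == 0.0:
            continue
        post = [likelihood[o][s] * q_s[s] / q_o[o] for s in range(n_s)]
        kl = sum(p * math.log(p / q_s[s]) for s, p in enumerate(post) if p > 0)
        epistemic += q_o[o] * kl
    return extrinsic - epistemic, extrinsic, epistemic


if __name__ == "__main__":
    prefs = [math.log(0.9), math.log(0.1)]
    print(expected_free_energy([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], prefs))  # informative sensor
    print(expected_free_energy([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], prefs))  # uninformative sensor
```

With an informative likelihood the epistemic term equals the mutual information between states and outcomes and lowers G, which is how the EFE rewards exploration; with an uninformative likelihood it vanishes and only the extrinsic term remains.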
Submitted 28 September, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Reinforcement Learning through Active Inference
Authors:
Alexander Tschantz,
Beren Millidge,
Anil K. Seth,
Christopher L. Buckley
Abstract:
The central tenet of reinforcement learning (RL) is that agents seek to maximize the sum of cumulative rewards. In contrast, active inference, an emerging framework within cognitive and computational neuroscience, proposes that agents act to maximize the evidence for a biased generative model. Here, we illustrate how ideas from active inference can augment traditional RL approaches by (i) furnishing an inherent balance of exploration and exploitation, and (ii) providing a more flexible conceptualization of reward. Inspired by active inference, we develop and implement a novel objective for decision making, which we term the free energy of the expected future. We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
Submitted 28 February, 2020;
originally announced February 2020.
-
Scaling active inference
Authors:
Alexander Tschantz,
Manuel Baltieri,
Anil K. Seth,
Christopher L. Buckley
Abstract:
In reinforcement learning (RL), agents often operate in partially observed and uncertain environments. Model-based RL suggests that this is best achieved by learning and exploiting a probabilistic model of the world. 'Active inference' is an emerging normative framework in cognitive and computational neuroscience that offers a unifying account of how biological agents achieve this. On this framework, inference, learning and action emerge from a single imperative to maximize the Bayesian evidence for a niched model of the world. However, implementations of this process have thus far been restricted to low-dimensional and idealized situations. Here, we present a working implementation of active inference that applies to high-dimensional tasks, with proof-of-principle results demonstrating efficient exploration and an order of magnitude increase in sample efficiency over strong model-free baselines. Our results demonstrate the feasibility of applying active inference at scale and highlight the operational homologies between active inference and current model-based approaches to RL.
Submitted 24 November, 2019;
originally announced November 2019.
-
Generative models as parsimonious descriptions of sensorimotor loops
Authors:
Manuel Baltieri,
Christopher L. Buckley
Abstract:
The Bayesian brain hypothesis, predictive processing and variational free energy minimisation are typically used to describe perceptual processes based on accurate generative models of the world. However, generative models need not be veridical representations of the environment. We suggest that they can (and should) be used to describe sensorimotor relationships relevant for behaviour rather than precise accounts of the world.
Submitted 29 April, 2019;
originally announced April 2019.
-
Nonmodular architectures of cognitive systems based on active inference
Authors:
Manuel Baltieri,
Christopher L. Buckley
Abstract:
In psychology and neuroscience it is common to describe cognitive systems as input/output devices where perceptual and motor functions are implemented in a purely feedforward, open-loop fashion. On this view, perception and action are often seen as encapsulated modules with limited interaction between them. While embodied and enactive approaches to cognitive science have challenged the idealisation of the brain as an input/output device, we argue that even the more recent attempts to model systems using closed-loop architectures still heavily rely on a strong separation between motor and perceptual functions. Previously, we have suggested that the mainstream notion of modularity strongly resonates with the separation principle of control theory. In this work we present a minimal model of a sensorimotor loop implementing an architecture based on the separation principle. We link this to popular formulations of perception and action in the cognitive sciences, and show its limitations when, for instance, external forces are not modelled by an agent. These forces can be seen as variables that an agent cannot directly control, i.e., a perturbation from the environment or an interference caused by other agents. As an alternative approach inspired by embodied cognitive science, we then propose a nonmodular architecture based on the active inference framework. We demonstrate the robustness of this architecture to unknown external inputs and show that the mechanism with which this is achieved in linear models is equivalent to integral control.
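The closing claim, that robustness to unknown external inputs in linear models works like integral control, can be illustrated with the classic textbook case. The sketch below is a generic integral controller, not the paper's active inference architecture: a first-order plant is pushed by a constant disturbance `d` that the controller never observes, and because the control signal accumulates the error over time, the state still settles exactly on the target. The plant, gain `k` and step sizes are hypothetical.

```python
def simulate_integral_control(disturbance, target=1.0, k=2.0, dt=0.01, t_end=30.0):
    """Plant dx/dt = -x + u + d with an unmodelled constant disturbance d.

    The control law du/dt = -k (x - target) integrates the error, so u is an
    integral controller; it never uses the value of d.
    """
    x, u = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        err = x - target
        x += dt * (-x + u + disturbance)  # plant driven by control and disturbance
        u += dt * (-k * err)  # integral action on the error
    return x, u


if __name__ == "__main__":
    for d in (0.5, -2.0):
        print(d, simulate_integral_control(d))
```

At any equilibrium the integrator forces x - target = 0, and the control settles at u = target - d, silently cancelling the disturbance. A purely proportional law u = -k (x - target) would instead leave a steady-state offset that depends on d.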
Submitted 22 March, 2019;
originally announced March 2019.
-
The modularity of action and perception revisited using control theory and active inference
Authors:
Manuel Baltieri,
Christopher L. Buckley
Abstract:
The assumption that action and perception can be investigated independently is entrenched in theories, models and experimental approaches across the brain and mind sciences. In cognitive science, this has been a central point of contention between computationalist and 4Es (enactive, embodied, extended and embedded) theories of cognition, with the former embracing the modular, "classical sandwich" architecture of the mind and the latter actively denying that this separation can be made. In this work we suggest that the modular independence of action and perception strongly resonates with the separation principle of control theory and furthermore that this principle provides formal criteria within which to evaluate the implications of the modularity of action and perception. We will also see that real-time feedback with the environment, often considered necessary for the definition of 4Es ideas, is not, however, a sufficient condition to avoid the "classical sandwich". Finally, we argue that an emerging framework in the cognitive and brain sciences, active inference, extends ideas derived from control theory to the study of biological systems while disposing of the separation principle, describing non-modular models of behaviour strongly aligned with 4Es theories of cognition.
Submitted 7 June, 2018;
originally announced June 2018.
-
An active inference implementation of phototaxis
Authors:
Manuel Baltieri,
Christopher L. Buckley
Abstract:
Active inference is emerging as a possible unifying theory of perception and action in cognitive and computational neuroscience. On this theory, perception is a process of inferring the causes of sensory data by minimising the error between actual sensations and those predicted by an inner generative (probabilistic) model. Action, on the other hand, is cast as a process that modifies the world such that the consequent sensory input meets expectations encoded in the same internal model. These two processes, inferring properties of the world and inferring actions needed to meet expectations, close the sensory/motor loop and suggest a deep symmetry between action and perception. In this work we present a simple agent-based model inspired by this new theory that offers insights into some of its central ideas. Previous implementations of active inference have typically examined a "perception-oriented" view of this theory, assuming that agents are endowed with a detailed generative model of their surrounding environment. In contrast, we present an "action-oriented" solution showing how adaptive behaviour can emerge even when agents operate with a simple model which bears little resemblance to their environment. We examine how various parameters of this formulation allow phototaxis and present an example of a different, "pathological" behaviour.
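The two processes described above, perception as descent of a prediction-error energy with respect to internal estimates and action as descent of the same energy through its effect on sensations, can be sketched as a one-dimensional loop. This is a stripped-down illustration under assumed Gaussian prediction errors, not the paper's phototaxis model: the hypothetical world simply returns the agent's position as its sensation, the agent's prior expects the sensation `rho`, and the agent's only knowledge of the world is the assumption that acting increases the sensation (ds/da = 1).

```python
def run_agent(rho=2.0, sigma_s=1.0, sigma_p=1.0, dt=0.01, steps=5000):
    """Action-perception loop driven by gradient descent on
    F = (s - mu)**2 / (2 sigma_s) + (mu - rho)**2 / (2 sigma_p).

    Hypothetical world: the sensation s equals the agent's position a.
    """
    a, mu = 0.0, 0.0
    for _ in range(steps):
        s = a  # world: sensation is the current position
        eps_s = (s - mu) / sigma_s  # sensory prediction error
        eps_p = (mu - rho) / sigma_p  # prior prediction error
        mu += dt * (eps_s - eps_p)  # perception: dmu/dt = -dF/dmu
        a += dt * (-eps_s * 1.0)  # action: da/dt = -dF/ds * ds/da, with ds/da = 1 assumed
    return a, mu


if __name__ == "__main__":
    print(run_agent(rho=2.0))  # both values approach the expected sensation rho
```

The fixed point has s = mu = rho: rather than revising its expectation to match the world, the agent moves until the world delivers the sensation it expects, which is the "action-oriented" reading of prediction error minimisation.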
Submitted 6 July, 2017;
originally announced July 2017.
-
The free energy principle for action and perception: A mathematical review
Authors:
Christopher L. Buckley,
Chang Sub Kim,
Simon McGregor,
Anil K. Seth
Abstract:
The 'free energy principle' (FEP) has been suggested to provide a unified theory of the brain, integrating data and theory relating to action, perception, and learning. The theory and implementation of the FEP combines insights from Helmholtzian 'perception as inference', machine learning theory, and statistical thermodynamics. Here, we provide a detailed mathematical evaluation of a suggested biologically plausible implementation of the FEP that has been widely used to develop the theory. Our objectives are (i) to describe within a single article the mathematical structure of this implementation of the FEP; (ii) provide a simple but complete agent-based model utilising the FEP; (iii) disclose the assumption structure of this implementation of the FEP to help elucidate its significance for the brain sciences.
Submitted 24 May, 2017;
originally announced May 2017.
-
Brain State Control by Closed-Loop Environmental Feedback
Authors:
Christopher L. Buckley,
Satohiro Tajima,
Toru Yanagawa,
Kana Takakura,
Yasuo Nagasaka,
Naotaka Fujii,
Taro Toyoizumi
Abstract:
Brain state regulates sensory processing and motor control for adaptive behavior. Internal mechanisms of brain state control are well studied, but the role of external modulation from the environment is not well understood. Here, we examined the role of closed-loop environmental (CLE) feedback, in comparison to open-loop sensory input, on brain state and behavior in diverse vertebrate systems. In fictively swimming zebrafish, CLE feedback for optomotor stability controlled brain state by reducing coherent neuronal activity. The role of CLE feedback in brain state was also shown in a model of rodent active whisking, where brief interruptions in this feedback enhanced signal-to-noise ratio for detecting touch. Finally, in monkey visual fixation, artificial CLE feedback suppressed stimulus-specific neuronal activity and improved behavioral performance. Our findings show that the environment mediates continuous closed-loop feedback that controls neuronal gain, regulating brain state, and that brain function is an emergent property of brain-environment interactions.
Submitted 29 February, 2016;
originally announced February 2016.
-
A Minimal Active Inference Agent
Authors:
Simon McGregor,
Manuel Baltieri,
Christopher L. Buckley
Abstract:
Research on the so-called "free-energy principle" (FEP) in cognitive neuroscience is becoming increasingly high-profile. To date, introductions to this theory have proved difficult for many readers to follow, but it depends mainly upon two relatively simple ideas: firstly that normative or teleological values can be expressed as probability distributions (active inference), and secondly that approximate Bayesian reasoning can be effectively performed by gradient descent on model parameters (the free-energy principle). The notion of active inference is of great interest for a number of disciplines including cognitive science and artificial intelligence, as well as cognitive neuroscience, and deserves to be more widely known.
This paper attempts to provide an accessible introduction to active inference and informational free-energy, for readers from a range of scientific backgrounds. In this work we introduce an agent-based model with an agent trying to make predictions about its position in a one-dimensional discretized world using methods from the FEP.
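Why minimising free energy amounts to Bayesian inference can be shown directly for a discretised world like the one described above. The variational free energy F(q) = Σ_s q(s) [ln q(s) - ln p(o, s)] is an upper bound on the surprise -ln p(o), with equality exactly when q is the Bayesian posterior, so any procedure that drives F down (gradient descent included) moves beliefs towards the posterior. The five-cell world, uniform prior and sensor likelihood in this sketch are hypothetical numbers chosen for illustration, not taken from the paper.

```python
import math


def free_energy(q, prior, lik_o):
    """F(q) = sum_s q(s) [ln q(s) - ln p(o, s)] for one observation o, with lik_o[s] = p(o | s)."""
    return sum(qs * (math.log(qs) - math.log(prior[s] * lik_o[s]))
               for s, qs in enumerate(q) if qs > 0)


def posterior(prior, lik_o):
    """Exact Bayesian posterior p(s | o) and evidence p(o)."""
    joint = [p * l for p, l in zip(prior, lik_o)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


if __name__ == "__main__":
    prior = [0.2] * 5  # uniform belief over five cells
    lik = [0.05, 0.2, 0.5, 0.2, 0.05]  # p(o | s): sensor most likely to fire in the middle cell
    post, evidence = posterior(prior, lik)
    print(free_energy(prior, prior, lik))  # F for the un-updated belief
    print(free_energy(post, prior, lik), -math.log(evidence))  # minimum of F equals the surprise
```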
Submitted 13 March, 2015;
originally announced March 2015.
-
A Graph Theoretic Interpretation of Neural Complexity
Authors:
L. Barnett,
C. L. Buckley,
S. Bullock
Abstract:
One of the central challenges facing modern neuroscience is to explain the ability of the nervous system to coherently integrate information across distinct functional modules in the absence of a central executive. To this end Tononi et al. [Proc. Nat. Acad. Sci. USA 91, 5033 (1994)] proposed a measure of neural complexity that purports to capture this property based on mutual information between complementary subsets of a system. Neural complexity, so defined, is one of a family of information theoretic metrics developed to measure the balance between the segregation and integration of a system's dynamics. One key question arising for such measures involves understanding how they are influenced by network topology. Sporns et al. [Cereb. Cortex 10, 127 (2000)] employed numerical models in order to determine the dependence of neural complexity on the topological features of a network. However, a complete picture has yet to be established. While De Lucia et al. [Phys. Rev. E 71, 016114 (2005)] made the first attempts at an analytical account of this relationship, their work utilized a formulation of neural complexity that, we argue, did not reflect the intuitions of the original work. In this paper we start by describing weighted connection matrices formed by applying a random continuous weight distribution to binary adjacency matrices. This allows us to derive an approximation for neural complexity in terms of the moments of the weight distribution and elementary graph motifs. In particular we explicitly establish a dependency of neural complexity on cyclic graph motifs.
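For small Gaussian systems, neural complexity can be computed exactly, since every entropy follows from a covariance determinant. The sketch below assumes the integration form of the measure, C_N = Σ_{k=1..n} [(k/n) I(X) - ⟨I(X_j^k)⟩], where ⟨·⟩ averages over all subsets of size k and I is the multi-information (integration) of a subset. It takes a covariance matrix as given and does not implement the paper's approximation in terms of weight-distribution moments and graph motifs; function names are hypothetical.

```python
import itertools
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-15:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d  # a row swap flips the sign
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d


def integration(cov, idx):
    """Gaussian multi-information I(X_idx) = 0.5 * (sum_i ln var_i - ln det Sigma_idx)."""
    idx = list(idx)
    sub = [[cov[i][j] for j in idx] for i in idx]
    return 0.5 * (sum(math.log(cov[i][i]) for i in idx) - math.log(det(sub)))


def neural_complexity(cov):
    """C_N = sum_{k=1..n} [(k/n) I(X) - <I(X_j^k)>], averaging over all size-k subsets."""
    n = len(cov)
    whole = integration(cov, range(n))
    total = 0.0
    for k in range(1, n + 1):
        subsets = list(itertools.combinations(range(n), k))
        mean_sub = sum(integration(cov, s) for s in subsets) / len(subsets)
        total += (k / n) * whole - mean_sub
    return total


if __name__ == "__main__":
    print(neural_complexity([[1.0, 0.4, 0.2], [0.4, 1.0, 0.4], [0.2, 0.4, 1.0]]))
```

For independent units every integration term vanishes and C_N = 0; for two units with correlation ρ the sum reduces to I(X)/2 = -¼ ln(1 - ρ²).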
Submitted 29 November, 2010; v1 submitted 24 November, 2010;
originally announced November 2010.