-
Opportunities of Reinforcement Learning in South Africa's Just Transition
Authors:
Claude Formanek,
Callum Rhys Tilbury,
Jonathan P. Shock
Abstract:
South Africa stands at a crucial juncture, grappling with interwoven socio-economic challenges such as poverty, inequality, unemployment, and the looming climate crisis. The government's Just Transition framework aims to enhance climate resilience, achieve net-zero greenhouse gas emissions by 2050, and promote social inclusion and poverty eradication. According to the Presidential Commission on the Fourth Industrial Revolution, artificial intelligence technologies offer significant promise in addressing these challenges. This paper explores the overlooked potential of Reinforcement Learning (RL) in supporting South Africa's Just Transition. It examines how RL can enhance agriculture and land-use practices, manage complex, decentralised energy networks, and optimise transportation and logistics, thereby playing a critical role in achieving a just and equitable transition to a low-carbon future for all South Africans. We provide a roadmap as to how other researchers in the field may be able to contribute to these pressing problems.
Submitted 6 November, 2024;
originally announced November 2024.
-
Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning
Authors:
Claude Formanek,
Louise Beyers,
Callum Rhys Tilbury,
Jonathan P. Shock,
Arnu Pretorius
Abstract:
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.
Submitted 18 September, 2024;
originally announced September 2024.
-
Coordination Failure in Cooperative Offline MARL
Authors:
Callum Rhys Tilbury,
Claude Formanek,
Louise Beyers,
Jonathan P. Shock,
Arnu Pretorius
Abstract:
Offline multi-agent reinforcement learning (MARL) leverages static datasets of experience to learn optimal multi-agent control. However, learning from static data presents several unique challenges to overcome. In this paper, we focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data, focusing on a common setting we refer to as the 'Best Response Under Data' (BRUD) approach. By using two-player polynomial games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms, which can lead to catastrophic coordination failure in the offline setting. Building on these insights, we propose an approach to mitigate such failure, by prioritising samples from the dataset based on joint-action similarity during policy learning, and demonstrate its effectiveness in detailed experiments. More generally, however, we argue that prioritised dataset sampling is a promising area for innovation in offline MARL that can be combined with other effective approaches such as critic and policy regularisation. Importantly, our work shows how insights drawn from simplified, tractable games can lead to useful, theoretically grounded insights that transfer to more complex contexts. A core dimension of our offering is an interactive notebook, from which almost all of our results can be reproduced, in a browser.
Submitted 1 July, 2024;
originally announced July 2024.
-
Sophisticated Learning: A novel algorithm for active learning during model-based planning
Authors:
Rowan Hodson,
Bruce Bassett,
Charel van Hoof,
Benjamin Rosman,
Mark Solms,
Jonathan P. Shock,
Ryan Smith
Abstract:
We introduce Sophisticated Learning (SL), a planning-to-learn algorithm that embeds active parameter learning inside the Sophisticated Inference (SI) tree-search framework of Active Inference. Unlike SI -- which optimizes beliefs about hidden states -- SL also updates beliefs about model parameters within each simulated branch, enabling counterfactual reasoning about how future observations would improve subsequent planning.
We compared SL with Bayes-adaptive Reinforcement Learning (BARL) agents as well as with its parent algorithm, SI. Using a biologically inspired seasonal foraging task in which resources shift probabilistically over a 10x10 grid, we designed experiments that forced agents to balance probabilistic reward harvesting against information gathering.
In early trials, where rapid learning is vital, SL agents survive, on average, 8.2% longer than SI and 35% longer than Bayes-adaptive Reinforcement Learning. While both SL and SI showed equal convergence performance, SL reached this convergence 40% faster than SI. Additionally, SL robustly outperformed the other algorithms in altered environment configurations.
Our results show that incorporating active learning into multi-step planning materially improves decision making under radical uncertainty, and reinforces the broader utility of Active Inference for modeling biologically relevant behavior.
Submitted 14 August, 2025; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Probability Density Functions from the Fisher Information Metric
Authors:
T. Clingman,
Jeff Murugan,
Jonathan P. Shock
Abstract:
We show a general relation between the spatially disjoint product of probability density functions and the sum of their Fisher information metric tensors. We then utilise this result to give a method for constructing the probability density functions for an arbitrary Riemannian Fisher information metric tensor. We note further that this construction is extremely unconstrained, depending only on certain continuity properties of the probability density functions and a select symmetry of their domains.
Submitted 13 April, 2015;
originally announced April 2015.