-
Oryx: a Performant and Scalable Algorithm for Many-Agent Coordination in Offline MARL
Authors:
Claude Formanek,
Omayma Mahjoub,
Louay Ben Nessir,
Sasha Abramowitz,
Ruan de Kock,
Wiem Khlifi,
Simon Du Toit,
Felix Chalumeau,
Daniel Rajaonarivonivelomanantsoa,
Arnol Fokam,
Siddarth Singh,
Ulrich Mbou Sob,
Arnu Pretorius
Abstract:
A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-l…
▽ More
A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline auto-regressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over lengthy trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works (SMAC, RWARE, and Multi-Agent MuJoCo) covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming prior offline MARL methods and demonstrating robust generalisation across domains with many agents and long horizons. Finally, we introduce new datasets to push the limits of many-agent coordination in offline MARL, and demonstrate Oryx's superior ability to scale effectively in such settings. We will make all of our datasets, experimental data, and code available upon publication.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies
Authors:
Felix Chalumeau,
Daniel Rajaonarivonivelomanantsoa,
Ruan de Kock,
Claude Formanek,
Sasha Abramowitz,
Oumayma Mahjoub,
Wiem Khlifi,
Simon Du Toit,
Louay Ben Nessir,
Refiloe Shabe,
Arnol Fokam,
Siddarth Singh,
Ulrich Mbou Sob,
Arnu Pretorius
Abstract:
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are…
▽ More
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inf-marl.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Multi-Agent Reinforcement Learning with Selective State-Space Models
Authors:
Jemma Daniel,
Ruan de Kock,
Louay Ben Nessir,
Sasha Abramowitz,
Omayma Mahjoub,
Wiem Khlifi,
Claude Formanek,
Arnu Pretorius
Abstract:
The Transformer model has demonstrated success across a wide range of domains, including in Multi-Agent Reinforcement Learning (MARL) where the Multi-Agent Transformer (MAT) has emerged as a leading algorithm in the field. However, a significant drawback of Transformer models is their quadratic computational complexity relative to input size, making them computationally expensive when scaling to l…
▽ More
The Transformer model has demonstrated success across a wide range of domains, including in Multi-Agent Reinforcement Learning (MARL) where the Multi-Agent Transformer (MAT) has emerged as a leading algorithm in the field. However, a significant drawback of Transformer models is their quadratic computational complexity relative to input size, making them computationally expensive when scaling to larger inputs. This limitation restricts MAT's scalability in environments with many agents. Recently, State-Space Models (SSMs) have gained attention due to their computational efficiency, but their application in MARL remains unexplored. In this work, we investigate the use of Mamba, a recent SSM, in MARL and assess whether it can match the performance of MAT while providing significant improvements in efficiency. We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block. Extensive testing shows that our Multi-Agent Mamba (MAM) matches the performance of MAT across multiple standard multi-agent environments, while offering superior scalability to larger agent scenarios. This is significant for the MARL community, because it indicates that SSMs could replace Transformers without compromising performance, whilst also supporting more effective scaling to higher numbers of agents. Our project page is available at https://sites.google.com/view/multi-agent-mamba .
△ Less
Submitted 28 October, 2024; v1 submitted 25 October, 2024;
originally announced October 2024.
-
Sable: a Performant, Efficient and Scalable Sequence Model for MARL
Authors:
Omayma Mahjoub,
Sasha Abramowitz,
Ruan de Kock,
Wiem Khlifi,
Simon du Toit,
Jemma Daniel,
Louay Ben Nessir,
Louise Beyers,
Claude Formanek,
Liam Clark,
Arnu Pretorius
Abstract:
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modeling approach to MARL. Sable works by adapting the ret…
▽ More
As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit the key properties of (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence modeling approach to MARL. Sable works by adapting the retention mechanism in Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate how Sable is able to significantly outperform existing state-of-the-art methods in a large number of diverse tasks (34 out of 45 tested). Furthermore, Sable maintains performance as we scale the number of agents, handling environments with more than a thousand agents while exhibiting a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and confirm its efficient computational memory usage.
△ Less
Submitted 26 May, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Generalisable Agents for Neural Network Optimisation
Authors:
Kale-ab Tessera,
Callum Rhys Tilbury,
Sasha Abramowitz,
Ruan de Kock,
Omayma Mahjoub,
Benjamin Rosman,
Sara Hooker,
Arnu Pretorius
Abstract:
Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO) -- a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsivel…
▽ More
Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO) -- a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsively scheduling hyperparameters during training. GANNO utilises an agent per layer that observes localised network dynamics and accordingly takes actions to adjust these dynamics at a layerwise level to collectively improve global performance. In this paper, we use GANNO to control the layerwise learning rate and show that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics. Furthermore, GANNO is shown to perform robustly across a wide variety of unseen initial conditions, and can successfully generalise to harder problems than it was trained on. Our work presents an overview of the opportunities that this paradigm offers for training neural networks, along with key challenges that remain to be overcome.
△ Less
Submitted 22 March, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX
Authors:
Clément Bonnet,
Daniel Luo,
Donal Byrne,
Shikha Surana,
Sasha Abramowitz,
Paul Duckworth,
Vincent Coyette,
Laurence I. Midgley,
Elshadai Tegegn,
Tristan Kalloniatis,
Omayma Mahjoub,
Matthew Macfarlane,
Andries P. Smit,
Nathan Grinsztajn,
Raphael Boige,
Cemlyn N. Waters,
Mohamed A. Mimouni,
Ulrich A. Mbou Sob,
Ruan de Kock,
Siddarth Singh,
Daniel Furelos-Blanco,
Victor Le,
Arnu Pretorius,
Alexandre Laterre
Abstract:
Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. In modern RL research, there is a need for simulated environments that are performant, scalable, and modular to enable their utilization in a wider range of potential real-world applications. Therefore, we present Jumanji, a suite of diverse RL environments speci…
▽ More
Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. In modern RL research, there is a need for simulated environments that are performant, scalable, and modular to enable their utilization in a wider range of potential real-world applications. Therefore, we present Jumanji, a suite of diverse RL environments specifically designed to be fast, flexible, and scalable. Jumanji provides a suite of environments focusing on combinatorial problems frequently encountered in industry, as well as challenging general decision-making tasks. By leveraging the efficiency of JAX and hardware accelerators like GPUs and TPUs, Jumanji enables rapid iteration of research ideas and large-scale experimentation, ultimately empowering more capable agents. Unlike existing RL environment suites, Jumanji is highly customizable, allowing users to tailor the initial state distribution and problem complexity to their needs. Furthermore, we provide actor-critic baselines for each environment, accompanied by preliminary findings on scaling and generalization scenarios. Jumanji aims to set a new standard for speed, adaptability, and scalability of RL environments.
△ Less
Submitted 15 March, 2024; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Mava: a research library for distributed multi-agent reinforcement learning in JAX
Authors:
Ruan de Kock,
Omayma Mahjoub,
Sasha Abramowitz,
Wiem Khlifi,
Callum Rhys Tilbury,
Claude Formanek,
Andries Smit,
Arnu Pretorius
Abstract:
Multi-agent reinforcement learning (MARL) research is inherently computationally expensive and it is often difficult to obtain a sufficient number of experiment samples to test hypotheses and make robust statistical claims. Furthermore, MARL algorithms are typically complex in their design and can be tricky to implement correctly. These aspects of MARL present a difficult challenge when it comes t…
▽ More
Multi-agent reinforcement learning (MARL) research is inherently computationally expensive and it is often difficult to obtain a sufficient number of experiment samples to test hypotheses and make robust statistical claims. Furthermore, MARL algorithms are typically complex in their design and can be tricky to implement correctly. These aspects of MARL present a difficult challenge when it comes to creating useful software for advanced research. Our criteria for such software is that it should be simple enough to use to implement new ideas quickly, while at the same time be scalable and fast enough to test those ideas in a reasonable amount of time. In this preliminary technical report, we introduce Mava, a research library for MARL written purely in JAX, that aims to fulfill these criteria. We discuss the design and core features of Mava, and demonstrate its use and performance across a variety of environments. In particular, we show Mava's substantial speed advantage, with improvements of 10-100x compared to other popular MARL frameworks, while maintaining strong performance. This allows for researchers to test ideas in a few minutes instead of several hours. Finally, Mava forms part of an ecosystem of libraries that seamlessly integrate with each other to help facilitate advanced research in MARL. We hope Mava will benefit the community and help drive scientifically sound and statistically robust research in the field. The open-source repository for Mava is available at https://github.com/instadeepai/Mava.
△ Less
Submitted 15 December, 2023; v1 submitted 3 July, 2021;
originally announced July 2021.