-
Meta-World+: An Improved, Standardized, RL Benchmark
Authors:
Reginald McLean,
Evangelos Chatzaroulas,
Luc McCutcheon,
Frank Röder,
Tianhe Yu,
Zhanpeng He,
K. R. Zentner,
Ryan Julian,
J K Terry,
Isaac Woungang,
Nariman Farsad,
Pablo Samuel Castro
Abstract:
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-Worl…
▽ More
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release a new open-source version of Meta-World (https://github.com/Farama-Foundation/Metaworld/) that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Multi-Task Reinforcement Learning Enables Parameter Scaling
Authors:
Reginald McLean,
Evangelos Chatzaroulas,
Jordan Terry,
Isaac Woungang,
Nariman Farsad,
Pablo Samuel Castro
Abstract:
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that g…
▽ More
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
△ Less
Submitted 12 March, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Authors:
Moucheng Xu,
Evangelos Chatzaroulas,
Luc McCutcheon,
Abdul Ahad,
Hamzah Azeem,
Janusz Marecki,
Ammar Anwar
Abstract:
A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. Ho…
▽ More
A Standard Operating Procedure (SOP) defines a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows. Manually creating SOPs can be time-consuming. Recent advancements in large video-language models offer the potential for automating SOP generation by analyzing recordings of human demonstrations. However, current large video-language models face challenges with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose an exploration-focused strategy called In-Context Ensemble Learning, to aggregate pseudo labels of multiple possible paths of SOPs. The proposed in-context ensemble learning as well enables the models to learn beyond its context window limit with an implicit consistency regularisation. We report that in-context learning helps video-language models to generate more temporally accurate SOP, and the proposed in-context ensemble learning can consistently enhance the capabilities of the video-language models in SOP generation.
△ Less
Submitted 20 October, 2024; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Deep Reinforcement Learning for Stabilization of Large-scale Probabilistic Boolean Networks
Authors:
Sotiris Moschoyiannis,
Evangelos Chatzaroulas,
Vytenis Sliogeris,
Yuhu Wu
Abstract:
The ability to direct a Probabilistic Boolean Network (PBN) to a desired state is important to applications such as targeted therapeutics in cancer biology. Reinforcement Learning (RL) has been proposed as a framework that solves a discrete-time optimal control problem cast as a Markov Decision Process. We focus on an integrative framework powered by a model-free deep RL method that can address di…
▽ More
The ability to direct a Probabilistic Boolean Network (PBN) to a desired state is important to applications such as targeted therapeutics in cancer biology. Reinforcement Learning (RL) has been proposed as a framework that solves a discrete-time optimal control problem cast as a Markov Decision Process. We focus on an integrative framework powered by a model-free deep RL method that can address different flavours of the control problem (e.g., with or without control inputs; attractor state or a subset of the state space as the target domain). The method is agnostic to the distribution of probabilities for the next state, hence it does not use the probability transition matrix. The time complexity is linear on the time steps, or interactions between the agent (deep RL) and the environment (PBN), during training. Indeed, we explore the scalability of the deep RL approach to (set) stabilization of large-scale PBNs and demonstrate successful control on large networks, including a metastatic melanoma PBN with 200 nodes.
△ Less
Submitted 25 October, 2022; v1 submitted 21 October, 2022;
originally announced October 2022.