
doi 10.5555/3545946.3598718

Enhancing Reinforcement Learning Agents with Local Guides

Authors: Paul Daoudi, Bogdan Robu, Christophe Prieur, Ludovic Dos Santos, Merwan Barlier

Abstract: This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. For this, we show how to adapt existing algorithms to this setting before introducing a novel algorithm based on a noisy policy-switching procedure. This approach builds on a proper Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully leads the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems where the agent cannot enter some areas at the risk of triggering catastrophic consequences. In all the proposed environments, our agent proved to be efficient at leveraging those policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in its first learning stages.

Submitted 21 February, 2024; originally announced February 2024.

Journal ref: AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems

arXiv:2402.13654

Improving a Proportional Integral Controller with Reinforcement Learning on a Throttle Valve Benchmark

Authors: Paul Daoudi, Bojan Mavkov, Bogdan Robu, Christophe Prieur, Emmanuel Witrant, Merwan Barlier, Ludovic Dos Santos

Abstract: This paper presents a learning-based control strategy for non-linear throttle valves with an asymmetric hysteresis, leading to a near-optimal controller without requiring any prior knowledge about the environment. We start with a carefully tuned Proportional Integrator (PI) controller and exploit the recent advances in Reinforcement Learning (RL) with Guides to improve the closed-loop behavior by learning from the additional interactions with the valve. We test the proposed control method in various scenarios on three different valves, all highlighting the benefits of combining both PI and RL frameworks to improve control performance in non-linear stochastic systems. In all the experimental test cases, the resulting agent has a better sample efficiency than traditional RL agents and outperforms the PI controller.

Submitted 15 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Journal ref: 2024 IEEE Conference on Control Technology and Applications (CCTA)

arXiv:2312.15474

A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning

Authors: Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos

Abstract: Off-dynamics Reinforcement Learning (ODRL) seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics. In this context, traditional RL agents depend excessively on the dynamics of the source environment, resulting in the discovery of policies that excel in this environment but fail to provide reasonable performance in the target one. In the few-shot framework, a limited number of transitions from the target environment are introduced to facilitate a more effective transfer. Addressing this challenge, we propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms. The proposed method introduces a penalty to regulate the trajectories generated by the source-trained policy. We evaluate our method across various environments representing diverse off-dynamics conditions, where access to the target environment is extremely limited. These experiments include high-dimensional systems relevant to real-world applications. Across most tested scenarios, our proposed method demonstrates performance improvements compared to existing baselines.

Submitted 15 July, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Journal ref: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

arXiv:2312.15458

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

Authors: Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

Abstract: A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on the performance of the exploratory policies. We study the problem of conservative exploration, where the learner must at least be able to guarantee its performance is at least as good as a baseline policy. We propose the first conservative provably efficient model-free algorithm for policy optimization in continuous finite-horizon problems. We leverage importance sampling techniques to counterfactually evaluate the conservative condition from the data self-generated by the algorithm. We derive a regret bound and show that (w.h.p.) the conservative constraint is never violated during learning. Finally, we leverage these insights to build a general schema for conservative exploration in DeepRL via off-policy policy evaluation techniques. We show empirically the effectiveness of our methods.

Submitted 24 December, 2023; originally announced December 2023.

Showing 1–4 of 4 results for author: Daoudi, P