
doi 10.5555/3545946.3598718

Enhancing Reinforcement Learning Agents with Local Guides

Authors: Paul Daoudi, Bogdan Robu, Christophe Prieur, Ludovic Dos Santos, Merwan Barlier

Abstract: This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. For this, we show how to adapt existing algorithms to this setting before introducing a novel algorithm based on a noisy policy-switching procedure. This approach builds on a proper Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully leads the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems where the agent cannot enter some areas at the risk of triggering catastrophic consequences. In all the proposed environments, our agent proved to be efficient at leveraging those policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in its first learning stages.

Submitted 21 February, 2024; originally announced February 2024.

Journal ref: AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems

arXiv:2402.13654

Improving a Proportional Integral Controller with Reinforcement Learning on a Throttle Valve Benchmark

Authors: Paul Daoudi, Bojan Mavkov, Bogdan Robu, Christophe Prieur, Emmanuel Witrant, Merwan Barlier, Ludovic Dos Santos

Abstract: This paper presents a learning-based control strategy for non-linear throttle valves with an asymmetric hysteresis, leading to a near-optimal controller without requiring any prior knowledge about the environment. We start with a carefully tuned Proportional Integrator (PI) controller and exploit the recent advances in Reinforcement Learning (RL) with Guides to improve the closed-loop behavior by learning from the additional interactions with the valve. We test the proposed control method in various scenarios on three different valves, all highlighting the benefits of combining both PI and RL frameworks to improve control performance in non-linear stochastic systems. In all the experimental test cases, the resulting agent has a better sample efficiency than traditional RL agents and outperforms the PI controller.

Submitted 15 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Journal ref: 2024 IEEE Conference on Control Technology and Applications (CCTA)

arXiv:2312.15474

A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning

Authors: Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos

Abstract: Off-dynamics Reinforcement Learning (ODRL) seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics. In this context, traditional RL agents depend excessively on the dynamics of the source environment, resulting in the discovery of policies that excel in this environment but fail to provide reasonable performance in the target one. In the few-shot framework, a limited number of transitions from the target environment are introduced to facilitate a more effective transfer. Addressing this challenge, we propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms. The proposed method introduces a penalty to regulate the trajectories generated by the source-trained policy. We evaluate our method across various environments representing diverse off-dynamics conditions, where access to the target environment is extremely limited. These experiments include high-dimensional systems relevant to real-world applications. Across most tested scenarios, our proposed method demonstrates performance improvements compared to existing baselines.

Submitted 15 July, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Journal ref: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

arXiv:2210.11624

Sparse Dynamical Features generation, application to Parkinson's Disease diagnosis

Authors: Houssem Meghnoudj, Bogdan Robu, Mazen Alamir

Abstract: In this study we focus on the diagnosis of Parkinson's Disease (PD) based on electroencephalogram (EEG) signals. We propose a new approach inspired by the functioning of the brain that uses the dynamics, frequency and temporal content of EEGs to extract new demarcating features of the disease. The method was evaluated on a publicly available dataset containing EEG signals recorded during a 3-oddball auditory task involving N = 50 subjects, of whom 25 suffer from PD. By extracting two features, and separating them with a straight line using a Linear Discriminant Analysis (LDA) classifier, we can separate the healthy from the unhealthy subjects with an accuracy of 90 % $(p < 0.03)$ using a single channel. By aggregating the information from three channels and making them vote, we obtain an accuracy of 94 %, a sensitivity of 96 % and a specificity of 92 %. The evaluation was carried out using a nested Leave-One-Out cross-validation procedure, thus preventing data leakage problems and giving a less biased evaluation. Several tests were carried out to assess the validity and robustness of our approach, including the test where we use only half the available data for training. Under this constraint, the model achieves an accuracy of 83.8 %.

Submitted 29 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 18 pages, 13 figures

arXiv:2103.10824

doi 10.1109/TDSC.2021.3063947

Enhancing Robustness of On-line Learning Models on Highly Noisy Data

Authors: Zilong Zhao, Robert Birke, Rui Han, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen

Abstract: Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. We specifically focus on three use cases, (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising 100 celebrities' faces.
Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, to reach up to 98.95% for IoT device attacks (i.e., +7%), up to 85.03% for cloud task failures (i.e., +14%) under 40% label noise, and for its extension, it can reach up to 77.51% for face recognition (i.e., +39%) under 30% label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.

Submitted 19 March, 2021; originally announced March 2021.

Comments: Published in IEEE Transactions on Dependable and Secure Computing. arXiv admin note: substantial text overlap with arXiv:1911.04383

arXiv:2003.09503

doi 10.1109/LCSYS.2020.2981984

Event-Based Control for Online Training of Neural Networks

Authors: Zilong Zhao, Sophie Cerf, Bogdan Robu, Nicolas Marchand

Abstract: Convolutional Neural Network (CNN) has become the most used method for image classification tasks. During its training the learning rate and the gradient are two key factors to tune for influencing the convergence speed of the model. Usual learning rate strategies are time-based i.e. monotonic decay over time. Recent state-of-the-art techniques focus on adaptive gradient algorithms i.e. Adam and its versions. In this paper we consider an online learning scenario and we propose two Event-Based control loops to adjust the learning rate of a classical algorithm E (Exponential)/PD (Proportional Derivative)-Control. The first Event-Based control loop will be implemented to prevent sudden drop of the learning rate when the model is approaching the optimum. The second Event-Based control loop will decide, based on the learning speed, when to switch to the next data batch. Experimental evaluation is provided using two state-of-the-art machine learning image datasets (CIFAR-10 and CIFAR-100). Results show the Event-Based E/PD is better than the original algorithm (higher final accuracy, lower final loss value), and the Double-Event-Based E/PD can accelerate the training process, save up to 67% training time compared to state-of-the-art algorithms and even result in better performance.

Submitted 20 March, 2020; originally announced March 2020.

arXiv:1911.07710

Feedback Control for Online Training of Neural Networks

Authors: Zilong Zhao, Sophie Cerf, Bogdan Robu, Nicolas Marchand

Abstract: Convolutional neural networks (CNNs) are commonly used for image classification tasks, raising the challenge of their application on data flows. During their training, adaptation is often performed by tuning the learning rate. Usual learning rate strategies are time-based i.e. monotonically decreasing. In this paper, we advocate switching to a performance-based adaptation, in order to improve the learning efficiency. We present E (Exponential)/PD (Proportional Derivative)-Control, a conditional learning rate strategy that combines a feedback PD controller based on the CNN loss function, with an exponential control signal to smartly boost the learning and adapt the PD parameters. Stability proof is provided as well as an experimental evaluation using two state-of-the-art image datasets (CIFAR-10 and Fashion-MNIST). Results show better performance than the related works (faster network accuracy growth reaching higher levels) and robustness of the E/PD-Control regarding its parametrization.

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.04383

RAD: On-line Anomaly Detection for Highly Unreliable Data

Authors: Zilong Zhao, Robert Birke, Rui Han, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen

Abstract: Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we present a two-layer on-line learning framework for robust anomaly detection (RAD) in the presence of unreliable anomaly labels, where the first layer is to filter out the suspicious data, and the second layer detects the anomaly patterns from the remaining data. To adapt to the on-line nature of anomaly detection, we extend RAD with additional features of repetitively cleaning, conflicting opinions of classifiers, and oracle knowledge. We on-line learn from the incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. We specifically focus on three use cases, (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising 20 celebrities' faces.
Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, to reach up to 98% for IoT device attacks (i.e., +11%), up to 84% for cloud task failures (i.e., +20%) under 40% noise, and up to 74% for face recognition (i.e., +28%) under 30% noisy labels. The proposed RAD is general and can be applied to different anomaly detection algorithms.

Submitted 11 November, 2019; originally announced November 2019.

Showing 1–8 of 8 results for author: Robu, B