Search | arXiv e-print repository

doi 10.1016/j.segan.2023.101202

An adaptive safety layer with hard constraints for safe reinforcement learning in multi-energy management systems

Authors: Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé, Maarten Messagie

Abstract: Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It only requires the environment-specific constraint functions itself a priori and not a complete model. The project-specific upfront and ongoing engineering efforts are therefore still reduced, better representations of the underlying system dynamics can still be learnt, and modelling bias is kept to a minimum. However, even the constraint functions alone are not always trivial to accurately provide in advance, leading to potentially unsafe behaviour. In this paper, we present two novel advancements: (I) combining the OptLayer and SafeFallback method, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency and the possibility to formulate equality constraints. (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more and new data becomes available so that better policies can be learnt. Both advancements keep the constraint formulation decoupled from the RL formulation, so new (presumably better) RL algorithms can act as drop-in replacements. We have shown that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer) and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer) - all relative to a vanilla RL benchmark. Although introducing surrogate functions into the optimisation problem requires special attention, we conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.

Submitted 6 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: post-print

arXiv:2207.03830 [pdf, other]

Safe reinforcement learning for multi-energy management systems with known constraint functions

Authors: Glenn Ceusters, Luis Ramirez Camargo, Rüdiger Franke, Ann Nowé, Maarten Messagie

Abstract: Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation. These provide hard-constraint, rather than soft- and chance-constraint, satisfaction guarantees both during training a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) as well as during deployment of any policy (e.g. random agents or offline trained RL agents). This without the need of solving a mathematical program, resulting in less computational power requirements and a more flexible constraint function formulation (no derivative information is required). In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark and Optlayer benchmark (94,6% and 82,8% compared to 35,5% and 77,8%) and that the proposed SafeFallback method even can outperform the vanilla RL benchmark (102,9% to 100%). We conclude that both methods are viably safety constraint handling techniques applicable beyond RL, as demonstrated with random policies while still providing hard-constraint guarantees.

Submitted 1 September, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 26 pages, 14 figures

arXiv:2104.09785 [pdf, other]

Model-predictive control and reinforcement learning in multi-energy system case studies

Authors: Glenn Ceusters, Román Cantú Rodríguez, Alberte Bouso García, Rüdiger Franke, Geert Deconinck, Lieve Helsen, Ann Nowé, Maarten Messagie, Luis Ramirez Camargo

Abstract: Model-predictive-control (MPC) offers an optimal control technique to establish and ensure that the total operation cost of multi-energy systems remains at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach, that does not assume a model a priori, benchmarking this against a linear MPC (LMPC - to reflect current practice, though non-linear MPC performs better) - both derived from the general optimal control problem, highlighting their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers potential to match and outperform the perfect foresight LMPC benchmark (101.5%). This while the realistic LMPC, i.e. imperfect predictions, only achieves 98%. While in a more complex MES system configuration, the RL agent's performance is generally lower (94.6%), yet still better than the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems given adequate constraint handling and pre-training, to avoid unsafe interactions and long training periods, as is proposed in fundamental future work.

Submitted 9 September, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: 43 pages, 29 figures

arXiv:1911.10121 [pdf, other]

Fleet Control using Coregionalized Gaussian Process Policy Iteration

Authors: Timothy Verstraeten, Pieter JK Libin, Ann Nowé

Abstract: In many settings, as for example wind farms, multiple machines are instantiated to perform the same task, which is called a fleet. The recent advances with respect to the Internet of Things allow control devices and/or machines to connect through cloud-based architectures in order to share information about their status and environment. Such an infrastructure allows seamless data sharing between fleet members, which could greatly improve the sample-efficiency of reinforcement learning techniques. However in practice, these machines, while almost identical in design, have small discrepancies due to production errors or degradation, preventing control algorithms to simply aggregate and employ all fleet data. We propose a novel reinforcement learning method that learns to transfer knowledge between similar fleet members and creates member-specific dynamics models for control. Our algorithm uses Gaussian processes to establish cross-member covariances. This is significantly different from standard transfer learning methods, as the focus is not on sharing information over tasks, but rather over system specifications. We demonstrate our approach on two benchmarks and a realistic wind farm setting. Our method significantly outperforms two baseline approaches, namely individual learning and joint learning where all samples are aggregated, in terms of the median and variance of the results.

Submitted 22 November, 2019; originally announced November 2019.

arXiv:1903.11518 [pdf, other]

doi 10.1016/j.rser.2019.03.019

Fleetwide data-enabled reliability improvement of wind turbines

Authors: Timothy Verstraeten, Ann Nowe, Jonathan Keller, Yi Guo, Shuangwen Sheng, Jan Helsen

Abstract: Wind farms are an indispensable driver toward renewable and nonpolluting energy resources. However, as ideal sites are limited, placement in remote and challenging locations results in higher logistics costs and lower average wind speeds. Therefore, it is critical to increase the reliability of the turbines to reduce maintenance costs. Robust implementation requires a thorough understanding of the loads subject to the turbine's control. Yet, such dynamically changing multidimensional loads are uncommon with other machinery, and generally underresearched. Therefore, a multitiered approach is proposed to investigate the load spectrum occurring in wind farms. Our approach relies on both fundamental research using controllable test rigs, as well as analyses of real-world loading conditions in high-frequency supervisory control and data acquisition data. A method is introduced to detect operational zones in wind farm data and link them with load distributions. Additionally, while focused research further investigates the load spectrum, a method is proposed that continuously optimizes the farm's control protocols without the need to fully understand the loads that occur. A case of gearbox failure is investigated based on a vast body of past experiments and suspect loads are identified. Starting from this evidence on the cause and effects of dynamic loads, the potential of our methods is shown by analyzing real-world farm loading conditions on a steady-state case of wake and developing a preventive row-based control protocol for a case of cascading emergency brakes induced by a storm.

Submitted 3 April, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: 24 pages, 8 figures

Journal ref: Renew Sustain Energy Rev 109 (2019) 428-437

Showing 1–5 of 5 results for author: Nowé, A