-
Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systems
Authors:
Julian Ruddick,
Glenn Ceusters,
Gilles Van Kriekinge,
Evgenii Genov,
Cedric De Cauwer,
Thierry Coosemans,
Maarten Messagie
Abstract:
Recent advancements in machine-learning-based energy management approaches, specifically reinforcement learning with a safety layer (OptLayerPolicy) and a metaheuristic algorithm generating a decision tree control policy (TreeC), have shown promise. However, their effectiveness has only been demonstrated in computer simulations. This paper presents the real-world validation of these methods, comparing them against model predictive control and a simple rule-based control benchmark. The experiments were conducted on the electrical installation of four reproductions of residential houses, each with its own battery, photovoltaic system and dynamic load system emulating a non-controllable electrical load and a controllable electric vehicle charger. The results show that the simple rules, TreeC, and model predictive control-based methods achieved similar costs, with a difference of only 0.6%. The reinforcement learning based method, still in its training phase, obtained a cost 25.5% higher than the other methods. Additional simulations show that the costs can be further reduced by using a more representative training dataset for TreeC and addressing errors in the model predictive control implementation caused by its reliance on accurate data from various sources. The OptLayerPolicy safety layer allows safe online training of a reinforcement learning agent in the real world, given an accurate constraint function formulation. The proposed safety layer method remains error-prone; nonetheless, it is found beneficial for all investigated methods. The TreeC method, which does require building a realistic simulation for training, exhibits the safest operational performance, exceeding the grid limit by only 27.1 Wh compared to 593.9 Wh for reinforcement learning.
Submitted 25 November, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
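The abstract above reports safety as energy drawn beyond the grid limit (27.1 Wh versus 593.9 Wh). A minimal sketch of how such a figure could be computed from a sampled grid power series follows. The function name, the fixed sampling step, and the choice to count only import above the limit are assumptions made for illustration; the abstract does not state how the paper computes this metric.

```python
from typing import Sequence


def grid_limit_exceedance_wh(
    grid_power_w: Sequence[float], limit_w: float, step_s: float
) -> float:
    """Energy (Wh) drawn above `limit_w`, assuming one power sample per step.

    Hypothetical helper: only the power above the limit is counted, and each
    sample is assumed to hold for `step_s` seconds.
    """
    step_h = step_s / 3600.0  # convert the sampling step to hours
    excess_w = sum(max(0.0, p - limit_w) for p in grid_power_w)
    return excess_w * step_h


# Three 15-minute samples against a 5 kW limit: 1000 W and 500 W of excess.
exceedance = grid_limit_exceedance_wh([4000.0, 6000.0, 5500.0], 5000.0, 900.0)
print(f"{exceedance:.1f} Wh")
```

Samples at or below the limit contribute nothing, so a controller that never exceeds the limit scores exactly zero.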
-
Balancing Forecast Accuracy and Switching Costs in Online Optimization of Energy Management Systems
Authors:
Evgenii Genov,
Julian Ruddick,
Christoph Bergmeir,
Majid Vafaeipour,
Thierry Coosemans,
Salvador Garcia,
Maarten Messagie
Abstract:
This study investigates the integration of forecasting and optimization in energy management systems, with a focus on the role of switching costs -- penalties incurred from frequent operational adjustments. We develop a theoretical and empirical framework to examine how forecast accuracy and stability interact with switching costs in online decision-making settings. Our analysis spans both deterministic and stochastic optimization approaches, using point and probabilistic forecasts. A novel metric for measuring temporal consistency in probabilistic forecasts is introduced, and the framework is validated in a real-world battery scheduling case based on the CityLearn 2022 challenge. Results show that switching costs significantly alter the trade-off between forecast accuracy and stability, and that more stable forecasts can reduce the performance loss due to switching. Contrary to common practice, the findings suggest that, under non-negligible switching costs, longer commitment periods may lead to better overall outcomes. These insights have practical implications for the design of intelligent, forecast-aware energy management systems.
Submitted 15 April, 2025; v1 submitted 29 June, 2024;
originally announced July 2024.
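The role of switching costs described in the abstract above can be illustrated with a small sketch. It assumes the penalty is proportional to the absolute change between consecutive set-points, which is a common form but not necessarily the one used in the paper; the function and parameter names are hypothetical.

```python
from typing import Sequence


def total_cost(
    schedule: Sequence[float],
    prices: Sequence[float],
    switching_weight: float,
    initial: float = 0.0,
) -> float:
    """Operating cost plus an assumed L1 switching penalty.

    `schedule[t]` is the set-point at step t and `prices[t]` its unit cost.
    The penalty charges `switching_weight` per unit of change between
    consecutive set-points, starting from `initial`.
    """
    operating = sum(p * x for p, x in zip(prices, schedule))
    switching = 0.0
    previous = initial
    for x in schedule:
        switching += abs(x - previous)
        previous = x
    return operating + switching_weight * switching


prices = [1.0, 1.0, 1.0, 1.0]
stable = [1.0, 1.0, 0.0, 0.0]       # one block of activity
oscillating = [1.0, 0.0, 1.0, 0.0]  # same energy, frequent adjustments

print(total_cost(stable, prices, switching_weight=0.5))
print(total_cost(oscillating, prices, switching_weight=0.5))
```

Both schedules have the same operating cost, so any difference in the total comes from the switching term alone. This is the mechanism by which a more stable forecast, which triggers fewer re-plans, can pay off.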
-
An adaptive safety layer with hard constraints for safe reinforcement learning in multi-energy management systems
Authors:
Glenn Ceusters,
Muhammad Andy Putratama,
Rüdiger Franke,
Ann Nowé,
Maarten Messagie
Abstract:
Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It only requires the environment-specific constraint functions themselves a priori and not a complete model. The project-specific upfront and ongoing engineering efforts are therefore still reduced, better representations of the underlying system dynamics can still be learnt, and modelling bias is kept to a minimum. However, even the constraint functions alone are not always trivial to provide accurately in advance, leading to potentially unsafe behaviour. In this paper, we present two novel advancements: (I) combining the OptLayer and SafeFallback methods, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency and the possibility to formulate equality constraints; (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more and new data become available so that better policies can be learnt. Both advancements keep the constraint formulation decoupled from the RL formulation, so new (presumably better) RL algorithms can act as drop-in replacements. We have shown that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer) and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer), all relative to a vanilla RL benchmark. Although introducing surrogate functions into the optimisation problem requires special attention, we conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.
Submitted 6 November, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
TreeC: a method to generate interpretable energy management systems using a metaheuristic algorithm
Authors:
Julian Ruddick,
Luis Ramirez Camargo,
Muhammad Andy Putratama,
Maarten Messagie,
Thierry Coosemans
Abstract:
Energy management systems (EMS) have traditionally been implemented using rule-based control (RBC) and model predictive control (MPC) methods. However, recent research has explored the use of reinforcement learning (RL) as a promising alternative. This paper introduces TreeC, a machine learning method that utilizes the covariance matrix adaptation evolution strategy metaheuristic algorithm to generate an interpretable EMS modeled as a decision tree. Unlike RBC and MPC approaches, TreeC learns the decision strategy of the EMS based on historical data, adapting the control model to the controlled energy grid. The decision strategy is represented as a decision tree, providing interpretability compared to RL methods that often rely on black-box models like neural networks. TreeC is evaluated against MPC with perfect forecast and RL EMSs in two case studies taken from literature: an electric grid case and a household heating case. In the electric grid case, TreeC achieves an average energy loss and constraint violation score of 19.2, which is close to MPC and RL EMSs that achieve scores of 14.4 and 16.2 respectively. All three methods control the electric grid well, especially when compared to the random EMS, which obtains an average score of 12 875. In the household heating case, TreeC performs similarly to MPC on the adjusted and averaged electricity cost and total discomfort (0.033 EUR/m$^2$ and 0.42 Kh for TreeC compared to 0.037 EUR/m$^2$ and 2.91 Kh for MPC), while outperforming RL (0.266 EUR/m$^2$ and 24.41 Kh).
Submitted 13 November, 2024; v1 submitted 17 April, 2023;
originally announced April 2023.
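To make the idea of an EMS modeled as a decision tree concrete, here is a minimal sketch of how such a policy could be represented and evaluated. The node layout, feature names, thresholds, and actions are all hypothetical and chosen for illustration. In TreeC the tree is generated by the covariance matrix adaptation evolution strategy, which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Mapping, Union


@dataclass
class Leaf:
    action: float  # e.g. battery power set-point in W (assumed convention)


@dataclass
class Node:
    feature: str          # name of the observed quantity to test
    threshold: float      # split value
    low: Union["Node", Leaf]   # branch taken when obs[feature] <= threshold
    high: Union["Node", Leaf]  # branch taken otherwise


def decide(tree: Union[Node, Leaf], obs: Mapping[str, float]) -> float:
    """Walk the tree from the root to a leaf and return its action."""
    while isinstance(tree, Node):
        tree = tree.low if obs[tree.feature] <= tree.threshold else tree.high
    return tree.action


# Hand-written example tree (illustrative values only).
example_tree = Node(
    feature="pv_surplus_w",
    threshold=0.0,
    low=Node(
        feature="price_eur_kwh",
        threshold=0.10,
        low=Leaf(1000.0),    # cheap electricity, no surplus: charge
        high=Leaf(-1000.0),  # expensive electricity, no surplus: discharge
    ),
    high=Leaf(2000.0),       # PV surplus available: charge
)

print(decide(example_tree, {"pv_surplus_w": 500.0, "price_eur_kwh": 0.30}))
```

Each decision can be read off as a short chain of threshold tests, which is the interpretability advantage over black-box models that the abstract refers to.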
-
Safe reinforcement learning for multi-energy management systems with known constraint functions
Authors:
Glenn Ceusters,
Luis Ramirez Camargo,
Rüdiger Franke,
Ann Nowé,
Maarten Messagie
Abstract:
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, which reduces the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation. These provide hard-constraint, rather than soft- and chance-constraint, satisfaction guarantees both during training of a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) and during deployment of any policy (e.g. random agents or offline trained RL agents). This is achieved without the need to solve a mathematical program, resulting in lower computational power requirements and a more flexible constraint function formulation (no derivative information is required). In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark and an OptLayer benchmark (94.6% and 82.8% compared to 35.5% and 77.8%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable safety constraint handling techniques applicable beyond RL, as demonstrated with random policies, while still providing hard-constraint guarantees.
Submitted 1 September, 2022; v1 submitted 8 July, 2022;
originally announced July 2022.
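The abstract above describes safety handling that only evaluates known constraint functions, without solving a mathematical program. A minimal sketch of that general pattern follows: check the proposed action against a constraint function and substitute a fallback action when the check fails. This is an illustration of the pattern under stated assumptions, not the paper's SafeFallback algorithm. The function names, the grid-limit constraint, and the idle fallback are all hypothetical.

```python
from typing import Callable

State = float   # here: non-controllable load in W (assumed)
Action = float  # here: battery charging power in W (assumed)


def safe_step(
    state: State,
    proposed: Action,
    is_safe: Callable[[State, Action], bool],
    fallback_policy: Callable[[State], Action],
) -> Action:
    """Return the proposed action if it satisfies the constraint check,
    otherwise the action of an a priori safe fallback policy."""
    if is_safe(state, proposed):
        return proposed
    return fallback_policy(state)


GRID_LIMIT_W = 5000.0  # illustrative limit


def within_grid_limit(load_w: State, charge_w: Action) -> bool:
    # Constraint function: total grid import must stay at or below the limit.
    return load_w + charge_w <= GRID_LIMIT_W


def idle_battery(load_w: State) -> Action:
    # Fallback policy assumed to be safe: do not charge at all.
    return 0.0


print(safe_step(3000.0, 1500.0, within_grid_limit, idle_battery))
print(safe_step(3000.0, 2500.0, within_grid_limit, idle_battery))
```

Because the check is a plain function evaluation, the constraint needs no derivative information, and the action source can be an RL agent, a random policy, or any other controller.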
-
Evolutionary scheduling of university activities based on consumption forecasts to minimise electricity costs
Authors:
Julian Ruddick,
Evgenii Genov,
Luis Ramirez Camargo,
Thierry Coosemans,
Maarten Messagie
Abstract:
This paper presents a solution to a predict-then-optimise problem whose goal is to reduce the electricity cost of a university campus. The proposed methodology combines a multi-dimensional time series forecast and a novel approach to large-scale optimization. A gradient-boosting method is applied to forecast both generation and consumption time series of the Monash University campus for the month of November 2020. For the consumption forecasts we employ log transformation to model trend and stabilize variance. Additional seasonality and trend features are added to the model inputs when applicable. The forecasts obtained are used as the base load for the schedule optimisation of university activities and battery usage. The goal of the optimisation is to minimize the electricity cost, consisting of the price of electricity and the peak electricity tariff, both altered by the load from class activities and battery use, as well as the penalty for not scheduling some optional activities. The schedule of the class activities is obtained through evolutionary optimisation using the covariance matrix adaptation evolution strategy and the genetic algorithm. This schedule is then improved through local search by testing possible times for each activity one by one. The battery schedule is formulated as a mixed-integer programming problem and solved by the Gurobi solver. This method obtains the second lowest cost when evaluated against 6 other methods presented at an IEEE competition that all used mixed-integer programming and the Gurobi solver to schedule both the activities and the battery use. The code and data used for the paper are publicly available.
Submitted 26 May, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Renewable energy communities: do they have a business case in Flanders?
Authors:
Alex Felice,
Lucija Rakocevic,
Leen Peters,
Maarten Messagie,
Thierry Coosemans,
Luis Ramirez Camargo
Abstract:
Renewable energy communities (RECs) are prominent initiatives to provide end consumers an active role in the energy sector, raise awareness of the importance of renewable energy (RE) technologies and increase their share in the energy system, thus reducing greenhouse gas emissions. The economic viability of RECs, though, depends on multiple interdependent factors that require careful examination for each individual context. This study aims at investigating the impact of electricity tariffs, ratio of electrification of heating and transportation sectors, prices of RE technologies and storage systems, and internal electricity exchange prices on the annual cost for electricity provision of a REC. A mixed-integer linear model is developed to minimize energy provision costs for a representative REC in Flanders, Belgium. The results indicate that RECs have the potential to reduce these costs by 10 to 26% compared to business-as-usual. This cost reduction depends on the type of electricity tariffs and the level of uptake of flexible assets such as heat pumps and electric vehicles. The shift towards a higher power component in the electricity tariff makes electricity storage systems more attractive, which leads to higher electricity self-consumption. The introduction of flexible assets adds the possibility to shift demand when tariffs are lower and makes larger sizes of photovoltaic systems economically viable due to the increase in the total electricity demand. However, the cost reduction of RECs compared to individual smart homes amounts to only 4% to 6% in the best cases. Uncertainties stemming from the regulation and the costs of setting up a REC may reduce the estimated benefits.
Submitted 10 February, 2022;
originally announced February 2022.
-
Model-predictive control and reinforcement learning in multi-energy system case studies
Authors:
Glenn Ceusters,
Román Cantú Rodríguez,
Alberte Bouso García,
Rüdiger Franke,
Geert Deconinck,
Lieve Helsen,
Ann Nowé,
Maarten Messagie,
Luis Ramirez Camargo
Abstract:
Model-predictive-control (MPC) offers an optimal control technique to establish and ensure that the total operation cost of multi-energy systems remains at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach that does not assume a model a priori, and benchmark it against a linear MPC (LMPC, chosen to reflect current practice, though non-linear MPC performs better). Both are derived from the general optimal control problem, highlighting their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers potential to match and outperform the perfect foresight LMPC benchmark (101.5%), while the realistic LMPC, i.e. with imperfect predictions, only achieves 98%. In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems given adequate constraint handling and pre-training, to avoid unsafe interactions and long training periods, as is proposed in fundamental future work.
Submitted 9 September, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.