-
Shielded Reinforcement Learning for Hybrid Systems
Authors:
Asger Horn Brorholt,
Peter Gjøl Jensen,
Kim Guldstrand Larsen,
Florian Lorber,
Christian Schilling
Abstract:
Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations and discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when it is encouraged by reward engineering. One way of imposing safety on a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and UPPAAL STRATEGO. Furthermore, we study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (only applied after learning a controller). We experimentally demonstrate superiority of the pre-shielding approach. We apply our technique to a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
Submitted 28 August, 2023;
originally announced August 2023.
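The sample-based shield construction described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional state space, the uniform grid, the function names (`cell_of`, `sampled_successors`, `safe_actions`), and the toy dynamics in the usage note are all assumptions made for illustration. Each grid cell is probed with evenly spaced samples of the true transition function, and a greatest fixed point then keeps the cells that have at least one action whose sampled successors all stay safe. Because only samples are checked, the result is an approximation and carries no hard guarantee.

```python
def cell_of(x, lo, hi, n_cells):
    """Index of the grid cell containing x, or None if x lies outside [lo, hi)."""
    if not (lo <= x < hi):
        return None
    return int((x - lo) / (hi - lo) * n_cells)


def sampled_successors(step, actions, lo, hi, n_cells, samples_per_cell):
    """Approximate the cell-level transition relation from evenly spaced samples.

    `step(x, a)` is the true (hypothetical) transition function.
    """
    width = (hi - lo) / n_cells
    succ = {}
    for c in range(n_cells):
        for a in actions:
            reached = set()
            for k in range(samples_per_cell):
                # Sample points are spread evenly over the interior of cell c.
                x = lo + c * width + (k + 0.5) * width / samples_per_cell
                reached.add(cell_of(step(x, a), lo, hi, n_cells))
            succ[(c, a)] = reached
    return succ


def safe_actions(succ, actions, n_cells):
    """Greatest fixed point of the safety game on the sampled abstraction.

    Returns a map from each safe cell to the actions the shield allows there.
    Leaving the grid (successor None) counts as unsafe.
    """
    safe = set(range(n_cells))
    changed = True
    while changed:
        changed = False
        for c in list(safe):
            if not any(succ[(c, a)] <= safe for a in actions):
                safe.discard(c)
                changed = True
    return {c: [a for a in actions if succ[(c, a)] <= safe] for c in safe}
```

As a usage example under the same assumptions, take the toy dynamics `step = lambda x, a: x + 0.5 + a` on `[0, 4)` with four cells and actions `[-1.0, 0.0]`. Calling `safe_actions(sampled_successors(step, [-1.0, 0.0], 0.0, 4.0, 4, 4), [-1.0, 0.0], 4)` allows only `0.0` in the leftmost cell and only `-1.0` in the rightmost cell, since the other action would leave the grid.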
-
Methods for Efficient Unfolding of Colored Petri Nets
Authors:
Alexander Bilgram,
Peter G. Jensen,
Thomas Pedersen,
Jiri Srba,
Peter H. Taankvist
Abstract:
Colored Petri nets offer a compact and user-friendly representation of traditional P/T nets. Colored nets with finite color ranges can be unfolded into the underlying P/T nets, however at the expense of an exponential explosion in size. We present two novel techniques based on static analysis in order to reduce the size of unfolded colored nets. The first method identifies colors that behave equivalently and groups them into equivalence classes, potentially reducing the number of used colors. The second method overapproximates the sets of colors that can appear in places and excludes colors that can never be present in a given place. Both methods are complementary and the combined approach allows us to significantly reduce the size of multiple colored Petri nets from the Model Checking Contest benchmark. We compare the performance of our unfolder with state-of-the-art techniques implemented in the tools MCC, Spike and ITS-Tools. While our approach is competitive w.r.t. unfolding time, it also outperforms the existing approaches both in the size of unfolded nets and in the number of answered model checking queries from the 2021 Model Checking Contest.
Submitted 11 October, 2023; v1 submitted 12 April, 2022;
originally announced April 2022.
-
Verification and Parameter Synthesis for Real-Time Programs using Refinement of Trace Abstraction
Authors:
Franck Cassez,
Peter Gjøl Jensen,
Kim Guldstrand Larsen
Abstract:
We address the safety verification and synthesis problems for real-time systems. We introduce real-time programs that are made of instructions that can perform assignments to discrete and real-valued variables. They are general enough to capture interesting classes of timed systems such as timed automata, stopwatch automata, time(d) Petri nets and hybrid automata.
We propose a semi-algorithm using refinement of trace abstractions to solve both the reachability verification problem and the parameter synthesis problem for real-time programs.
All of the algorithms proposed have been implemented and we have conducted a series of experiments, comparing the performance of our new approach to state-of-the-art tools in classical reachability, robustness analysis and parameter synthesis for timed systems. We show that our new method provides solutions to problems which are unsolvable by the current state-of-the-art tools.
Submitted 20 July, 2020;
originally announced July 2020.
-
It's Time to Play Safe: Shield Synthesis for Timed Systems
Authors:
Roderick Bloem,
Peter Gjøl Jensen,
Bettina Könighofer,
Kim Guldstrand Larsen,
Florian Lorber,
Alexander Palmisano
Abstract:
Erroneous behaviour in safety-critical real-time systems may inflict serious consequences. In this paper, we show how to synthesize timed shields from timed safety properties given as timed automata. A timed shield enforces the safety of a running system while interfering with the system as little as possible. We present timed post-shields and timed pre-shields. A timed pre-shield is placed before the system and provides a set of safe outputs. This set restricts the choices of the system. A timed post-shield is implemented after the system. It monitors the system and corrects the system's output only if necessary. We further extend the timed post-shield construction to provide a guarantee on the recovery phase, i.e., the time between a specification violation and the point at which full control can be handed back to the system. In our experimental results, we use timed post-shields to ensure safety in a reinforcement learning setting for controlling a platoon of cars, during the learning and execution phases, and study the effect.
Submitted 30 June, 2020;
originally announced June 2020.
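The difference between the two shield placements described in the abstract above can be illustrated with a small untimed sketch. It deliberately ignores clocks and timed automata, and the function names (`pre_shield`, `post_shield`) and the `choose` and `fallback` callbacks are hypothetical, so it shows only the interface difference and not the synthesis procedure from the paper.

```python
def pre_shield(safe_outputs, choose):
    """Pre-shield: offer only the safe outputs and let the system pick one.

    `choose` stands in for the system and receives a sorted list of options.
    """
    return choose(sorted(safe_outputs))


def post_shield(safe_outputs, proposed, fallback):
    """Post-shield: pass the system's output through unless it is unsafe.

    When `proposed` is unsafe, `fallback` selects a replacement from the
    safe set; otherwise the shield does not interfere.
    """
    if proposed in safe_outputs:
        return proposed
    return fallback(safe_outputs)
```

For instance, with the safe set `{"brake", "coast"}`, `post_shield(safe, "accelerate", min)` substitutes `"brake"`, whereas `post_shield(safe, "coast", min)` leaves the system's choice untouched.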
-
Approximating Euclidean by Imprecise Markov Decision Processes
Authors:
Manfred Jaeger,
Giorgio Bacci,
Giovanni Bacci,
Kim Guldstrand Larsen,
Peter Gjøl Jensen
Abstract:
Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite-state imprecise Markov decision processes can be used to approximate the behavior of these infinite models. In this paper we address two questions. First, we investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite-state approximations induced by increasingly fine partitions of the continuous state space. We show that for cost functions over finite time horizons the approximations become arbitrarily precise. Second, we use imprecise Markov decision process approximations as a tool to analyse and validate cost functions and strategies obtained by reinforcement learning. We find that, on the one hand, our new theoretical results validate basic design choices of a previously proposed reinforcement learning approach. On the other hand, the imprecise Markov decision process approximations reveal some inaccuracies in the learned cost functions.
Submitted 26 June, 2020;
originally announced June 2020.
-
Stubborn Set Reduction for Two-Player Reachability Games
Authors:
Frederik Meyer Bønneland,
Peter Gjøl Jensen,
Kim Guldstrand Larsen,
Marco Muñiz,
Jiří Srba
Abstract:
Partial order reductions have been successfully applied to model checking of concurrent systems and practical applications of the technique show nontrivial reduction in the size of the explored state space. We present a theory of partial order reduction based on stubborn sets in the game-theoretical setting of 2-player games with reachability objectives. Our stubborn reduction allows us to prune the interleaving behaviour of both players in the game, and we formally prove its correctness on the class of games played on general labelled transition systems. We then instantiate the framework to the class of weighted Petri net games with inhibitor arcs and provide its efficient implementation in the model checker TAPAAL. Finally, we evaluate our stubborn reduction on several case studies and demonstrate its efficiency.
Submitted 17 March, 2021; v1 submitted 20 December, 2019;
originally announced December 2019.