-
Fairness Shields: Safeguarding against Biased Decision Makers
Authors:
Filip Cano,
Thomas A. Henzinger,
Bettina Könighofer,
Konstantin Kueffner,
Kaushik Mallik
Abstract:
As AI-based decision-makers increasingly influence human lives, it is a growing concern that their decisions are often unfair or biased with respect to people's sensitive attributes, such as gender and race. Most existing bias prevention measures provide probabilistic fairness guarantees in the long run, and it is possible that the decisions are biased on specific instances of short decision seque…
▽ More
As AI-based decision-makers increasingly influence human lives, it is a growing concern that their decisions are often unfair or biased with respect to people's sensitive attributes, such as gender and race. Most existing bias prevention measures provide probabilistic fairness guarantees in the long run, and it is possible that the decisions are biased on specific instances of short decision sequences. We introduce fairness shielding, where a symbolic decision-maker -- the fairness shield -- continuously monitors the sequence of decisions of another deployed black-box decision-maker, and makes interventions so that a given fairness criterion is met while the total intervention costs are minimized. We present four different algorithms for computing fairness shields, among which one guarantees fairness over fixed horizons, and three guarantee fairness periodically after fixed intervals. Given a distribution over future decisions and their intervention costs, our algorithms solve different instances of bounded-horizon optimal control problems with different levels of computational costs and optimality guarantees. Our empirical evaluation demonstrates the effectiveness of these shields in ensuring fairness while maintaining cost efficiency across various scenarios.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
Test Where Decisions Matter: Importance-driven Testing for Deep Reinforcement Learning
Authors:
Stefan Pranger,
Hana Chockler,
Martin Tappler,
Bettina Könighofer
Abstract:
In many Deep Reinforcement Learning (RL) problems, decisions in a trained policy vary in significance for the expected safety and performance of the policy. Since RL policies are very complex, testing efforts should concentrate on states in which the agent's decisions have the highest impact on the expected outcome. In this paper, we propose a novel model-based method to rigorously compute a ranki…
▽ More
In many Deep Reinforcement Learning (RL) problems, decisions in a trained policy vary in significance for the expected safety and performance of the policy. Since RL policies are very complex, testing efforts should concentrate on states in which the agent's decisions have the highest impact on the expected outcome. In this paper, we propose a novel model-based method to rigorously compute a ranking of state importance across the entire state space. We then focus our testing efforts on the highest-ranked states. In this paper, we focus on testing for safety. However, the proposed methods can be easily adapted to test for performance. In each iteration, our testing framework computes optimistic and pessimistic safety estimates. These estimates provide lower and upper bounds on the expected outcomes of the policy execution across all modeled states in the state space. Our approach divides the state space into safe and unsafe regions upon convergence, providing clear insights into the policy's weaknesses. Two important properties characterize our approach. (1) Optimal Test-Case Selection: At any time in the testing process, our approach evaluates the policy in the states that are most critical for safety. (2) Guaranteed Safety: Our approach can provide formal verification guarantees over the entire state space by sampling only a fraction of the policy. Any safety properties assured by the pessimistic estimate are formally proven to hold for the policy. We provide a detailed evaluation of our framework on several examples, showing that our method discovers unsafe policy behavior with low testing effort.
△ Less
Submitted 12 November, 2024;
originally announced November 2024.
-
Tools at the Frontiers of Quantitative Verification
Authors:
Roman Andriushchenko,
Alexander Bork,
Carlos E. Budde,
Milan Češka,
Kush Grover,
Ernst Moritz Hahn,
Arnd Hartmanns,
Bryant Israelsen,
Nils Jansen,
Joshua Jeppson,
Sebastian Junges,
Maximilian A. Köhl,
Bettina Könighofer,
Jan Křetínský,
Tobias Meggendorfer,
David Parker,
Stefan Pranger,
Tim Quatmann,
Enno Ruijters,
Landon Taylor,
Matthias Volk,
Maximilian Weininger,
Zhen Zhang
Abstract:
The analysis of formal models that include quantitative aspects such as timing or probabilistic choices is performed by quantitative verification tools. Broad and mature tool support is available for computing basic properties such as expected rewards on basic models such as Markov chains. Previous editions of QComp, the comparison of tools for the analysis of quantitative formal models, focused o…
▽ More
The analysis of formal models that include quantitative aspects such as timing or probabilistic choices is performed by quantitative verification tools. Broad and mature tool support is available for computing basic properties such as expected rewards on basic models such as Markov chains. Previous editions of QComp, the comparison of tools for the analysis of quantitative formal models, focused on this setting. Many application scenarios, however, require more advanced property types such as LTL and parameter synthesis queries as well as advanced models like stochastic games and partially observable MDPs. For these, tool support is in its infancy today. This paper presents the outcomes of QComp 2023: a survey of the state of the art in quantitative verification tool support for advanced property types and models. With tools ranging from first research prototypes to well-supported integrations into established toolsets, this report highlights today's active areas and tomorrow's challenges in tool-focused research for quantitative verification.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Safety Shielding under Delayed Observation
Authors:
Filip Cano Córdoba,
Alexander Palmisano,
Martin Fränzle,
Roderick Bloem,
Bettina Könighofer
Abstract:
Agents operating in physical environments need to be able to handle delays in the input and output signals since neither data transmission nor sensing or actuating the environment are instantaneous. Shields are correct-by-construction runtime enforcers that guarantee safe execution by correcting any action that may cause a violation of a formal safety specification. Besides providing safety guaran…
▽ More
Agents operating in physical environments need to be able to handle delays in the input and output signals since neither data transmission nor sensing or actuating the environment are instantaneous. Shields are correct-by-construction runtime enforcers that guarantee safe execution by correcting any action that may cause a violation of a formal safety specification. Besides providing safety guarantees, shields should interfere minimally with the agent. Therefore, shields should pick the safe corrective actions in such a way that future interferences are most likely minimized. Current shielding approaches do not consider possible delays in the input signals in their safety analyses. In this paper, we address this issue. We propose synthesis algorithms to compute \emph{delay-resilient shields} that guarantee safety under worst-case assumptions on the delays of the input signals. We also introduce novel heuristics for deciding between multiple corrective actions, designed to minimize future shield interferences caused by delays. As a further contribution, we present the first integration of shields in a realistic driving simulator. We implemented our delayed shields in the driving simulator \textsc{Carla}. We shield potentially unsafe autonomous driving agents in different safety-critical scenarios and show the effect of delays on the safety analysis.
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Analyzing Intentional Behavior in Autonomous Agents under Uncertainty
Authors:
Filip Cano Córdoba,
Samuel Judson,
Timos Antonopoulos,
Katrine Bjørner,
Nicholas Shoemaker,
Scott J. Shapiro,
Ruzica Piskac,
Bettina Könighofer
Abstract:
Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely…
▽ More
Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between 'intentional' and 'accidental' traffic collisions.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
-
Learning Environment Models with Continuous Stochastic Dynamics
Authors:
Martin Tappler,
Edi Muškardin,
Bernhard K. Aichernig,
Bettina Könighofer
Abstract:
Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. Howeve…
▽ More
Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. However, for most control problems, automata learning is not scalable enough to learn a useful model. In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics.
The core of the scalability of our method lies in the computation of an abstract state-space representation, by applying dimensionality reduction and clustering on the observed environmental state space. The stochastic transitions are learned via passive automata learning from observed interactions of the agent and the environment. In an iterative model-based RL process, we sample additional trajectories to learn an accurate environment model in the form of a discrete-state Markov decision process (MDP). We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot. Our results show that the learned models are so precise that they enable the computation of policies solving the respective control tasks. Yet the models are more concise and more general than neural-network-based policies and by using MDPs we benefit from a wealth of tools available for analyzing them. When solving the task of LunarLander, the learned model even achieved similar or higher rewards than deep RL policies learned with stable-baselines3.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
'Put the Car on the Stand': SMT-based Oracles for Investigating Decisions
Authors:
Samuel Judson,
Matthew Elacqua,
Filip Cano,
Timos Antonopoulos,
Bettina Könighofer,
Scott J. Shapiro,
Ruzica Piskac
Abstract:
Principled accountability in the aftermath of harms is essential to the trustworthy design and governance of algorithmic decision making. Legal theory offers a paramount method for assessing culpability: putting the agent 'on the stand' to subject their actions and intentions to cross-examination. We show that under minimal assumptions automated reasoning can rigorously interrogate algorithmic beh…
▽ More
Principled accountability in the aftermath of harms is essential to the trustworthy design and governance of algorithmic decision making. Legal theory offers a paramount method for assessing culpability: putting the agent 'on the stand' to subject their actions and intentions to cross-examination. We show that under minimal assumptions automated reasoning can rigorously interrogate algorithmic behaviors as in the adversarial process of legal fact finding. We model accountability processes, such as trials or review boards, as Counterfactual-Guided Logic Exploration and Abstraction Refinement (CLEAR) loops. We use the formal methods of symbolic execution and satisfiability modulo theories (SMT) solving to discharge queries about agent behavior in factual and counterfactual scenarios, as adaptively formulated by a human investigator. In order to do so, for a decision algorithm $\mathcal{A}$ we use symbolic execution to represent its logic as a statement $Π$ in the decidable theory $\texttt{QF_FPBV}$. We implement our framework and demonstrate its utility on an illustrative car crash scenario.
△ Less
Submitted 29 January, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Online Shielding for Reinforcement Learning
Authors:
Bettina Könighofer,
Julian Rudolf,
Alexander Palmisano,
Martin Tappler,
Roderick Bloem
Abstract:
Besides the recent impressive results on reinforcement learning (RL), safety is still one of the major research challenges in RL. RL is a machine-learning approach to determine near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP together with a temporal logic safety specification is given and many safety vi…
▽ More
Besides the recent impressive results on reinforcement learning (RL), safety is still one of the major research challenges in RL. RL is a machine-learning approach to determine near-optimal policies in Markov decision processes (MDPs). In this paper, we consider the setting where the safety-relevant fragment of the MDP together with a temporal logic safety specification is given and many safety violations can be avoided by planning ahead a short time into the future. We propose an approach for online safety shielding of RL agents. During runtime, the shield analyses the safety of each available action. For any action, the shield computes the maximal probability to not violate the safety specification within the next $k$ steps when executing this action. Based on this probability and a given threshold, the shield decides whether to block an action from the agent. Existing offline shielding approaches compute exhaustively the safety of all state-action combinations ahead of time, resulting in huge computation times and large memory consumption. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our approach is well suited for high-level planning problems where the time between decisions can be used for safety computations and it is sustainable for the agent to wait until these computations are finished. For our evaluation, we selected a 2-player version of the classical computer game SNAKE. The game represents a high-level planning problem that requires fast decisions and the multiplayer setting induces a large state space, which is computationally expensive to analyse exhaustively.
△ Less
Submitted 4 December, 2022;
originally announced December 2022.
-
Automata Learning meets Shielding
Authors:
Martin Tappler,
Stefan Pranger,
Bettina Könighofer,
Edi Muškardin,
Roderick Bloem,
Kim Larsen
Abstract:
Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing th…
▽ More
Safety is still one of the major research challenges in reinforcement learning (RL). In this paper, we address the problem of how to avoid safety violations of RL agents during exploration in probabilistic and partially unknown environments. Our approach combines automata learning for Markov Decision Processes (MDPs) and shield synthesis in an iterative approach. Initially, the MDP representing the environment is unknown. The agent starts exploring the environment and collects traces. From the collected traces, we passively learn MDPs that abstractly represent the safety-relevant aspects of the environment. Given a learned MDP and a safety specification, we construct a shield. For each state-action pair within a learned MDP, the shield computes exact probabilities on how likely it is that executing the action results in violating the specification from the current state within the next $k$ steps. After the shield is constructed, the shield is used during runtime and blocks any actions that induce a too large risk from the agent. The shielded agent continues to explore the environment and collects new data on the environment. Iteratively, we use the collected data to learn new MDPs with higher accuracy, resulting in turn in shields able to prevent more safety violations. We implemented our approach and present a detailed case study of a Q-learning agent exploring slippery Gridworlds. In our experiments, we show that as the agent explores more and more of the environment during training, the improved learned models lead to shields that are able to prevent many safety violations.
△ Less
Submitted 4 December, 2022;
originally announced December 2022.
-
Correct-by-Construction Runtime Enforcement in AI -- A Survey
Authors:
Bettina Könighofer,
Roderick Bloem,
Rüdiger Ehlers,
Christian Pek
Abstract:
Runtime enforcement refers to the theories, techniques, and tools for enforcing correct behavior with respect to a formal specification of systems at runtime. In this paper, we are interested in techniques for constructing runtime enforcers for the concrete application domain of enforcing safety in AI. We discuss how safety is traditionally handled in the field of AI and how more formal guarantees…
▽ More
Runtime enforcement refers to the theories, techniques, and tools for enforcing correct behavior with respect to a formal specification of systems at runtime. In this paper, we are interested in techniques for constructing runtime enforcers for the concrete application domain of enforcing safety in AI. We discuss how safety is traditionally handled in the field of AI and how more formal guarantees on the safety of a self-learning agent can be given by integrating a runtime enforcer. We survey a selection of work on such enforcers, where we distinguish between approaches for discrete and continuous action spaces. The purpose of this paper is to foster a better understanding of advantages and limitations of different enforcement techniques, focusing on the specific challenges that arise due to their application in AI. Finally, we present some open challenges and avenues for future work.
△ Less
Submitted 30 August, 2022;
originally announced August 2022.
-
Search-Based Testing of Reinforcement Learning
Authors:
Martin Tappler,
Filip Cano Córdoba,
Bernhard K. Aichernig,
Bettina Könighofer
Abstract:
Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For s…
▽ More
Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework utilizes a search algorithm that searches for a reference trace that solves the RL task. The backtracking states of the search, called boundary states, pose safety-critical situations. We create safety test-suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach on RL for Nintendo's Super Mario Bros.
△ Less
Submitted 14 May, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
TEMPEST -- Synthesis Tool for Reactive Systems and Shields in Probabilistic Environments
Authors:
Stefan Pranger,
Bettina Könighofer,
Lukas Posch,
Roderick Bloem
Abstract:
We present Tempest, a synthesis tool to automatically create correct-by-construction reactive systems and shields from qualitative or quantitative specifications in probabilistic environments. A shield is a special type of reactive system used for run-time enforcement; i.e., a shield enforces a given qualitative or quantitative specification of a running system while interfering with its operation…
▽ More
We present Tempest, a synthesis tool to automatically create correct-by-construction reactive systems and shields from qualitative or quantitative specifications in probabilistic environments. A shield is a special type of reactive system used for run-time enforcement; i.e., a shield enforces a given qualitative or quantitative specification of a running system while interfering with its operation as little as possible. Shields that enforce a qualitative or quantitative specification are called safety-shields or optimal-shields, respectively. Safety-shields can be implemented as pre-shields or as post-shields, optimal-shields are implemented as post-shields. Pre-shields are placed before the system and restrict the choices of the system. Post-shields are implemented after the system and are able to overwrite the system's output. Tempest is based on the probabilistic model checker Storm, adding model checking algorithms for stochastic games with safety and mean-payoff objectives. To the best of our knowledge, Tempest is the only synthesis tool able to solve 2-1/2-player games with mean-payoff objectives without restrictions on the state space. Furthermore, Tempest adds the functionality to synthesize safe and optimal strategies that implement reactive systems and shields
△ Less
Submitted 26 May, 2021;
originally announced May 2021.
-
Online Shielding for Stochastic Systems
Authors:
Bettina Könighofer,
Julian Rudolf,
Alexander Palmisano,
Martin Tappler,
Roderick Bloem
Abstract:
In this paper, we propose a method to develop trustworthy reinforcement learning systems. To ensure safety especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions that are unsafe with respect to a temporal logic specification from the agent. Our main contribution is a new synthesis algorithm for computing the s…
▽ More
In this paper, we propose a method to develop trustworthy reinforcement learning systems. To ensure safety especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions that are unsafe with respect to a temporal logic specification from the agent. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches compute exhaustively the safety of all states-action combinations ahead-of-time, resulting in huge offline computation times, large memory consumption, and significant delays at run-time due to the look-ups in a huge database. The intuition behind online shielding is to compute during run-time the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behavior. For our evaluation, we selected a 2-player version of the classical computer game SNAKE. The game requires fast decisions and the multiplayer setting induces a large state space, computationally expensive to analyze exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.
△ Less
Submitted 17 December, 2020;
originally announced December 2020.
-
Adaptive Shielding under Uncertainty
Authors:
Stefan Pranger,
Bettina Könighofer,
Martin Tappler,
Martin Deixelberger,
Nils Jansen,
Roderick Bloem
Abstract:
This paper targets control problems that exhibit specific safety and performance requirements. In particular, the aim is to ensure that an agent, operating under uncertainty, will at runtime strictly adhere to such requirements. Previous works create so-called shields that correct an existing controller for the agent if it is about to take unbearable safety risks. However, so far, shields do not c…
▽ More
This paper targets control problems that exhibit specific safety and performance requirements. In particular, the aim is to ensure that an agent, operating under uncertainty, will at runtime strictly adhere to such requirements. Previous works create so-called shields that correct an existing controller for the agent if it is about to take unbearable safety risks. However, so far, shields do not consider that an environment may not be fully known in advance and may evolve for complex control and learning tasks. We propose a new method for the efficient computation of a shield that is adaptive to a changing environment. In particular, we base our method on problems that are sufficiently captured by potentially infinite Markov decision processes (MDP) and quantitative specifications such as mean payoff objectives. The shield is independent of the controller, which may, for instance, take the form of a high-performing reinforcement learning agent. At runtime, our method builds an internal abstract representation of the MDP and constantly adapts this abstraction and the shield based on observations from the environment. We showcase the applicability of our method via an urban traffic control problem.
△ Less
Submitted 8 October, 2020;
originally announced October 2020.
-
It's Time to Play Safe: Shield Synthesis for Timed Systems
Authors:
Roderick Bloem,
Peter Gjøl Jensen,
Bettina Könighofer,
Kim Guldstrand Larsen,
Florian Lorber,
Alexander Palmisano
Abstract:
Erroneous behaviour in safety critical real-time systems may inflict serious consequences. In this paper, we show how to synthesize timed shields from timed safety properties given as timed automata. A timed shield enforces the safety of a running system while interfering with the system as little as possible. We present timed post-shields and timed pre-shields. A timed pre-shield is placed before…
▽ More
Erroneous behaviour in safety critical real-time systems may inflict serious consequences. In this paper, we show how to synthesize timed shields from timed safety properties given as timed automata. A timed shield enforces the safety of a running system while interfering with the system as little as possible. We present timed post-shields and timed pre-shields. A timed pre-shield is placed before the system and provides a set of safe outputs. This set restricts the choices of the system. A timed post-shield is implemented after the system. It monitors the system and corrects the system's output only if necessary. We further extend the timed post-shield construction to provide a guarantee on the recovery phase, i.e., the time between a specification violation and the point at which full control can be handed back to the system. In our experimental results, we use timed post-shields to ensure the safety in a reinforcement learning setting for controlling a platoon of cars, during the learning and execution phase, and study the effect.
△ Less
Submitted 30 June, 2020;
originally announced June 2020.
-
Synthesis of Admissible Shields
Authors:
Laura Humphrey,
Bettina Könighofer,
Robert Könighofer,
Ufuk Topcu
Abstract:
Shield synthesis is an approach to enforce a set of safety-critical properties of a reactive system at runtime. A shield monitors the system and corrects any erroneous output values instantaneously. The shield deviates from the given outputs as little as it can and recovers to hand back control to the system as soon as possible. This paper takes its inspiration from a case study on mission plannin…
▽ More
Shield synthesis is an approach to enforce a set of safety-critical properties of a reactive system at runtime. A shield monitors the system and corrects any erroneous output values instantaneously. The shield deviates from the given outputs as little as it can and recovers to hand back control to the system as soon as possible. This paper takes its inspiration from a case study on mission planning for unmanned aerial vehicles (UAVs) in which k-stabilizing shields, which guarantee recovery in a finite time, could not be constructed. We introduce the notion of admissible shields, which improves k-stabilizing shields in two ways: (1) whereas k-stabilizing shields take an adversarial view on the system, admissible shields take a collaborative view. That is, if there is no shield that guarantees recovery within k steps regardless of system behavior, the admissible shield will attempt to work with the system to recover as soon as possible. (2) Admissible shields can handle system failures during the recovery phase. In our experimental results we show that for UAVs, we can generate admissible shields, even when k-stabilizing shields do not exist.
△ Less
Submitted 15 April, 2019;
originally announced April 2019.
-
Safe Reinforcement Learning via Probabilistic Shields
Authors:
Nils Jansen,
Bettina Könighofer,
Sebastian Junges,
Alexandru C. Serban,
Roderick Bloem
Abstract:
This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, R…
▽ More
This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
△ Less
Submitted 25 November, 2019; v1 submitted 16 July, 2018;
originally announced July 2018.
-
Safe Reinforcement Learning via Shielding
Authors:
Mohammed Alshiekh,
Roderick Bloem,
Ruediger Ehlers,
Bettina Könighofer,
Scott Niekum,
Ufuk Topcu
Abstract:
Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system…
▽ More
Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced in the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. In the first one, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second way, the shield is introduced after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
△ Less
Submitted 3 September, 2017; v1 submitted 29 August, 2017;
originally announced August 2017.
-
Shield Synthesis: Runtime Enforcement for Reactive Systems
Authors:
Roderick Bloem,
Bettina Koenighofer,
Robert Koenighofer,
Chao Wang
Abstract:
Scalability issues may prevent users from verifying critical properties of a complex hardware design. In this situation, we propose to synthesize a "safety shield" that is attached to the design to enforce the properties at run time. Shield synthesis can succeed where model checking and reactive synthesis fail, because it only considers a small set of critical properties, as opposed to the complex…
▽ More
Scalability issues may prevent users from verifying critical properties of a complex hardware design. In this situation, we propose to synthesize a "safety shield" that is attached to the design to enforce the properties at run time. Shield synthesis can succeed where model checking and reactive synthesis fail, because it only considers a small set of critical properties, as opposed to the complex design, or the complete specification in the case of reactive synthesis. The shield continuously monitors the input/output of the design and corrects its erroneous output only if necessary, and as little as possible, so other non-critical properties are likely to be retained. Although runtime enforcement has been studied in other domains such as action systems, reactive systems pose unique challenges where the shield must act without delay. We thus present the first shield synthesis solution for reactive hardware systems and report our experimental results. This is an extended version of [5], featuring an additional appendix.
△ Less
Submitted 16 January, 2015; v1 submitted 12 January, 2015;
originally announced January 2015.
-
Synthesizing Multiple Boolean Functions using Interpolation on a Single Proof
Authors:
Georg Hofferek,
Ashutosh Gupta,
Bettina Könighofer,
Jie-Hong Roland Jiang,
Roderick Bloem
Abstract:
It is often difficult to correctly implement a Boolean controller for a complex system, especially when concurrency is involved. Yet, it may be easy to formally specify a controller. For instance, for a pipelined processor it suffices to state that the visible behavior of the pipelined system should be identical to a non-pipelined reference system (Burch-Dill paradigm). We present a novel procedur…
▽ More
It is often difficult to correctly implement a Boolean controller for a complex system, especially when concurrency is involved. Yet, it may be easy to formally specify a controller. For instance, for a pipelined processor it suffices to state that the visible behavior of the pipelined system should be identical to a non-pipelined reference system (Burch-Dill paradigm). We present a novel procedure to efficiently synthesize multiple Boolean control signals from a specification given as a quantified first-order formula (with a specific quantifier structure). Our approach uses uninterpreted functions to abstract details of the design. We construct an unsatisfiable SMT formula from the given specification. Then, from just one proof of unsatisfiability, we use a variant of Craig interpolation to compute multiple coordinated interpolants that implement the Boolean control signals. Our method avoids iterative learning and back-substitution of the control functions. We applied our approach to synthesize a controller for a simple two-stage pipelined processor, and present first experimental results.
△ Less
Submitted 22 August, 2013;
originally announced August 2013.
-
Synthesizing Robust Systems with RATSY
Authors:
Roderick Bloem,
Hans-Jürgen Gamauf,
Georg Hofferek,
Bettina Könighofer,
Robert Könighofer
Abstract:
Specifications for reactive systems often consist of environment assumptions and system guarantees. An implementation should not only be correct, but also robust in the sense that it behaves reasonably even when the assumptions are (temporarily) violated. We present an extension of the requirements analysis and synthesis tool RATSY that is able to synthesize robust systems from GR(1) specification…
▽ More
Specifications for reactive systems often consist of environment assumptions and system guarantees. An implementation should not only be correct, but also robust in the sense that it behaves reasonably even when the assumptions are (temporarily) violated. We present an extension of the requirements analysis and synthesis tool RATSY that is able to synthesize robust systems from GR(1) specifications, i.e., system in which a finite number of safety assumption violations is guaranteed to induce only a finite number of safety guarantee violations. We show how the specification can be turned into a two-pair Streett game, and how a winning strategy corresponding to a correct and robust implementation can be computed. Finally, we provide some experimental results.
△ Less
Submitted 3 July, 2012;
originally announced July 2012.