-
Learning Algorithms for Verification of Markov Decision Processes
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Martin Chmelik,
Vojtěch Forejt,
Jan Křetínský,
Marta Kwiatkowska,
Tobias Meggendorfer,
David Parker,
Mateusz Ujma
Abstract:
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}z…
▽ More
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors.
The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
△ Less
Submitted 31 March, 2025; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Policy learning for time-bounded reachability in Continuous-Time Markov Decision Processes via doubly-stochastic gradient ascent
Authors:
Ezio Bartocci,
Luca Bortolussi,
Tomǎš Brázdil,
Dimitrios Milios,
Guido Sanguinetti
Abstract:
Continuous-time Markov decision processes are an important class of models in a wide range of applications, ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system in order to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an u…
▽ More
Continuous-time Markov decision processes are an important class of models in a wide range of applications, ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system in order to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimation of a functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation, as it can also be applied when the model is replaced by a black box, and does not suffer from state-space explosion. The use of a stochastic gradient to guide our search considerably improves the efficiency of learning policies. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance in a non-trivial task.
△ Less
Submitted 31 May, 2016;
originally announced May 2016.
-
Minimizing Running Costs in Consumption Systems
Authors:
Tomáš Brázdil,
David Klaška,
Antonín Kučera,
Petr Novotný
Abstract:
A standard approach to optimizing long-run running costs of discrete systems is based on minimizing the mean-payoff, i.e., the long-run average amount of resources ("energy") consumed per transition. However, this approach inherently assumes that the energy source has an unbounded capacity, which is not always realistic. For example, an autonomous robotic device has a battery of finite capacity th…
▽ More
A standard approach to optimizing long-run running costs of discrete systems is based on minimizing the mean-payoff, i.e., the long-run average amount of resources ("energy") consumed per transition. However, this approach inherently assumes that the energy source has an unbounded capacity, which is not always realistic. For example, an autonomous robotic device has a battery of finite capacity that has to be recharged periodically, and the total amount of energy consumed between two successive charging cycles is bounded by the capacity. Hence, a controller minimizing the mean-payoff must obey this restriction. In this paper we study the controller synthesis problem for consumption systems with a finite battery capacity, where the task of the controller is to minimize the mean-payoff while preserving the functionality of the system encoded by a given linear-time property. We show that an optimal controller always exists, and it may either need only finite memory or require infinite memory (it is decidable in polynomial time which of the two cases holds). Further, we show how to compute an effective description of an optimal controller in polynomial time. Finally, we consider the limit values achievable by larger and larger battery capacity, show that these values are computable in polynomial time, and we also analyze the corresponding rate of convergence. To the best of our knowledge, these are the first results about optimizing the long-run running costs in systems with bounded energy stores.
△ Less
Submitted 24 March, 2014; v1 submitted 20 February, 2014;
originally announced February 2014.
-
Trading Performance for Stability in Markov Decision Processes
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Vojtěch Forejt,
Antonín Kučera
Abstract:
We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability.
We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient,…
▽ More
We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability.
We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alernative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively.
We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether there is a such a strategy. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudo-polynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomial-time approximating algorithm also exists. For local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all the three cases.
Finally, we study a special case of the decision problems, where we require a given expected mean-payoff together with zero variance. Here we show that the problems can be all solved in polynomial time.
△ Less
Submitted 23 April, 2013;
originally announced May 2013.
-
Efficient Controller Synthesis for Consumption Games with Multiple Resource Types
Authors:
Tomáš Brázdil,
Krishnendu Chatterjee,
Antonín Kučera,
Petr Novotný
Abstract:
We introduce consumption games, a model for discrete interactive system with multiple resources that are consumed or reloaded independently. More precisely, a consumption game is a finite-state graph where each transition is labeled by a vector of resource updates, where every update is a non-positive number or omega. The omega updates model the reloading of a given resource. Each vertex belongs e…
▽ More
We introduce consumption games, a model for discrete interactive system with multiple resources that are consumed or reloaded independently. More precisely, a consumption game is a finite-state graph where each transition is labeled by a vector of resource updates, where every update is a non-positive number or omega. The omega updates model the reloading of a given resource. Each vertex belongs either to player \Box or player \Diamond, where the aim of player \Box is to play so that the resources are never exhausted. We consider several natural algorithmic problems about consumption games, and show that although these problems are computationally hard in general, they are solvable in polynomial time for every fixed number of resource types (i.e., the dimension of the update vectors).
△ Less
Submitted 31 May, 2012; v1 submitted 3 February, 2012;
originally announced February 2012.
-
Fixed-delay Events in Generalized Semi-Markov Processes Revisited
Authors:
Tomáš Brázdil,
Jan Krčál,
Jan Křetínský,
Vojtěch Řehák
Abstract:
We study long run average behavior of generalized semi-Markov processes with both fixed-delay events as well as variable-delay events. We show that allowing two fixed-delay events and one variable-delay event may cause an unstable behavior of a GSMP. In particular, we show that a frequency of a given state may not be defined for almost all runs (or more generally, an invariant measure may not exis…
▽ More
We study long run average behavior of generalized semi-Markov processes with both fixed-delay events as well as variable-delay events. We show that allowing two fixed-delay events and one variable-delay event may cause an unstable behavior of a GSMP. In particular, we show that a frequency of a given state may not be defined for almost all runs (or more generally, an invariant measure may not exist). We use this observation to disprove several results from literature. Next we study GSMP with at most one fixed-delay event combined with an arbitrary number of variable-delay events. We prove that such a GSMP always possesses an invariant measure which means that the frequencies of states are always well defined and we provide algorithms for approximation of these frequencies. Additionally, we show that the positive results remain valid even if we allow an arbitrary number of reasonably restricted fixed-delay events.
△ Less
Submitted 12 September, 2011; v1 submitted 7 June, 2011;
originally announced June 2011.
-
Measuring Performance of Continuous-Time Stochastic Processes using Timed Automata
Authors:
Tomáš Brázdil,
Jan Krčál,
Jan Křetínský,
Antonín Kučera,
Vojtěch Řehák
Abstract:
We propose deterministic timed automata (DTA) as a model-independent language for specifying performance and dependability measures over continuous-time stochastic processes. Technically, these measures are defined as limit frequencies of locations (control states) of a DTA that observes computations of a given stochastic process. Then, we study the properties of DTA measures over semi-Markov proc…
▽ More
We propose deterministic timed automata (DTA) as a model-independent language for specifying performance and dependability measures over continuous-time stochastic processes. Technically, these measures are defined as limit frequencies of locations (control states) of a DTA that observes computations of a given stochastic process. Then, we study the properties of DTA measures over semi-Markov processes in greater detail. We show that DTA measures over semi-Markov processes are well-defined with probability one, and there are only finitely many values that can be assumed by these measures with positive probability. We also give an algorithm which approximates these values and the associated probabilities up to an arbitrarily small given precision. Thus, we obtain a general and effective framework for analysing DTA measures over semi-Markov processes.
△ Less
Submitted 21 January, 2011;
originally announced January 2011.