Search | arXiv e-print repository

doi 10.46298/theoretics.25.10

Learning Algorithms for Verification of Markov Decision Processes

Authors: Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}z… ▽ More We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP. △ Less

Submitted 31 March, 2025; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 82 pages. This is the TheoretiCS journal version

Journal ref: TheoretiCS, Volume 4 (April 1, 2025) theoretics:13268

arXiv:2310.01163 [pdf, other]

Trust-Aware Motion Planning for Human-Robot Collaboration under Distribution Temporal Logic Specifications

Authors: Pian Yu, Shuyang Dong, Shili Sheng, Lu Feng, Marta Kwiatkowska

Abstract: Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic that involve human trust. Since human trust in robots is not observable, we adopt the widely used partially observable Markov decision process (POMDP) framework… ▽ More Recent work has considered trust-aware decision making for human-robot collaboration (HRC) with a focus on model learning. In this paper, we are interested in enabling the HRC system to complete complex tasks specified using temporal logic that involve human trust. Since human trust in robots is not observable, we adopt the widely used partially observable Markov decision process (POMDP) framework for modelling the interactions between humans and robots. To specify the desired behaviour, we propose to use syntactically co-safe linear distribution temporal logic (scLDTL), a logic that is defined over predicates of states as well as belief states of partially observable systems. The incorporation of belief predicates in scLDTL enhances its expressiveness while simultaneously introducing added complexity. This also presents a new challenge as the belief predicates must be evaluated over the continuous (infinite) belief space. To address this challenge, we present an algorithm for solving the optimal policy synthesis problem. First, we enhance the belief MDP (derived by reformulating the POMDP) with a probabilistic labelling function. Then a product belief MDP is constructed between the probabilistically labelled belief MDP and the automaton translation of the scLDTL formula. Finally, we show that the optimal policy can be obtained by leveraging existing point-based value iteration algorithms with essential modifications. Human subject experiments with 21 participants on a driving simulator demonstrate the effectiveness of the proposed approach. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2306.17639 [pdf, other]

Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms

Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

Abstract: The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which… ▽ More The increasing trend to integrate neural networks and conventional software components in safety-critical settings calls for methodologies for their formal modelling, verification and correct-by-construction policy synthesis. We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs), a variant of continuous-state POMDPs with discrete observations and actions, in which the agent perceives a continuous-state environment using a neural {\revise perception mechanism} and makes decisions symbolically. The perception mechanism classifies inputs such as images and sensor values into symbolic percepts, which are used in decision making. We study the problem of optimising discounted cumulative rewards for NS-POMDPs. Working directly with the continuous state space, we exploit the underlying structure of the model and the neural perception mechanism to propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors, and extend Bellman backups to this representation. We prove the convexity and continuity of value functions and present two value iteration algorithms that ensure finite representability. The first is a classical (exact) value iteration algorithm extending the $α$-functions of Porta {\em et al} (2006) to the P-PWLC representation for continuous-state spaces. The second is a point-based (approximate) method called NS-HSVI, which uses the P-PWLC representation and belief-value induced functions to approximate value functions from below and above for two types of beliefs, particle-based and region-based. Using a prototype implementation, we show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions, by synthesising (approximately) optimal strategies. △ Less

Submitted 7 August, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: 65 pages, 14 figures

arXiv:1907.00098 [pdf, other]

Robustness Guarantees for Deep Neural Networks on Videos

Authors: Min Wu, Marta Kwiatkowska

Abstract: The widespread adoption of deep learning models places demands on their robustness. In this paper, we consider the robustness of deep neural networks on videos, which comprise both the spatial features of individual frames extracted by a convolutional neural network and the temporal dynamics between adjacent frames captured by a recurrent neural network. To measure robustness, we study the maximum… ▽ More The widespread adoption of deep learning models places demands on their robustness. In this paper, we consider the robustness of deep neural networks on videos, which comprise both the spatial features of individual frames extracted by a convolutional neural network and the temporal dynamics between adjacent frames captured by a recurrent neural network. To measure robustness, we study the maximum safe radius problem, which computes the minimum distance from the optical flow sequence obtained from a given input to that of an adversarial example in the neighbourhood of the input. We demonstrate that, under the assumption of Lipschitz continuity, the problem can be approximated using finite optimisation via discretising the optical flow space, and the approximation has provable guarantees. We then show that the finite optimisation problem can be solved by utilising a two-player turn-based game in a cooperative setting, where the first player selects the optical flows and the second player determines the dimensions to be manipulated in the chosen flow. We employ an anytime approach to solve the game, in the sense of approximating the value of the game by monotonically improving its upper and lower bounds. We exploit a gradient-based search algorithm to compute the upper bounds, and the admissible A* algorithm to update the lower bounds. Finally, we evaluate our framework on the UCF101 video dataset. △ Less

Submitted 3 April, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

Comments: To appear in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:1903.10390 [pdf, other]

PID Control of Biochemical Reaction Networks

Authors: Max Whitby, Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Mirco Tribastone, Max Tschaikowski

Abstract: Principles of feedback control have been shown to naturally arise in biological systems and successfully applied to build synthetic circuits. In this work we consider Biochemical Reaction Networks (CRNs) as a paradigm for modelling biochemical systems and provide the first implementation of a derivative component in CRNs. That is, given an input signal represented by the concentration level of som… ▽ More Principles of feedback control have been shown to naturally arise in biological systems and successfully applied to build synthetic circuits. In this work we consider Biochemical Reaction Networks (CRNs) as a paradigm for modelling biochemical systems and provide the first implementation of a derivative component in CRNs. That is, given an input signal represented by the concentration level of some species, we build a CRN that produces as output the concentration of two species whose difference is the derivative of the input signal. By relying on this component, we present a CRN implementation of a feedback control loop with Proportional-Integral-Derivative (PID) controller and apply the resulting control architecture to regulate the protein expression in a microRNA regulated gene expression model. △ Less

Submitted 25 March, 2019; originally announced March 2019.

Comments: 8 Pages, 4 figures, Submitted to CDC 2019

arXiv:1901.01576 [pdf, other]

Efficiency through Uncertainty: Scalable Formal Synthesis for Stochastic Hybrid Systems

Authors: Nathalie Cauchi, Luca Laurenti, Morteza Lahijanian, Alessandro Abate, Marta Kwiatkowska, Luca Cardelli

Abstract: This work targets the development of an efficient abstraction method for formal analysis and control synthesis of discrete-time stochastic hybrid systems (SHS) with linear dynamics. The focus is on temporal logic specifications, both over finite and infinite time horizons. The framework constructs a finite abstraction as a class of uncertain Markov models known as interval Markov decision process… ▽ More This work targets the development of an efficient abstraction method for formal analysis and control synthesis of discrete-time stochastic hybrid systems (SHS) with linear dynamics. The focus is on temporal logic specifications, both over finite and infinite time horizons. The framework constructs a finite abstraction as a class of uncertain Markov models known as interval Markov decision process (IMDP). Then, a strategy that maximizes the satisfaction probability of the given specification is synthesized over the IMDP and mapped to the underlying SHS. In contrast to existing formal approaches, which are by and large limited to finite-time properties and rely on conservative over-approximations, we show that the exact abstraction error can be computed as a solution of convex optimization problems and can be embedded into the IMDP abstraction. This is later used in the synthesis step over both finite- and infinite-horizon specifications, mitigating the known state-space explosion problem. Our experimental validation of the new approach compared to existing abstraction-based approaches shows: (i) significant (orders of magnitude) reduction of the abstraction error; (ii) marked speed-ups; and (iii) boosted scalability, allowing in particular to verify models with more than 10 continuous variables. △ Less

Submitted 6 January, 2019; originally announced January 2019.

arXiv:1504.04662 [pdf, other]

doi 10.2168/LMCS-11(2:16)2015

Permissive Controller Synthesis for Probabilistic Systems

Authors: Klaus Drager, Vojtech Forejt, Marta Kwiatkowska, David Parker, Mateusz Ujma

Abstract: We propose novel controller synthesis techniques for probabilistic systems modelled using stochastic two-player games: one player acts as a controller, the second represents its environment, and probability is used to capture uncertainty arising due to, for example, unreliable sensors or faulty system components. Our aim is to generate robust controllers that are resilient to unexpected system ch… ▽ More We propose novel controller synthesis techniques for probabilistic systems modelled using stochastic two-player games: one player acts as a controller, the second represents its environment, and probability is used to capture uncertainty arising due to, for example, unreliable sensors or faulty system components. Our aim is to generate robust controllers that are resilient to unexpected system changes at runtime, and flexible enough to be adapted if additional constraints need to be imposed. We develop a permissive controller synthesis framework, which generates multi-strategies for the controller, offering a choice of control actions to take at each time step. We formalise the notion of permissivity using penalties, which are incurred each time a possible control action is disallowed by a multi-strategy. Permissive controller synthesis aims to generate a multi-strategy that minimises these penalties, whilst guaranteeing the satisfaction of a specified system property. We establish several key results about the optimality of multi-strategies and the complexity of synthesising them. Then, we develop methods to perform permissive controller synthesis using mixed integer linear programming and illustrate their effectiveness on a selection of case studies. △ Less

Submitted 29 June, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

Journal ref: Logical Methods in Computer Science, Volume 11, Issue 2 (June 30, 2015) lmcs:1576

arXiv:1206.4504 [pdf, other]

Revisiting Timed Specification Theories: A Linear-Time Perspective

Authors: Chris Chilton, Marta Kwiatkowska, Xu Wang

Abstract: We consider the setting of component-based design for real-time systems with critical timing constraints. Based on our earlier work, we propose a compositional specification theory for timed automata with I/O distinction, which supports substitutive refinement. Our theory provides the operations of parallel composition for composing components at run-time, logical conjunction/disjunction for indep… ▽ More We consider the setting of component-based design for real-time systems with critical timing constraints. Based on our earlier work, we propose a compositional specification theory for timed automata with I/O distinction, which supports substitutive refinement. Our theory provides the operations of parallel composition for composing components at run-time, logical conjunction/disjunction for independent development, and quotient for incremental synthesis. The key novelty of our timed theory lies in a weakest congruence preserving safety as well as bounded liveness properties. We show that the congruence can be characterised by two linear-time semantics, timed-traces and timed-strategies, the latter of which is derived from a game-based interpretation of timed interaction. △ Less

Submitted 19 June, 2012; originally announced June 2012.

Showing 1–8 of 8 results for author: Kwiatkowska, M