-
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives
Authors:
Qi Heng Ho,
Martin S. Feather,
Federico Rossi,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
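The loop difficulty mentioned above already appears in fully observable models. The following is a minimal sketch on a hypothetical three-state MDP rather than a POMDP, so it is plain undiscounted value iteration and not the heuristic search algorithm of the paper. With values initialized at 0, the iteration gives a lower bound that converges from below. A naive optimistic initialization at 1 never tightens, because the `wait` self-loop keeps backing up its own value.

```python
# Hypothetical toy MDP: state -> action -> list of (next_state, probability).
MDP = {
    "s0": {"try": [("goal", 0.5), ("sink", 0.5)], "wait": [("s0", 1.0)]},
    "goal": {"stay": [("goal", 1.0)]},
    "sink": {"stay": [("sink", 1.0)]},
}
TARGETS = {"goal"}

def bellman_max(values):
    """One undiscounted Bellman backup for maximal reachability probability."""
    new = {}
    for s, actions in MDP.items():
        if s in TARGETS:
            new[s] = 1.0  # target states are reached with probability one
        else:
            new[s] = max(sum(p * values[t] for t, p in succ)
                         for succ in actions.values())
    return new

def iterate(values, sweeps=100):
    for _ in range(sweeps):
        values = bellman_max(values)
    return values

# Lower bound: 0 everywhere except targets; converges from below.
lower = iterate({s: 1.0 if s in TARGETS else 0.0 for s in MDP})
# Naive upper bound: 1 everywhere except the known sink; the "wait"
# self-loop lets s0 keep its optimistic value, so the bound never tightens.
upper = iterate({s: 0.0 if s == "sink" else 1.0 for s in MDP})
print(lower["s0"], upper["s0"])  # 0.5 1.0
```

The gap between 0.5 and 1.0 in this toy model shows why undiscounted upper bounds need special treatment of loops.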
Submitted 4 June, 2024;
originally announced June 2024.
-
Feasibility-Guided Safety-Aware Model Predictive Control for Jump Markov Linear Systems
Authors:
Zakariya Laouar,
Qi Heng Ho,
Rayan Mazouz,
Tyler Becker,
Zachary N. Sunberg
Abstract:
In this paper, we present a controller framework that synthesizes control policies for Jump Markov Linear Systems subject to stochastic mode switches and imperfect mode estimation. Our approach builds on safe and robust methods for Model Predictive Control (MPC), but in contrast to existing approaches that either optimize without regard to feasibility or utilize soft constraints that increase computational requirements, we employ a safe and robust control approach informed by the feasibility of the optimization problem. We formulate and encode finite horizon safety for multiple model systems in our MPC design using Control Barrier Functions (CBFs). When subject to inaccurate hybrid state estimation, our feasibility-guided MPC generates a control policy that is maximally robust to uncertainty in the system's modes. We evaluate our approach on an orbital rendezvous problem and a six degree-of-freedom hexacopter under several scenarios and benchmarks to demonstrate the utility of the framework. Results indicate that the proposed technique of maximizing the robustness horizon, and the use of CBFs for safety awareness, improve the overall safety and performance of MPC for Jump Markov Linear Systems.
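The sketch below makes the CBF idea concrete with a discrete-time CBF check enforced robustly across modes. The condition h(x_{k+1}) >= (1 - gamma) h(x_k) must hold for every possible mode. The scalar mode dynamics, `X_MAX`, `GAMMA`, and the grid search over candidate inputs are hypothetical assumptions. This is a one-step safety filter, not the feasibility-guided MPC described in the abstract. Returning `None` reflects the idea that infeasibility is itself informative.

```python
# Hypothetical scalar modes: x_next = a * x + b * u for each (a, b).
MODES = [(1.0, 1.0), (1.1, 0.8)]
X_MAX = 10.0   # safe set: h(x) = X_MAX - x >= 0
GAMMA = 0.5    # discrete-time CBF decay rate, 0 < GAMMA <= 1

def h(x):
    return X_MAX - x

def cbf_ok(x, u, modes):
    """Discrete-time CBF condition h(x+) >= (1 - GAMMA) h(x) in every mode."""
    return all(h(a * x + b * u) >= (1.0 - GAMMA) * h(x) for a, b in modes)

def safe_input(x, u_ref, candidates, modes=MODES):
    """Candidate closest to u_ref that is safe in all modes, else None."""
    feasible = [u for u in candidates if cbf_ok(x, u, modes)]
    if not feasible:
        return None  # no candidate certifies safety for every mode
    return min(feasible, key=lambda u: abs(u - u_ref))

grid = [i * 0.5 for i in range(-10, 11)]  # candidate inputs in [-5, 5]
print(safe_input(x=8.0, u_ref=3.0, candidates=grid))  # 0.0
```

At x = 8, the first mode allows u <= 1, but the second, more aggressive mode only allows u <= 0.25. The filter therefore returns 0.0 rather than the reference input 3.0.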
Submitted 15 September, 2024; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Sampling-based Reactive Synthesis for Nondeterministic Hybrid Systems
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We model the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy, i.e., a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. Our proposed approach involves growing a (search) game-tree in the hybrid space by combining sampling-based motion planning with a novel bandit-based technique to select and improve on partial strategies. We show that the algorithm is probabilistically complete, i.e., the algorithm will asymptotically almost surely find a winning strategy, if one exists. The case studies and benchmark results show that our algorithm is general and effective, and consistently outperforms state-of-the-art algorithms.
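The abstract does not specify the bandit rule. As an assumption, the sketch below uses the standard UCB1 score (empirical mean plus an exploration bonus) to choose which of several candidate partial strategies to extend next. The `(total_reward, visits)` bookkeeping is hypothetical.

```python
import math

def ucb1_select(stats, c=math.sqrt(2)):
    """Return the index of the arm maximizing the UCB1 score.

    stats: list of (total_reward, visits) per arm, e.g. one arm per
    candidate partial strategy (hypothetical framing).
    Unvisited arms are tried first.
    """
    for i, (_, n) in enumerate(stats):
        if n == 0:
            return i
    total = sum(n for _, n in stats)

    def score(i):
        reward, n = stats[i]
        return reward / n + c * math.sqrt(math.log(total) / n)

    return max(range(len(stats)), key=score)

print(ucb1_select([(9.0, 10), (0.5, 1)]))  # 1: the rarely tried arm wins
```

Setting `c = 0` reduces the rule to pure exploitation of the best empirical mean.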
Submitted 23 December, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Planning with SiMBA: Motion Planning under Uncertainty for Temporal Goals using Simplified Belief Guides
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper presents a new multi-layered algorithm for motion planning under motion and sensing uncertainties for Linear Temporal Logic specifications. We propose a technique to guide a sampling-based search tree in the combined task and belief space using trajectories from a simplified model of the system, to make the problem computationally tractable. Our method eliminates the need to construct fine and accurate finite abstractions. We prove correctness and probabilistic completeness of our algorithm, and illustrate the benefits of our approach on several case studies. Our results show that guidance with a simplified belief space model allows for significant speed-up in planning for complex specifications.
Submitted 9 April, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Optimality Guarantees for Particle Belief Approximation of POMDPs
Authors:
Michael H. Lim,
Tyler J. Becker,
Mykel J. Kochenderfer,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite-sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
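The particle filter belief transition used as a generative model can be sketched as follows. Several assumptions are not from the paper: a 1-D Gaussian random-walk model, a state reward of `-abs(x)`, and a belief reward taken as the weight-averaged state reward. Each step propagates all C particles and evaluates the observation density once per particle, which is where the O(C) factor comes from.

```python
import math
import random

# Hypothetical 1-D model: x' = x + a + N(0, 0.5^2), o = x' + N(0, 1^2).
PROC_STD, OBS_STD = 0.5, 1.0

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def reward(x, a):
    return -abs(x)  # hypothetical reward: stay near the origin

def pb_mdp_step(particles, weights, a, rng):
    """Sample (next_particles, next_weights, reward) from a particle belief
    MDP generative model: propagate particles, draw one observation, reweight."""
    r = sum(w * reward(x, a) for x, w in zip(particles, weights)) / sum(weights)
    nxt = [x + a + rng.gauss(0.0, PROC_STD) for x in particles]
    # Draw the observation from one propagated particle chosen by weight.
    j = rng.choices(range(len(nxt)), weights=weights)[0]
    o = nxt[j] + rng.gauss(0.0, OBS_STD)
    # Reweight with the observation density Z(o | x'): one call per particle.
    new_w = [w * gauss_pdf(o, x, OBS_STD) for x, w in zip(nxt, weights)]
    total = sum(new_w)
    return nxt, [w / total for w in new_w], r

rng = random.Random(0)
particles, weights, r = pb_mdp_step([1.0, -1.0], [0.5, 0.5], 0.0, rng)
```

An MDP solver such as sparse sampling would call `pb_mdp_step` wherever it would otherwise call a state-transition sampler.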
Submitted 19 October, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Automaton-Guided Control Synthesis for Signal Temporal Logic Specifications
Authors:
Qi Heng Ho,
Roland B. Ilyes,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper presents an algorithmic framework for control synthesis of continuous dynamical systems subject to signal temporal logic (STL) specifications. We propose a novel algorithm to obtain a time-partitioned finite automaton from an STL specification, and introduce a multi-layered framework that utilizes this automaton to guide a sampling-based search tree both spatially and temporally. Our approach is able to synthesize a controller for nonlinear dynamics and polynomial predicate functions. We prove the correctness and probabilistic completeness of our algorithm, and illustrate the efficiency and efficacy of our framework on several case studies. Our results show an order-of-magnitude speedup over the state of the art.
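For readers new to STL, the standard quantitative (robustness) semantics over a discrete-time signal fit in a few lines. A predicate x > c has robustness x_t - c. "Always" takes the minimum over the interval, and "eventually" takes the maximum. This sketch covers only those standard semantics, not the paper's automaton construction, and the signal values are made up.

```python
def rob_pred(signal, t, c):
    """Robustness of the predicate x > c at time t: positive iff satisfied."""
    return signal[t] - c

def rob_always(signal, a, b, c):
    """Robustness of G_[a,b] (x > c) evaluated at time 0 (steps a..b)."""
    return min(rob_pred(signal, t, c) for t in range(a, b + 1))

def rob_eventually(signal, a, b, c):
    """Robustness of F_[a,b] (x > c) evaluated at time 0 (steps a..b)."""
    return max(rob_pred(signal, t, c) for t in range(a, b + 1))

x = [0.0, 1.0, 3.0, 2.5, 4.0]  # hypothetical signal
print(rob_always(x, 2, 4, 2.0), rob_eventually(x, 0, 1, 2.0))  # 0.5 -1.0
```

A positive value means the formula is satisfied, with that margin. Here G_[2,4](x > 2) holds with margin 0.5, while F_[0,1](x > 2) is violated by 1.0.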
Submitted 4 October, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Gaussian Belief Trees for Chance Constrained Asymptotically Optimal Motion Planning
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
In this paper, we address the problem of sampling-based motion planning under motion and measurement uncertainty with probabilistic guarantees. We generalize traditional sampling-based tree-based motion planning algorithms for deterministic systems and propose belief-$\mathcal{A}$, a framework that extends any kinodynamical tree-based planner to the belief space for linear (or linearizable) systems. We introduce appropriate sampling techniques and distance metrics for the belief space that preserve the probabilistic completeness and asymptotic optimality properties of the underlying planner. We demonstrate the efficacy of our approach for finding safe low-cost paths efficiently and asymptotically optimally in simulation, for both holonomic and non-holonomic systems.
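A Gaussian belief tree relies on belief propagation and chance-constraint checks. The following is a minimal sketch of both, assuming a hypothetical scalar linear system (`A`, `B`, `Q`, `R` are made-up values). The covariance is propagated with Kalman prediction and update steps, and the constraint x <= x_max is checked with the Gaussian quantile. The paper's belief-space sampling techniques and distance metrics are not shown.

```python
from statistics import NormalDist

# Hypothetical scalar linear model: x' = A x + B u + w, z = x + v,
# with w ~ N(0, Q) and v ~ N(0, R).
A, B, Q, R = 1.0, 1.0, 0.1, 0.5

def propagate(mu, P, u, measure=True):
    """Belief propagation for planning: the mean follows the nominal
    dynamics; the covariance is predicted and, if a measurement is
    expected, reduced by the Kalman gain."""
    mu_pred = A * mu + B * u
    P_pred = A * P * A + Q
    if measure:
        K = P_pred / (P_pred + R)
        P_pred = (1.0 - K) * P_pred
    return mu_pred, P_pred

def chance_ok(mu, P, x_max, delta):
    """Check P(x <= x_max) >= 1 - delta for x ~ N(mu, P)."""
    z = NormalDist().inv_cdf(1.0 - delta)
    return mu + z * P ** 0.5 <= x_max

mu, P = propagate(0.0, 1.0, u=1.0)                  # (1.0, 0.34375)
print(chance_ok(mu, P, x_max=2.0, delta=0.05))      # True
mu2, P2 = propagate(0.0, 1.0, u=1.0, measure=False)  # (1.0, 1.1)
print(chance_ok(mu2, P2, x_max=2.0, delta=0.05))    # False
```

With these values, one expected measurement shrinks the covariance enough for the constraint to pass at delta = 0.05, while the open-loop prediction fails it.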
Submitted 4 October, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Compositional Learning-based Planning for Vision POMDPs
Authors:
Sampada Deglurkar,
Michael H. Lim,
Johnathan Tucker,
Zachary N. Sunberg,
Aleksandra Faust,
Claire J. Tomlin
Abstract:
The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle the high-dimensional image observations prevalent in real-world applications, and often require lengthy online training that requires interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of and predict future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noise that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several baseline state-of-the-art vision POMDP algorithms while using a fraction of the training time.
Submitted 2 December, 2022; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs
Authors:
Michael H. Lim,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
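The two ingredients that VPW generalizes can be sketched separately. The sketch assumes a 1-D action interval and hypothetical widening constants `k` and `alpha`. It also assumes a VOO-style proposal that samples uniformly over the whole interval with probability `omega`. Otherwise it samples uniformly inside the Voronoi cell of the best action found so far, using rejection sampling. How the paper combines these pieces may differ.

```python
import random

def should_widen(num_children, visits, k=1.0, alpha=0.5):
    """Progressive widening test: add a new action while |C| <= k * N^alpha."""
    return num_children <= k * visits ** alpha

def voo_sample(actions, values, lo, hi, omega, rng, max_tries=1000):
    """VOO-style proposal over [lo, hi]: with probability omega sample
    globally, otherwise sample inside the Voronoi cell of the best action
    (rejection sampling: accept points whose nearest action is the best)."""
    if not actions or rng.random() < omega:
        return rng.uniform(lo, hi)
    best = max(range(len(actions)), key=lambda i: values[i])
    for _ in range(max_tries):
        cand = rng.uniform(lo, hi)
        nearest = min(range(len(actions)), key=lambda i: abs(actions[i] - cand))
        if nearest == best:
            return cand
    return actions[best]  # fallback if rejection sampling keeps failing

rng = random.Random(0)
if should_widen(num_children=1, visits=4):
    new_action = voo_sample([0.0, 10.0], [1.0, 5.0], 0.0, 10.0, 0.3, rng)
```

Widening controls how many actions a node holds. The Voronoi proposal controls where a new action is placed, trading local refinement near the best action against global exploration.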
Submitted 1 April, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
APF-PF: Probabilistic Depth Perception for 3D Reactive Obstacle Avoidance
Authors:
Shakeeb Ahmad,
Zachary N. Sunberg,
J. Sean Humbert
Abstract:
This paper proposes a framework for 3D obstacle avoidance in the presence of partial observability of environment obstacles. The method focuses on the utility of the Artificial Potential Function (APF) controller in a practical setting where noisy and incomplete information about obstacle proximity is inevitable. We propose a Particle Filter (PF) approach to estimate potential obstacle locations in an input depth image stream. The probable candidates are then used to generate an action that maneuvers the robot along the negative gradient of the potential at each time instant. Rigorous experimental validation on a quadrotor UAV highlights the robustness and reliability of the method in settings where the robot's sensitivity to incorrect perception information can be a concern. The proposed perception and control stack runs onboard the UAV, demonstrating its computational feasibility for real-time applications and agile robots.
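The sketch below illustrates steering along the negative potential gradient. It uses the classic quadratic attractive potential and a Khatib-style repulsive potential with influence radius `rho0`. As an assumption, it averages the repulsive force over weighted obstacle hypotheses such as particles. The gains and the weighting scheme are hypothetical and may differ from the paper's APF-PF formulation.

```python
import math

def apf_step(p, goal, obstacles, weights, zeta=1.0, eta=1.0, rho0=2.0):
    """Negative gradient of an attractive + repulsive potential at point p.

    obstacles: hypothesized obstacle points (e.g. particles) with weights.
    Attractive:  U_att = 0.5 * zeta * |p - goal|^2
    Repulsive:   U_rep = 0.5 * eta * (1/rho - 1/rho0)^2 for rho <= rho0
    """
    force = [-zeta * (pi - gi) for pi, gi in zip(p, goal)]  # attractive term
    total_w = sum(weights)
    for o, w in zip(obstacles, weights):
        diff = [pi - oi for pi, oi in zip(p, o)]
        rho = math.sqrt(sum(d * d for d in diff))
        if 0.0 < rho <= rho0:
            mag = eta * (1.0 / rho - 1.0 / rho0) / rho ** 2
            for k in range(len(p)):
                # Push away from the obstacle, scaled by its normalized weight.
                force[k] += (w / total_w) * mag * diff[k] / rho
    return force

# Obstacle hypothesis 1 m ahead along x partially cancels the goal attraction.
print(apf_step((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), [(1.0, 0.0, 0.0)], [1.0]))
```

Obstacle hypotheses beyond `rho0` contribute nothing, so distant or low-weight particles barely perturb the commanded direction.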
Submitted 17 March, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Inference-Based Strategy Alignment for General-Sum Differential Games
Authors:
Lasse Peters,
David Fridovich-Keil,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often, problems admit multiple equilibria. From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave. This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for. We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions: First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players.
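One simple way to reason about which equilibrium other agents are aiming for is a Bayesian posterior over candidate equilibria. In this sketch, the candidate trajectories, the prior, and the Gaussian observation noise `sigma` are assumptions. Computing the candidate equilibria themselves, which is the hard part, is not shown.

```python
import math

def equilibrium_posterior(observed, candidates, prior, sigma=1.0):
    """Posterior over candidate equilibria given an observed trajectory.

    observed: list of scalar positions; candidates: predicted trajectories
    (one per equilibrium); Gaussian observation noise with std sigma.
    """
    log_post = []
    for traj, p in zip(candidates, prior):
        sq = sum((o - x) ** 2 for o, x in zip(observed, traj))
        log_post.append(math.log(p) - sq / (2 * sigma ** 2))
    m = max(log_post)  # subtract the max for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

post = equilibrium_posterior([0.0, 1.0, 2.0],
                             [[0.0, 1.0, 2.0], [0.0, -1.0, -2.0]],
                             [0.5, 0.5])
```

The most probable equilibrium can then be used both to predict the other agents' trajectories and to select which equilibrium strategy to align with.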
Submitted 6 May, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.
-
Sparse tree search optimality guarantees in POMDPs with continuous observation spaces
Authors:
Michael H. Lim,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
Submitted 5 June, 2023; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning
Authors:
Patrick Slade,
Zachary N. Sunberg,
Mykel J. Kochenderfer
Abstract:
Real-world autonomous systems operate under uncertainty about both their pose and dynamics. Autonomous control systems must simultaneously perform estimation and control tasks to maintain robustness to changing dynamics or modeling errors. However, information-gathering actions often conflict with optimal actions for reaching control objectives, requiring a trade-off between exploration and exploitation. The specific problem setting considered here is discrete-time nonlinear systems with process noise, input constraints, and parameter uncertainty. This article frames this problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search with an unscented Kalman filter to account for process noise and parameter uncertainty. This method is compared with certainty-equivalent model predictive control and a tree search method that approximates the QMDP solution, providing insight into when information gathering is useful. Discrete-time simulations characterize performance over a range of process noise levels and bounds on unknown parameters. An offline optimization method is used to select the Monte Carlo tree search parameters without hand-tuning. In lieu of recursive feasibility guarantees, a probabilistic bounding heuristic is offered that increases the probability of keeping the state within a desired region.
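The unscented transform at the core of an unscented Kalman filter can be sketched for a scalar state. The 3-sigma-point scheme with `kappa = 2` is one common textbook choice and is assumed here. In the setting above, the state would additionally be augmented with the unknown parameters.

```python
import math

def unscented_transform(mu, P, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mu, P) through f using 2n+1 = 3 sigma
    points (n = 1; kappa = 2 follows the common n + kappa = 3 heuristic)."""
    n = 1
    spread = math.sqrt((n + kappa) * P)
    points = [mu, mu + spread, mu - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    mean = sum(w * y for w, y in zip(weights, ys))
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, ys))
    return mean, var

print(unscented_transform(1.0, 4.0, lambda x: 2 * x + 1))  # approx (3.0, 16.0)
```

For a linear map the transform reproduces the exact mean and variance. For f(x) = x^2 with x ~ N(0, 1), it also recovers the exact moments (1, 2), which is why it handles mild nonlinearities better than linearization.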
Submitted 31 July, 2018;
originally announced August 2018.