-
Resolving Multiple-Dynamic Model Uncertainty in Hypothesis-Driven Belief-MDPs
Authors:
Ofer Dagan,
Tyler Becker,
Zachary N. Sunberg
Abstract:
When human operators of cyber-physical systems encounter surprising behavior, they often consider multiple hypotheses that might explain it. In some cases, taking information-gathering actions such as additional measurements or control inputs given to the system can help resolve uncertainty and determine the most accurate hypothesis. The task of optimizing these actions can be formulated as a belief-space Markov decision process that we call a hypothesis-driven belief MDP. Unfortunately, this problem suffers from the curse of history similar to a partially observable Markov decision process (POMDP). To plan in continuous domains, an agent needs to reason over countlessly many possible action-observation histories, each resulting in a different belief over the unknown state. The problem is exacerbated in the hypothesis-driven context because each action-observation pair spawns a different belief for each hypothesis, leading to additional branching. This paper considers the case in which each hypothesis corresponds to a different dynamic model in an underlying POMDP. We present a new belief MDP formulation that: (i) enables reasoning over multiple hypotheses, (ii) balances the goals of determining the (most likely) correct hypothesis and performing well in the underlying POMDP, and (iii) can be solved with sparse tree search.
Submitted 21 November, 2024;
originally announced November 2024.
-
Rao-Blackwellized POMDP Planning
Authors:
Jiho Lee,
Nisar R. Ahmed,
Kyle H. Wray,
Zachary N. Sunberg
Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a structured framework for decision-making under uncertainty, but their application requires efficient belief updates. Sequential Importance Resampling Particle Filters (SIRPF), also known as Bootstrap Particle Filters, are commonly used as belief updaters in large approximate POMDP solvers, but they face challenges such as particle deprivation and high computational costs as the system's state dimension grows. To address these issues, this study introduces Rao-Blackwellized POMDP (RB-POMDP) approximate solvers and outlines generic methods to apply Rao-Blackwellization in both belief updates and online planning. We compare the performance of SIRPF and Rao-Blackwellized Particle Filters (RBPF) in a simulated localization problem where an agent navigates toward a target in a GPS-denied environment using POMCPOW and RB-POMCPOW planners. Our results not only confirm that RBPFs maintain accurate belief approximations over time with fewer particles, but, more surprisingly, RBPFs combined with quadrature-based integration improve planning quality significantly compared to SIRPF-based planning under the same computational limits.
Submitted 3 March, 2025; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives
Authors:
Qi Heng Ho,
Martin S. Feather,
Federico Rossi,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are powerful models for sequential decision making under transition and observation uncertainties. This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP), where the goal is to maximize the probability of reaching some target states. This is also a core problem in model checking with logical specifications and is naturally undiscounted (discount factor is one). Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP. Specifically, we focus on trial-based heuristic search value iteration techniques and present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space (informed search via value bounds) while addressing their drawbacks in handling loops for indefinite-horizon problems. The algorithm produces policies with two-sided bounds on optimal reachability probabilities. We prove convergence to an optimal policy from below under certain conditions. Experimental evaluations on a suite of benchmarks show that our algorithm outperforms existing methods in almost all cases in both probability guarantees and computation time.
Submitted 4 June, 2024;
originally announced June 2024.
-
Cieran: Designing Sequential Colormaps via In-Situ Active Preference Learning
Authors:
Matt-Heun Hong,
Zachary N. Sunberg,
Danielle Albers Szafir
Abstract:
Quality colormaps can help communicate important data patterns. However, finding an aesthetically pleasing colormap that looks "just right" for a given scenario requires significant design and technical expertise. We introduce Cieran, a tool that allows any data analyst to rapidly find quality colormaps while designing charts within Jupyter Notebooks. Our system employs an active preference learning paradigm to rank expert-designed colormaps and create new ones from pairwise comparisons, allowing analysts who are novices in color design to tailor colormaps to their data context. We accomplish this by treating colormap design as a path planning problem through the CIELAB colorspace with a context-specific reward model. In an evaluation with twelve scientists, we found that Cieran effectively modeled user preferences to rank colormaps and leveraged this model to create new quality designs. Our work shows the potential of active preference learning for supporting efficient visualization design optimization.
Submitted 29 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Recursively-Constrained Partially Observable Markov Decision Processes
Authors:
Qi Heng Ho,
Tyler Becker,
Benjamin Kraske,
Zakariya Laouar,
Martin S. Feather,
Federico Rossi,
Morteza Lahijanian,
Zachary N. Sunberg
Abstract:
Many sequential decision problems involve optimizing one objective function while imposing constraints on other objectives. Constrained Partially Observable Markov Decision Processes (C-POMDP) model this case with transition uncertainty and partial observability. In this work, we first show that C-POMDPs violate the optimal substructure property over successive decision steps and thus may exhibit behaviors that are undesirable for some (e.g., safety critical) applications. Additionally, online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation. To address these drawbacks, we introduce the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm for RC-POMDPs. Evaluations on benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.
Submitted 4 June, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Explanation through Reward Model Reconciliation using POMDP Tree Search
Authors:
Benjamin D. Kraske,
Anshu Saksena,
Anna L. Buczak,
Zachary N. Sunberg
Abstract:
As artificial intelligence (AI) algorithms are increasingly used in mission-critical applications, promoting user trust in these systems will be essential to their success. Ensuring users understand the models over which algorithms reason promotes user trust. This work seeks to reconcile differences between the reward model that an algorithm uses for online partially observable Markov decision process (POMDP) planning and the implicit reward model assumed by a human user. Action discrepancies, differences in decisions made by an algorithm and user, are leveraged to estimate a user's objectives as expressed in weightings of a reward function.
Submitted 1 May, 2023;
originally announced May 2023.
-
Sampling-based Reactive Synthesis for Nondeterministic Hybrid Systems
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper introduces a sampling-based strategy synthesis algorithm for nondeterministic hybrid systems with complex continuous dynamics under temporal and reachability constraints. We model the evolution of the hybrid system as a two-player game, where the nondeterminism is an adversarial player whose objective is to prevent achieving temporal and reachability goals. The aim is to synthesize a winning strategy -- a reactive (robust) strategy that guarantees the satisfaction of the goals under all possible moves of the adversarial player. Our proposed approach involves growing a (search) game-tree in the hybrid space by combining sampling-based motion planning with a novel bandit-based technique to select and improve on partial strategies. We show that the algorithm is probabilistically complete, i.e., the algorithm will asymptotically almost surely find a winning strategy, if one exists. The case studies and benchmark results show that our algorithm is general and effective, and consistently outperforms state-of-the-art algorithms.
Submitted 23 December, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Planning with SiMBA: Motion Planning under Uncertainty for Temporal Goals using Simplified Belief Guides
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper presents a new multi-layered algorithm for motion planning under motion and sensing uncertainties for Linear Temporal Logic specifications. We propose a technique to guide a sampling-based search tree in the combined task and belief space using trajectories from a simplified model of the system, to make the problem computationally tractable. Our method eliminates the need to construct fine and accurate finite abstractions. We prove correctness and probabilistic completeness of our algorithm, and illustrate the benefits of our approach on several case studies. Our results show that guidance with a simplified belief space model allows for significant speed-up in planning for complex specifications.
Submitted 9 April, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Optimality Guarantees for Particle Belief Approximation of POMDPs
Authors:
Michael H. Lim,
Tyler J. Becker,
Mykel J. Kochenderfer,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
Submitted 19 October, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
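The abstract above describes using the particle filter belief transition as the generative model of an MDP solver. The following is a minimal sketch of one such generative step on a hypothetical 1D problem, not the authors' implementation. The dynamics, observation density, reward, and every function name here (`transition`, `obs_density`, `sample_obs`, `reward`, `pb_mdp_step`) are assumptions made purely for illustration. It shows why each step costs on the order of C model evaluations: all C particles are propagated and reweighted.

```python
import math
import random


def transition(s, a, rng):
    # Hypothetical 1D dynamics: shift by the action plus Gaussian noise.
    return s + a + rng.gauss(0.0, 0.1)


def obs_density(o, s_next, sigma=0.5):
    # Hypothetical Gaussian observation density Z(o | s').
    z = (o - s_next) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sample_obs(s_next, rng, sigma=0.5):
    # Draw an observation from the same hypothetical Gaussian model.
    return s_next + rng.gauss(0.0, sigma)


def reward(s, a):
    # Hypothetical reward: stay close to the origin.
    return -abs(s)


def pb_mdp_step(particles, weights, a, rng):
    """One generative step on a weighted particle belief.

    Returns (next_particles, next_weights, expected_reward).
    """
    # Reward of the belief: weighted average over the particles.
    r = sum(w * reward(s, a) for s, w in zip(particles, weights))

    # Propagate every particle through the dynamics (C transition samples).
    next_particles = [transition(s, a, rng) for s in particles]

    # Pick one particle according to the weights and generate an
    # observation from its successor state.
    idx = rng.choices(range(len(particles)), weights=weights, k=1)[0]
    o = sample_obs(next_particles[idx], rng)

    # Reweight all particles by the observation likelihood (C density calls).
    unnorm = [w * obs_density(o, sp) for w, sp in zip(weights, next_particles)]
    total = sum(unnorm)
    if total > 0.0:
        next_weights = [u / total for u in unnorm]
    else:
        # Numerical underflow: fall back to uniform weights.
        next_weights = [1.0 / len(unnorm)] * len(unnorm)
    return next_particles, next_weights, r


if __name__ == "__main__":
    rng = random.Random(0)
    C = 50  # number of particles
    belief = [rng.gauss(0.0, 1.0) for _ in range(C)]
    w = [1.0 / C] * C
    belief, w, r = pb_mdp_step(belief, w, 0.1, rng)
    print(len(belief), round(sum(w), 6))
```

A sampling-based MDP planner could call `pb_mdp_step` wherever it would normally call a state-transition generative model, treating the pair `(particles, weights)` as the MDP state.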
-
Automaton-Guided Control Synthesis for Signal Temporal Logic Specifications
Authors:
Qi Heng Ho,
Roland B. Ilyes,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
This paper presents an algorithmic framework for control synthesis of continuous dynamical systems subject to signal temporal logic (STL) specifications. We propose a novel algorithm to obtain a time-partitioned finite automaton from an STL specification, and introduce a multi-layered framework that utilizes this automaton to guide a sampling-based search tree both spatially and temporally. Our approach is able to synthesize a controller for nonlinear dynamics and polynomial predicate functions. We prove the correctness and probabilistic completeness of our algorithm, and illustrate the efficiency and efficacy of our framework on several case studies. Our results show an order of magnitude speedup over the state of the art.
Submitted 4 October, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Gaussian Belief Trees for Chance Constrained Asymptotically Optimal Motion Planning
Authors:
Qi Heng Ho,
Zachary N. Sunberg,
Morteza Lahijanian
Abstract:
In this paper, we address the problem of sampling-based motion planning under motion and measurement uncertainty with probabilistic guarantees. We generalize traditional sampling-based tree-based motion planning algorithms for deterministic systems and propose belief-$\mathcal{A}$, a framework that extends any kinodynamical tree-based planner to the belief space for linear (or linearizable) systems. We introduce appropriate sampling techniques and distance metrics for the belief space that preserve the probabilistic completeness and asymptotic optimality properties of the underlying planner. We demonstrate the efficacy of our approach for finding safe low-cost paths efficiently and asymptotically optimally in simulation, for both holonomic and non-holonomic systems.
Submitted 4 October, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Compositional Learning-based Planning for Vision POMDPs
Authors:
Sampada Deglurkar,
Michael H. Lim,
Johnathan Tucker,
Zachary N. Sunberg,
Aleksandra Faust,
Claire J. Tomlin
Abstract:
The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of and predict future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noises that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several baseline state-of-the-art vision POMDP algorithms while using a fraction of the training time.
Submitted 2 December, 2022; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs
Authors:
Michael H. Lim,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
Submitted 1 April, 2021; v1 submitted 18 December, 2020;
originally announced December 2020.
-
APF-PF: Probabilistic Depth Perception for 3D Reactive Obstacle Avoidance
Authors:
Shakeeb Ahmad,
Zachary N. Sunberg,
J. Sean Humbert
Abstract:
This paper proposes a framework for 3D obstacle avoidance in the presence of partial observability of environment obstacles. The method focuses on the utility of the Artificial Potential Function (APF) controller in a practical setting where noisy and incomplete information about the proximity is inevitable. We propose a Particle Filter (PF) approach to estimate potential obstacle locations in an input depth image stream. The probable candidates are then used to generate an action that maneuvers the robot towards the negative gradient of potential at each time instant. Rigorous experimental validation on a quadrotor UAV highlights the robustness and reliability of the method when the robot's sensitivity to incorrect perception information can be concerning. The proposed perception and control stack is run onboard the UAV, demonstrating the computational feasibility for real-time applications and agile robots.
Submitted 17 March, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
iLQGames.jl: Rapidly Designing and Solving Differential Games in Julia
Authors:
Lasse Peters,
Zachary N. Sunberg
Abstract:
In many problems that involve multiple decision making agents, optimal choices for each agent depend on the choices of others. Differential game theory provides a principled formalism for expressing these coupled interactions and recent work offers efficient approximations to solve these problems to non-cooperative equilibria. iLQGames.jl is a framework for designing and solving differential games, built around the iterative linear-quadratic method. It is written in the Julia programming language to allow flexible prototyping and integration with other research software, while leveraging the high-performance nature of the language to allow real-time execution. The open-source software package can be found at https://github.com/lassepe/iLQGames.jl.
Submitted 6 April, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Inference-Based Strategy Alignment for General-Sum Differential Games
Authors:
Lasse Peters,
David Fridovich-Keil,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
In many settings where multiple agents interact, the optimal choices for each agent depend heavily on the choices of the others. These coupled interactions are well-described by a general-sum differential game, in which players have differing objectives, the state evolves in continuous time, and optimal play may be characterized by one of many equilibrium concepts, e.g., a Nash equilibrium. Often, problems admit multiple equilibria. From the perspective of a single agent in such a game, this multiplicity of solutions can introduce uncertainty about how other agents will behave. This paper proposes a general framework for resolving ambiguity between equilibria by reasoning about the equilibrium other agents are aiming for. We demonstrate this framework in simulations of a multi-player human-robot navigation problem that yields two main conclusions: First, by inferring which equilibrium humans are operating at, the robot is able to predict trajectories more accurately, and second, by discovering and aligning itself to this equilibrium the robot is able to reduce the cost for all players.
Submitted 6 May, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.
-
Sparse tree search optimality guarantees in POMDPs with continuous observation spaces
Authors:
Michael H. Lim,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
Submitted 5 June, 2023; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Optimized and Trusted Collision Avoidance for Unmanned Aerial Vehicles using Approximate Dynamic Programming (Technical Report)
Authors:
Zachary N. Sunberg,
Mykel J. Kochenderfer,
Marco Pavone
Abstract:
Safely integrating unmanned aerial vehicles into civil airspace is contingent upon development of a trustworthy collision avoidance system. This paper proposes an approach whereby a parameterized resolution logic that is considered trusted for a given range of its parameters is adaptively tuned online. Specifically, to address the potential conservatism of the resolution logic with static parameters, we present a dynamic programming approach for adapting the parameters dynamically based on the encounter state. We compute the adaptation policy offline using a simulation-based approximate dynamic programming method that accommodates the high dimensionality of the problem. Numerical experiments show that this approach improves safety and operational performance compared to the baseline resolution logic, while retaining trustworthiness.
Submitted 18 February, 2016; v1 submitted 15 February, 2016;
originally announced February 2016.