-
Model Identification Adaptive Control with $ρ$-POMDP Planning
Authors:
Michelle Ho,
Arec Jamgochian,
Mykel J. Kochenderfer
Abstract:
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($ρ$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
Submitted 22 May, 2025; v1 submitted 14 May, 2025;
originally announced May 2025.
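The abstract does not say which belief-dependent reward is used. As a hedged illustration only, a common ρ-POMDP choice rewards low-entropy beliefs. The sketch below assumes a Gaussian belief over the unknown system parameters with a diagonal covariance. The function names and the additive control-cost term are hypothetical and are not the authors' formulation.

```python
import math

def gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian belief with diagonal covariance."""
    n = len(variances)
    log_det = sum(math.log(v) for v in variances)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) + log_det)

def belief_reward(variances, control_cost=0.0):
    """Hypothetical rho-POMDP reward: prefer confident parameter beliefs, pay for control."""
    return -gaussian_entropy(variances) - control_cost

# Halving the standard deviation of one parameter (variance 1.0 -> 0.25)
# raises the reward by log 2, independent of the other parameter.
wide = belief_reward([1.0, 0.04])
narrow = belief_reward([0.25, 0.04])
print(round(narrow - wide, 6))  # 0.693147
```

Because the reward depends on the belief rather than on the hidden state, a planner maximizing it is pushed to choose controls that make the parameters easier to localize.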
-
Verifying Nonlinear Neural Feedback Systems using Polyhedral Enclosures
Authors:
Samuel I. Akinwande,
Chelsea Sidrane,
Mykel J. Kochenderfer,
Clark Barrett
Abstract:
As dynamical systems equipped with neural network controllers (neural feedback systems) become increasingly prevalent, it is critical to develop methods to ensure their safe operation. Verifying safety requires extending control theoretic analysis methods to these systems. Although existing techniques can efficiently handle linear neural feedback systems, relatively few scalable methods address the nonlinear case. We propose a novel algorithm for forward reachability analysis of nonlinear neural feedback systems. The approach leverages the structure of the nonlinear transition functions of the systems to compute tight polyhedral enclosures (i.e., abstractions). These enclosures, combined with the neural controller, are then encoded as a mixed-integer linear program (MILP). Optimizing this MILP yields a sound over-approximation of the forward-reachable set. We evaluate our algorithm on representative benchmarks and demonstrate an order of magnitude improvement over the current state of the art.
Submitted 28 March, 2025;
originally announced March 2025.
-
An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models
Authors:
Alexandros E. Tzikas,
Mykel J. Kochenderfer
Abstract:
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models. Our approach only requires input-output data from the system and first-order information from the model with respect to the parameters. Our algorithm consists of two modules. First, we formulate the problem of system identification from a Bayesian perspective and use a linear Gaussian model approximation to iteratively optimize the model's parameters. In each iteration, we use the input-output data to tune the covariance of the linear Gaussian model, which statistically calibrates the approach. Second, we define a Gaussian-based uncertainty measure for the model parameters, which we can then minimize with respect to the next selected input. We test our method with linear and nonlinear dynamics.
Submitted 30 March, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
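A minimal sketch of the two modules for a single scalar parameter θ. It assumes a hypothetical model y = sin(θu), noise-free data, and a covariance-tuning rule that sets the noise variance to the mean squared residual; the paper's actual calibration rule is not given in the abstract. Each iteration linearizes the model at the current estimate and takes a Gauss-Newton step on the resulting linear Gaussian posterior. The next input is the candidate that minimizes the predicted posterior variance.

```python
import math

def model(theta, u):
    """Hypothetical scalar model y = f(theta, u), chosen only for illustration."""
    return math.sin(theta * u)

def jacobian(theta, u):
    """First-order information df/dtheta of the hypothetical model."""
    return u * math.cos(theta * u)

def bayes_update(m0, p0, inputs, outputs, iters=20):
    """Iterated linear-Gaussian (Gauss-Newton MAP) estimate of scalar theta.

    Prior theta ~ N(m0, p0). The noise variance r is re-tuned from the
    residuals in every iteration (an assumed calibration rule).
    """
    m, p, r = m0, p0, 1.0
    for _ in range(iters):
        res = [y - model(m, u) for u, y in zip(inputs, outputs)]
        r = max(sum(e * e for e in res) / len(res), 1e-6)  # tuned noise variance
        jac = [jacobian(m, u) for u in inputs]
        precision = 1.0 / p0 + sum(j * j for j in jac) / r
        p = 1.0 / precision  # posterior variance of the linearized model
        grad = sum(j * e for j, e in zip(jac, res)) / r - (m - m0) / p0
        m = m + p * grad     # Gauss-Newton step toward the MAP estimate
    return m, p, r

def next_input(m, p, r, candidates):
    """Pick the candidate input that minimizes the predicted posterior variance."""
    def predicted_var(u):
        j = jacobian(m, u)
        return 1.0 / (1.0 / p + j * j / r)
    return min(candidates, key=predicted_var)

true_theta = 0.8
us = [0.5, 1.0, 1.5, 2.0]
ys = [model(true_theta, u) for u in us]
m, p, r = bayes_update(0.5, 1.0, us, ys)
u_next = next_input(m, p, r, us)
```

For a scalar parameter, minimizing the predicted variance is the same as choosing the input with the largest sensitivity |∂f/∂θ|, here u = 1.0 among the candidates.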
-
Discrete-Time Distribution Steering using Monte Carlo Tree Search
Authors:
Alexandros E. Tzikas,
Liam A. Kruse,
Mansur Arief,
Mykel J. Kochenderfer,
Stephen Boyd
Abstract:
Optimal control problems with state distribution constraints have attracted interest for their expressivity, but solutions rely on linear approximations. We approach the problem of driving the state of a dynamical system in distribution from a sequential decision-making perspective. We formulate the optimal control problem as an appropriate Markov decision process (MDP), where the actions correspond to the state-feedback control policies. We then solve the MDP using Monte Carlo tree search (MCTS). This renders our method suitable for any dynamics model. A key component of our approach is a novel, easy-to-compute distance metric in the distribution space that allows our algorithm to guide the distribution of the state. We experimentally test our algorithm under both linear and nonlinear dynamics.
Submitted 9 December, 2024;
originally announced December 2024.
-
Optimal Control of Mechanical Ventilators with Learned Respiratory Dynamics
Authors:
Isaac Ronald Ward,
Dylan M. Asmar,
Mansur Arief,
Jana Krystofova Mike,
Mykel J. Kochenderfer
Abstract:
Deciding on appropriate mechanical ventilator management strategies significantly impacts the health outcomes for patients with respiratory diseases. Acute Respiratory Distress Syndrome (ARDS) is one such disease that requires careful ventilator operation to be effectively treated. In this work, we frame the management of ventilators for patients with ARDS as a sequential decision-making problem using the Markov decision process framework. We implement and compare controllers based on clinical guidelines contained in the ARDSnet protocol, optimal control theory, and learned latent dynamics represented as neural networks. The Pulse Physiology Engine's respiratory dynamics simulator is used to establish a repeatable benchmark, gather simulated data, and quantitatively compare these controllers. We score performance in terms of measured improvement in established ARDS health markers (pertaining to improved respiratory rate, oxygenation, and vital signs). Our results demonstrate that techniques leveraging neural networks and optimal control can automatically discover effective ventilation management strategies without access to explicit ventilator management procedures or guidelines (such as those defined in the ARDSnet protocol).
Submitted 12 November, 2024;
originally announced November 2024.
-
Optimal Ground Station Selection for Low-Earth Orbiting Satellites
Authors:
Duncan Eddy,
Michelle Ho,
Mykel J. Kochenderfer
Abstract:
This paper presents a solution to the problem of optimal ground station selection for low-Earth orbiting (LEO) space missions that enables mission operators to precisely design their ground segment performance and costs. Space mission operators are increasingly turning to Ground-Station-as-a-Service (GSaaS) providers to supply the terrestrial communications segment to reduce costs and increase network size. However, this approach leads to a new challenge of selecting the optimal service providers and station locations for a given mission. We consider the problem of ground station selection as an optimization problem and present a general solution framework that allows mission designers to set their overall optimization objective and constrain key mission performance variables such as total data downlink, total mission cost, recurring operational cost, and maximum communications time-gap. We solve the problem using integer programming (IP). To address computational scaling challenges, we introduce a surrogate optimization approach where the optimal station selection is determined based on solving the problem over a reduced time domain. Two different IP formulations are evaluated using randomized selections of LEO satellites of varying constellation sizes. We consider the networks of the commercial GSaaS providers Atlas Space Operations, Amazon Web Services (AWS) Ground Station, Azure Orbital Ground Station, Kongsberg Satellite Services (KSAT), Leaf Space, and Viasat Real-Time Earth. We compare our results against standard operational practices of integrating with one or two primary ground station providers.
Submitted 1 March, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Informative Input Design for Dynamic Mode Decomposition
Authors:
Joshua Ott,
Mykel J. Kochenderfer,
Stephen Boyd
Abstract:
Efficiently estimating system dynamics from data is essential for minimizing data collection costs and improving model performance. This work addresses the challenge of designing future control inputs to maximize information gain, thereby improving the efficiency of the system identification process. We propose an approach that integrates informative input design into the Dynamic Mode Decomposition with control (DMDc) framework, which is well-suited for high-dimensional systems. By formulating an approximate convex optimization problem that minimizes the trace of the estimation error covariance matrix, we are able to efficiently reduce uncertainty in the model parameters while respecting constraints on the system states and control inputs. This method outperforms traditional techniques like Pseudo-Random Binary Sequences (PRBS) and orthogonal multisines, which do not adapt to the current system model and often gather redundant information. We validate our approach using aircraft and fluid dynamics simulations to demonstrate the practical applicability and effectiveness of our method. Our results show that strategically planning control inputs based on the current model enhances the accuracy of system identification while requiring less data. Furthermore, we provide our implementation and simulation interfaces as an open-source software package, facilitating further research development and use by industry practitioners.
Submitted 28 April, 2025; v1 submitted 19 September, 2024;
originally announced September 2024.
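The trace-of-covariance criterion can be illustrated on a scalar system $x_{k+1} = a x_k + b u_k$. For least-squares estimation of $(a, b)$ with i.i.d. noise, the estimation error covariance is proportional to $G^{-1}$, where $G = \sum_k z_k z_k^\top$ and $z_k = [x_k, u_k]$. The sketch below is a hypothetical toy, not the authors' open-source package: it swaps the paper's approximate convex program for a greedy grid search over inputs bounded by $|u| \le u_{\max}$, and the true $(a, b)$ values are made up.

```python
def simulate(a, b, x0, inputs):
    """Roll out the noise-free scalar system x_{k+1} = a x_k + b u_k."""
    xs = [x0]
    for u in inputs:
        xs.append(a * xs[-1] + b * u)
    return xs

def gram(xs, us):
    """Entries (g11, g12, g22) of G = sum_k z_k z_k^T with z_k = [x_k, u_k]."""
    g11 = sum(x * x for x in xs)
    g12 = sum(x * u for x, u in zip(xs, us))
    g22 = sum(u * u for u in us)
    return g11, g12, g22

def fit_dmdc(xs, us):
    """Least-squares estimate of (a, b) by solving G [a, b]^T = sum_k z_k x_{k+1}."""
    g11, g12, g22 = gram(xs[:-1], us)
    c1 = sum(x * xn for x, xn in zip(xs[:-1], xs[1:]))
    c2 = sum(u * xn for u, xn in zip(us, xs[1:]))
    det = g11 * g22 - g12 * g12
    return (g22 * c1 - g12 * c2) / det, (g11 * c2 - g12 * c1) / det

def trace_after(g, x, u):
    """trace((G + z z^T)^{-1}) for z = [x, u], using the closed-form 2x2 inverse."""
    n11, n12, n22 = g[0] + x * x, g[1] + x * u, g[2] + u * u
    return (n11 + n22) / (n11 * n22 - n12 * n12)

def next_input(xs, us, u_max, n_grid=5):
    """Greedy choice of the bounded input that most reduces the covariance trace."""
    g = gram(xs[:-1], us)
    candidates = [-u_max + 2 * u_max * i / (n_grid - 1) for i in range(n_grid)]
    return min(candidates, key=lambda u: trace_after(g, xs[-1], u))

us = [1.0, 1.0, 1.0]  # a poorly exciting constant input
xs = simulate(0.9, 0.5, 1.0, us)
a_hat, b_hat = fit_dmdc(xs, us)
u_next = next_input(xs, us, u_max=1.0)
```

With a constant input, the state and input columns are strongly correlated, so the greedy step picks u = -1.0, which breaks that correlation and shrinks the trace the most.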
-
Optimizing Falsification for Learning-Based Control Systems: A Multi-Fidelity Bayesian Approach
Authors:
Zahra Shahrooei,
Mykel J. Kochenderfer,
Ali Baheri
Abstract:
Testing controllers in safety-critical systems is vital for ensuring their safety and preventing failures. In this paper, we address the falsification problem within learning-based closed-loop control systems through simulation. This problem involves the identification of counterexamples that violate system safety requirements and can be formulated as an optimization task based on these requirements. Using full-fidelity simulator data in this optimization problem can be computationally expensive. To improve efficiency, we propose a multi-fidelity Bayesian optimization falsification framework that harnesses simulators with varying levels of accuracy. Our proposed framework can transition between different simulators and establish meaningful relationships between them. Through multi-fidelity Bayesian optimization, we determine both the optimal system input likely to be a counterexample and the appropriate fidelity level for assessment. We evaluated our approach across various Gym environments, each featuring different levels of fidelity. Our experiments demonstrate that multi-fidelity Bayesian optimization is more computationally efficient than full-fidelity Bayesian optimization and other baseline methods in detecting counterexamples. A Python implementation of the algorithm is available at https://github.com/SAILRIT/MFBO_Falsification.
Submitted 12 September, 2024;
originally announced September 2024.
-
Watercraft as Overwater Ambulance Exchange Points to Enhance Aeromedical Evacuation
Authors:
Mahdi Al-Husseini,
Kyle H. Wray,
Mykel J. Kochenderfer
Abstract:
Ambulance exchange points are preidentified sites where patients are transferred between evacuation platforms while en route to enhanced medical care. We propose a new capability for maritime medical evacuation, which involves co-opting underway watercraft as overwater ambulance exchange points to transfer patients between medical evacuation aircraft. We partner with the United States Army's 25th Combat Aviation Brigade to demonstrate the use of an Army watercraft as an overwater ambulance exchange point. A manikin is transferred between two HH-60 Medical Evacuation Black Hawk helicopters conducting hoist operations over Army Logistics Support Vessel 3, which is traveling south of Honolulu, Hawaii. The demonstration is enabled by a decision support system for dispatching aircraft, hoist stabilization technology, commercial satellite internet, military geospatial infrastructure applications, and digital medical documentation tools, the benefits of which are all discussed. Three extensions of the overwater ambulance exchange point are introduced and civilian applications are considered.
Submitted 25 August, 2024;
originally announced August 2024.
-
Diffusion-Based Failure Sampling for Evaluating Safety-Critical Autonomous Systems
Authors:
Harrison Delecki,
Marc R. Schlichting,
Mansur Arief,
Anthony Corso,
Marcell Vazquez-Chanlatte,
Mykel J. Kochenderfer
Abstract:
Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.
Submitted 20 May, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
The Synergy Between Optimal Transport Theory and Multi-Agent Reinforcement Learning
Authors:
Ali Baheri,
Mykel J. Kochenderfer
Abstract:
This paper explores the integration of optimal transport (OT) theory with multi-agent reinforcement learning (MARL). This integration uses OT to handle distributions and transportation problems to enhance the efficiency, coordination, and adaptability of MARL. There are five key areas where OT can impact MARL: (1) policy alignment, where OT's Wasserstein metric is used to align divergent agent strategies towards unified goals; (2) distributed resource management, employing OT to optimize resource allocation among agents; (3) addressing non-stationarity, using OT to adapt to dynamic environmental shifts; (4) scalable multi-agent learning, harnessing OT for decomposing large-scale learning objectives into manageable tasks; and (5) enhancing energy efficiency, applying OT principles to develop sustainable MARL systems. This paper articulates how the synergy between OT and MARL can address scalability issues, optimize resource distribution, align agent policies in cooperative environments, and ensure adaptability in dynamically changing conditions.
Submitted 24 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
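For intuition about the Wasserstein metric mentioned in area (1): between two one-dimensional empirical distributions with the same number of samples, the optimal coupling pairs the sorted samples, so $W_p$ has a closed form. This is a generic sketch of the metric only, not of how the paper applies it to policy alignment; treating two agents' action samples as the compared distributions is an assumption made for illustration.

```python
def wasserstein_1d(xs, ys, p=1):
    """W_p between two equal-size 1D empirical distributions with uniform weights."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # In 1D, the monotone (sorted) coupling is optimal for the cost |x - y|^p, p >= 1.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)

agent_a = [0.0, 2.0, 1.0]  # hypothetical action samples of one agent
agent_b = [3.0, 1.0, 2.0]  # hypothetical action samples of another agent
print(wasserstein_1d(agent_a, agent_b))  # 1.0
```

Unlike a pointwise comparison, the metric measures how far probability mass must move, so it stays informative even when the two sample sets do not overlap.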
-
SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning
Authors:
Marc R. Schlichting,
Nina V. Boord,
Anthony L. Corso,
Mykel J. Kochenderfer
Abstract:
Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulations can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. Our approach achieves a significant speedup, up to 18 times faster than traditional methods that rely solely on a high-fidelity simulator.
Submitted 30 September, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis
Authors:
Ali Baheri,
Mykel J. Kochenderfer
Abstract:
Safety validation is a crucial component in the development and deployment of autonomous systems, such as self-driving vehicles and robotic systems. Ensuring safe operation necessitates extensive testing and verification of control policies, typically conducted in simulation environments. High-fidelity simulators accurately model real-world dynamics but entail high computational costs, limiting their scalability for exhaustive testing. Conversely, low-fidelity simulators offer efficiency but may not capture the intricacies of high-fidelity simulators, potentially yielding false conclusions. We propose a joint falsification and fidelity optimization framework for safety validation of autonomous systems. Our mathematical formulation combines counterexample searches with simulator fidelity improvement, facilitating more efficient exploration of the critical environmental configurations challenging the control system. Our contributions encompass a set of theorems addressing counterexample sensitivity analysis, sample complexity, convergence, the interplay between the outer and inner optimization loops, and regret bound analysis. The proposed joint optimization approach enables a more targeted and efficient testing process, optimizes the use of available computational resources, and enhances confidence in autonomous system safety validation.
Submitted 10 May, 2023;
originally announced May 2023.
-
Optimizing Carbon Storage Operations for Long-Term Safety
Authors:
Yizheng Wang,
Markus Zechner,
Gege Wen,
Anthony Louis Corso,
John Michael Mern,
Mykel J. Kochenderfer,
Jef Karel Caers
Abstract:
To combat global warming and mitigate the risks associated with climate change, carbon capture and storage (CCS) has emerged as a crucial technology. However, safely sequestering CO2 in geological formations for long-term storage presents several challenges. In this study, we address these issues by modeling the decision-making process for carbon storage operations as a partially observable Markov decision process (POMDP). We solve the POMDP using belief state planning to optimize injector and monitoring well locations, with the goal of maximizing stored CO2 while maintaining safety. Empirical results in simulation demonstrate that our approach is effective in ensuring safe long-term carbon storage operations. We showcase the flexibility of our approach by introducing three different monitoring strategies and examining their impact on decision quality. Additionally, we introduce a neural network surrogate model for the POMDP decision-making process to handle the complex dynamics of the multi-phase flow. We also investigate the effects of different fidelity levels of the surrogate model on decision quality.
Submitted 18 April, 2023;
originally announced April 2023.
-
Falsification of Learning-Based Controllers through Multi-Fidelity Bayesian Optimization
Authors:
Zahra Shahrooei,
Mykel J. Kochenderfer,
Ali Baheri
Abstract:
Simulation-based falsification is a practical testing method to increase confidence that the system will meet safety requirements. Because full-fidelity simulations can be computationally demanding, we investigate the use of simulators with different levels of fidelity. As a first step, we express the overall safety specification in terms of environmental parameters and structure this safety specification as an optimization problem. We propose a multi-fidelity falsification framework using Bayesian optimization, which is able to determine at which level of fidelity we should conduct a safety evaluation in addition to finding possible instances from the environment that cause the system to fail. This method allows us to automatically switch between inexpensive, inaccurate information from a low-fidelity simulator and expensive, accurate information from a high-fidelity simulator in a cost-effective way. Our experiments on various environments in simulation demonstrate that multi-fidelity Bayesian optimization has falsification performance comparable to single-fidelity Bayesian optimization but with much lower cost.
Submitted 28 April, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Optimality Guarantees for Particle Belief Approximation of POMDPs
Authors:
Michael H. Lim,
Tyler J. Becker,
Mykel J. Kochenderfer,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
Submitted 19 October, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
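A minimal sketch of a particle belief MDP generative step, in which the particle filter belief update serves as the generative model for an MDP solver. It assumes a hypothetical 1D linear-Gaussian toy POMDP with made-up noise levels and reward, and it takes the belief reward to be the weight-averaged state reward. One call draws C + 1 transition samples: one to generate the observation and one per particle. This per-call overhead is the $\mathcal{O}(C)$ factor mentioned in the abstract. The update also needs the observation density, which here is a Gaussian pdf.

```python
import math
import random

SIGMA_T, SIGMA_O = 0.1, 0.5  # hypothetical transition / observation noise levels

def transition(s, a, rng):
    """Hypothetical dynamics: s' = s + a + Gaussian noise."""
    return s + a + rng.gauss(0.0, SIGMA_T)

def observe(s_next, rng):
    """Hypothetical sensor: o = s' + Gaussian noise."""
    return s_next + rng.gauss(0.0, SIGMA_O)

def obs_density(o, s_next):
    """Observation likelihood Z(o | s'), required for likelihood weighting."""
    z = (o - s_next) / SIGMA_O
    return math.exp(-0.5 * z * z) / (SIGMA_O * math.sqrt(2.0 * math.pi))

def reward(s, a):
    """Hypothetical state reward: stay near the origin, use little control."""
    return -s * s - 0.1 * a * a

def pb_step(particles, weights, a, rng):
    """Sample (b', o, r) given weighted-particle belief b and action a."""
    # Draw a state from the belief and generate an observation from it.
    s = rng.choices(particles, weights=weights, k=1)[0]
    o = observe(transition(s, a, rng), rng)
    # Belief reward: weighted average of the state reward over particles.
    total = sum(weights)
    r = sum(w * reward(p, a) for p, w in zip(particles, weights)) / total
    # Propagate every particle and reweight it by the observation likelihood.
    new_particles = [transition(p, a, rng) for p in particles]
    new_weights = [w * obs_density(o, p) for p, w in zip(new_particles, weights)]
    norm = sum(new_weights)
    return new_particles, [w / norm for w in new_weights], o, r

rng = random.Random(0)
C = 100
particles = [rng.gauss(0.0, 1.0) for _ in range(C)]
weights = [1.0 / C] * C
particles, weights, o, r = pb_step(particles, weights, a=-0.5, rng=rng)
```

Any sampling-based MDP planner that only needs a generative model can call `pb_step` in place of a state simulator, which is the bridge the abstract describes.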
-
Backward Reachability Analysis of Neural Feedback Loops: Techniques for Linear and Nonlinear Systems
Authors:
Nicholas Rober,
Sydney M. Katz,
Chelsea Sidrane,
Esen Yel,
Michael Everett,
Mykel J. Kochenderfer,
Jonathan P. How
Abstract:
As neural networks (NNs) become more prevalent in safety-critical applications such as control of vehicles, there is a growing need to certify that systems with NN components are safe. This paper presents a set of backward reachability approaches for safety certification of neural feedback loops (NFLs), i.e., closed-loop systems with NN control policies. While backward reachability strategies have been developed for systems without NN components, the nonlinearities in NN activation functions and general noninvertibility of NN weight matrices make backward reachability for NFLs a challenging problem. To avoid the difficulties associated with propagating sets backward through NNs, we introduce a framework that leverages standard forward NN analysis tools to efficiently find over-approximations to backprojection (BP) sets, i.e., sets of states for which an NN policy will lead a system to a given target set. We present frameworks for calculating BP over-approximations for both linear and nonlinear systems with control policies represented by feedforward NNs and propose computationally efficient strategies. We use numerical results from a variety of models to showcase the proposed algorithms, including a demonstration of safety certification for a 6D system.
Submitted 21 November, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Collision Risk and Operational Impact of Speed Change Advisories as Aircraft Collision Avoidance Maneuvers
Authors:
Sydney M. Katz,
Luis E. Alvarez,
Michael Owen,
Samuel Wu,
Marc Brittain,
Anshuman Das,
Mykel J. Kochenderfer
Abstract:
Aircraft collision avoidance systems have long been a key factor in keeping our airspace safe. Over the past decade, the FAA has supported the development of a new family of collision avoidance systems called the Airborne Collision Avoidance System X (ACAS X), which model the collision avoidance problem as a Markov decision process (MDP). Variants of ACAS X have been created for both manned (ACAS Xa) and unmanned aircraft (ACAS Xu and ACAS sXu). The variants primarily differ in the types of collision avoidance maneuvers they issue. For example, ACAS Xa issues vertical collision avoidance advisories, while ACAS Xu and ACAS sXu allow for horizontal advisories due to reduced aircraft performance capabilities. Currently, a new variant of ACAS X, called ACAS Xr, is being developed to provide collision avoidance capability to rotorcraft and Advanced Air Mobility (AAM) vehicles. Due to the desire to minimize deviation from the prescribed flight path of these aircraft, speed adjustments have been proposed as a potential collision avoidance maneuver for aircraft using ACAS Xr. In this work, we investigate the effect of speed change advisories on the safety and operational efficiency of collision avoidance systems. We develop an MDP-based collision avoidance logic that issues speed advisories and compare its performance to that of horizontal and vertical logics through Monte Carlo simulation on existing airspace encounter models. Our results show that while speed advisories are able to reduce collision risk, they are neither as safe nor as efficient as their horizontal and vertical counterparts.
Submitted 29 April, 2022;
originally announced April 2022.
-
Model Predictive Optimized Path Integral Strategies
Authors:
Dylan M. Asmar,
Ransalu Senanayake,
Shawn Manuel,
Mykel J. Kochenderfer
Abstract:
We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across controls in the control sequence. This reformulation allows for the implementation of adaptive importance sampling (AIS) algorithms into the original importance sampling step while still maintaining the benefits of MPPI such as working with arbitrary system dynamics and cost functions. The benefit of optimizing the proposal distribution by integrating AIS at each control step is demonstrated in simulated environments including controlling multiple cars around a track. The new algorithm is more sample efficient than MPPI, achieving better performance with fewer samples. This performance disparity grows as the dimension of the action space increases. Results from simulations suggest the new algorithm can be used as an anytime algorithm, increasing the value of the control at each iteration rather than relying on a large set of samples.
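As a rough illustration of placing one joint distribution over the whole control sequence, the sketch below runs a generic adaptive importance sampling loop: it samples full control sequences from a Gaussian, weights them with the MPPI-style exponentiated cost $w_i \propto \exp(-(S_i - \min_j S_j)/\lambda)$, and refits the Gaussian's mean and covariance from the weighted samples. It is a sketch under stated assumptions, not the paper's algorithm; the quadratic cost, the temperature `lam`, and the sample and iteration counts are hypothetical choices.

```python
import numpy as np


def mppi_weights(costs, lam):
    """MPPI weights exp(-(S - min S) / lam), normalized to sum to one."""
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()


def ais_control(cost_fn, mean, cov, lam=1.0, n_samples=1000, n_iters=10, seed=0):
    """Adapt a joint Gaussian over a control sequence by weighted refitting."""
    rng = np.random.default_rng(seed)
    dim = mean.size
    for _ in range(n_iters):
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        w = mppi_weights(np.array([cost_fn(u) for u in samples]), lam)
        mean = w @ samples
        centered = samples - mean
        # Weighted covariance, symmetrized, plus jitter to stay positive definite.
        cov = (w[:, None] * centered).T @ centered
        cov = 0.5 * (cov + cov.T) + 1e-6 * np.eye(dim)
    return mean, cov


target = np.ones(5)  # hypothetical: the best control sequence is all ones


def quadratic_cost(u):
    return float(np.sum((u - target) ** 2))


mean, cov = ais_control(quadratic_cost, np.zeros(5), np.eye(5))
print(np.round(mean, 2))
```

Each refit moves the proposal toward low-cost sequences, so later iterations spend their samples where they matter; this is the intuition behind the anytime behavior described above.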
Submitted 1 March, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Dyadic Sex Composition and Task Classification Using fNIRS Hyperscanning Data
Authors:
Liam A. Kruse,
Allan L. Reiss,
Mykel J. Kochenderfer,
Stephanie Balters
Abstract:
Hyperscanning with functional near-infrared spectroscopy (fNIRS) is an emerging neuroimaging application that measures the nuanced neural signatures underlying social interactions. Researchers have assessed the effect of sex and task type (e.g., cooperation versus competition) on inter-brain coherence during human-to-human interactions. However, no work has yet used deep learning-based approaches to extract insights into sex and task-based differences in an fNIRS hyperscanning context. This work proposes a convolutional neural network-based approach to dyadic sex composition and task classification for an extensive hyperscanning dataset with $N = 222$ participants. Inter-brain signal similarity computed using dynamic time warping is used as the input data. The proposed approach achieves a maximum classification accuracy of greater than $80$ percent, thereby providing a new avenue for exploring and understanding complex brain behavior.
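The abstract uses dynamic time warping (DTW) to measure similarity between two participants' signals. A minimal textbook DTW distance is sketched below, assuming absolute difference as the local cost and no warping-window constraint; the paper's preprocessing and DTW settings are not given in the abstract.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping with |x_i - y_j| cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Best of a diagonal match, a step in x only, or a step in y only.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(D[n, m])


print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))  # 2.0
```

The first pair shows why DTW suits this setting: a repeated sample (a local time shift) costs nothing, whereas a pointwise comparison would not even be defined for sequences of different lengths.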
Submitted 6 December, 2021;
originally announced December 2021.
-
OVERT: An Algorithm for Safety Verification of Neural Network Control Policies for Nonlinear Systems
Authors:
Chelsea Sidrane,
Amir Maleki,
Ahmed Irfan,
Mykel J. Kochenderfer
Abstract:
Deep learning methods can be used to produce control policies, but certifying their safety is challenging. The resulting networks are nonlinear and often very large. In response to this challenge, we present OVERT: a sound algorithm for safety verification of nonlinear discrete-time closed-loop dynamical systems with neural network control policies. The novelty of OVERT lies in combining ideas from the classical formal methods literature with ideas from the newer neural network verification literature. The central concept of OVERT is to abstract nonlinear functions with a set of optimally tight piecewise linear bounds. Such piecewise linear bounds are designed for seamless integration into ReLU neural network verification tools. OVERT can be used to prove bounded-time safety properties by either computing reachable sets or solving feasibility queries directly. We demonstrate safety verification on several classical benchmark examples. OVERT compares favorably to existing methods both in computation time and in tightness of the reachable set.
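To give a feel for abstracting a nonlinear function with piecewise linear bounds, the sketch below bounds a convex scalar function on an interval: chords between breakpoints give an upper bound, and tangent lines at the breakpoints give a lower bound. This is only a simple illustration that assumes convexity; OVERT itself computes optimally tight bounds for general nonlinear functions and encodes them for ReLU verification tools, which this code does not do.

```python
import numpy as np


def convex_pwl_bounds(f, df, breakpoints):
    """Piecewise linear lower/upper bounds for a convex f on [bp[0], bp[-1]]."""
    bp = np.asarray(breakpoints, dtype=float)
    fb, slopes = f(bp), df(bp)

    def upper(x):
        # Chords of a convex function lie above it.
        return np.interp(x, bp, fb)

    def lower(x):
        # Tangent lines of a convex function lie below it; take the tightest.
        x = np.asarray(x, dtype=float)
        tangents = fb[:, None] + slopes[:, None] * (x[None, :] - bp[:, None])
        return tangents.max(axis=0)

    return lower, upper


lower, upper = convex_pwl_bounds(np.square, lambda x: 2 * x, np.linspace(-2, 2, 5))
xs = np.linspace(-2, 2, 101)
gap = upper(xs) - lower(xs)
print(f"max gap between bounds: {gap.max():.3f}")
```

Adding breakpoints tightens both bounds at the cost of more linear pieces for the downstream verifier to handle.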
Submitted 2 August, 2021;
originally announced August 2021.
-
Runtime Safety Assurance Using Reinforcement Learning
Authors:
Christopher Lazarus,
James G. Lopez,
Mykel J. Kochenderfer
Abstract:
The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost to formally do so can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as a mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the RTSA system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but are very likely to disrupt the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. The high-dimensional, nonlinear dynamics in which modern controllers are deployed, along with the black-box nature of the nominal controllers, make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our baseline, human-engineered approach.
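The abstract frames the meta-controller's choice (keep the nominal controller or switch to recovery) as an MDP. The paper solves its MDP with reinforcement learning; purely as an illustration, the toy below assumes a tiny, fully known model and solves it exactly with value iteration. The states, transition probabilities, rewards, and discount are all hypothetical: the state is a distance to an unsafe boundary, `continue` earns operational reward but risks drifting into failure, and `recover` ends the episode with a fixed disruption cost.

```python
import numpy as np

N = 5  # hypothetical distances 1..N from the unsafe boundary (0 = failure)
GAMMA = 0.95
FAIL_PENALTY = 100.0
RECOVER_COST = 5.0
P_TOWARD, P_STAY, P_AWAY = 0.3, 0.4, 0.3


def solve_rtsa(n_iters=500):
    """Value iteration; V[0] is the absorbing failure state with value 0."""
    V = np.zeros(N + 1)
    policy = [None] * (N + 1)
    for _ in range(n_iters):
        new_V = V.copy()
        for s in range(1, N + 1):
            nxt = {s - 1: P_TOWARD, s: P_STAY}
            nxt[min(s + 1, N)] = nxt.get(min(s + 1, N), 0.0) + P_AWAY
            # Operational reward for continuing, minus expected failure penalty.
            r_cont = 1.0 - FAIL_PENALTY * nxt.get(0, 0.0)
            q_cont = r_cont + GAMMA * sum(p * V[t] for t, p in nxt.items())
            q_rec = -RECOVER_COST  # recovery ends the episode
            new_V[s] = max(q_cont, q_rec)
            policy[s] = "continue" if q_cont >= q_rec else "recover"
        V = new_V
    return V, policy


V, policy = solve_rtsa()
print(policy[1:])
```

Even in this toy, the solution captures the safety/efficiency trade-off described above: the meta-controller triggers recovery next to the boundary and leaves the nominal controller in charge far from it.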
Submitted 20 October, 2020;
originally announced October 2020.
-
A Maximum Independent Set Method for Scheduling Earth Observing Satellite Constellations
Authors:
Duncan Eddy,
Mykel J. Kochenderfer
Abstract:
Operating Earth observing satellites requires efficient planning methods that coordinate activities of multiple spacecraft. The satellite task planning problem entails selecting actions that best satisfy mission objectives for autonomous execution. Task scheduling is often performed by human operators assisted by heuristic or rule-based planning tools. This approach does not efficiently scale to multiple assets as heuristics frequently fail to properly coordinate actions of multiple vehicles over long horizons. Additionally, the problem becomes more difficult to solve for large constellations as the complexity of the problem scales exponentially in the number of requested observations and linearly in the number of spacecraft. It is expected that new commercial optical and radar imaging constellations will require automated planning methods to meet stated responsiveness and throughput objectives. This paper introduces a new approach for solving the satellite scheduling problem by generating an infeasibility-based graph representation of the problem and finding a maximal independent set of vertices for the graph. The approach is tested on scenarios with up to 10,000 requested imaging locations for the Skysat constellation of optical satellites as well as simulated constellations of up to 24 satellites. Performance is compared with contemporary graph-traversal and mixed-integer linear programming approaches. Empirical results demonstrate improvements over baseline methods in both solution time and the number of scheduled collections. For large problems, the maximum independent set approach is able to find a feasible schedule with 8% more collections in 75% less time.
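The graph formulation in the abstract can be sketched as follows: each requested collection is a vertex, an edge joins two requests that cannot both be scheduled, and any independent set is a conflict-free schedule. The code below is a hypothetical single-satellite example in which requests conflict only when their imaging time windows overlap, and it uses a simple greedy minimum-degree heuristic to find a maximal independent set. The paper's actual infeasibility rules and solver may differ.

```python
from itertools import combinations


def conflict_graph(windows):
    """Adjacency sets: requests i and j conflict if their (start, end) windows overlap."""
    adj = {i: set() for i in range(len(windows))}
    for i, j in combinations(range(len(windows)), 2):
        (s1, e1), (s2, e2) = windows[i], windows[j]
        if s1 < e2 and s2 < e1:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_mis(adj):
    """Repeatedly take a minimum-degree vertex and delete it with its neighbors."""
    remaining = set(adj)
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: (len(adj[u] & remaining), u))
        chosen.append(v)
        remaining -= adj[v] | {v}
    return sorted(chosen)


windows = [(0, 2), (1, 3), (2.5, 4), (3.5, 5), (4.5, 6)]  # hypothetical windows
print(greedy_mis(conflict_graph(windows)))  # [0, 2, 4]
```

The greedy result is maximal (no further request can be added without a conflict) but not guaranteed to be maximum, which is why stronger search methods matter for large constellations.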
Submitted 15 August, 2020;
originally announced August 2020.
-
Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM
Authors:
Kunal Menda,
Jean de Becdelièvre,
Jayesh K. Gupta,
Ilan Kroo,
Mykel J. Kochenderfer,
Zachary Manchester
Abstract:
System identification is a key step for model-based control, estimator design, and output prediction. This work considers the offline identification of partially observed nonlinear systems. We empirically show that the certainty-equivalent approximation to expectation-maximization can be a reliable and scalable approach for high-dimensional deterministic systems, which are common in robotics. We formulate certainty-equivalent expectation-maximization as block coordinate-ascent, and provide an efficient implementation. The algorithm is tested on a simulated system of coupled Lorenz attractors, demonstrating its ability to identify high-dimensional systems that can be intractable for particle-based approaches. Our approach is also used to identify the dynamics of an aerobatic helicopter. By augmenting the state with unobserved fluid states, a model is learned that predicts the acceleration of the helicopter better than state-of-the-art approaches. The codebase for this work is available at https://github.com/sisl/CEEM.
Submitted 20 June, 2020;
originally announced June 2020.
-
A Taxonomy and Review of Algorithms for Modeling and Predicting Human Driver Behavior
Authors:
Kyle Brown,
Katherine Driggs-Campbell,
Mykel J. Kochenderfer
Abstract:
We present a review and taxonomy of 200 models from the literature on driver behavior modeling. We begin by introducing a mathematical framework for describing the dynamics of interactive multi-agent traffic. Based on the partially observable stochastic game, this framework provides a basis for discussing different driver modeling techniques. Our taxonomy is constructed around the core modeling tasks of state estimation, intention estimation, trait estimation, and motion prediction, and also discusses the auxiliary tasks of risk estimation, anomaly detection, behavior imitation and microscopic traffic simulation. Existing driver models are categorized based on the specific tasks they address and key attributes of their approach.
Submitted 28 November, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
A Survey of Algorithms for Black-Box Safety Validation of Cyber-Physical Systems
Authors:
Anthony Corso,
Robert J. Moss,
Mark Koren,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
Autonomous cyber-physical systems (CPS) can improve safety and efficiency for safety-critical applications, but require rigorous testing before deployment. The complexity of these systems often precludes the use of formal verification, and real-world testing can be too dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a black box operating in a simulated environment. Safety validation tasks include finding disturbances in the environment that cause the system to fail (falsification), finding the most-likely failure, and estimating the probability that the system fails. Motivated by the prevalence of safety-critical artificial intelligence, this work provides a survey of state-of-the-art safety validation techniques for CPS with a focus on applied algorithms and their modifications for the safety validation problem. We present and discuss algorithms in the domains of optimization, path planning, reinforcement learning, and importance sampling. Problem decomposition techniques are presented to help scale algorithms to large state spaces, which are common for CPS. A brief overview of safety-critical applications is given, including autonomous vehicles and aircraft collision avoidance systems. Finally, we present a survey of existing academic and commercially available safety validation tools.
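One of the safety validation tasks listed above is estimating the probability that the system fails, and importance sampling is one of the algorithm families the survey covers. A minimal sketch follows, assuming a hypothetical one-dimensional disturbance $x \sim \mathcal{N}(0, 1)$ and a failure event $x > 3$: it samples from a proposal shifted toward the failure region and reweights each sample by the likelihood ratio.

```python
import math

import numpy as np


def is_failure_probability(threshold, shift, n=100_000, seed=0):
    """Estimate P(x > threshold) for x ~ N(0, 1) using a N(shift, 1) proposal."""
    rng = np.random.default_rng(seed)
    x = rng.normal(shift, 1.0, size=n)
    # Likelihood ratio N(x; 0, 1) / N(x; shift, 1) simplifies to this exponential.
    weights = np.exp(-shift * x + 0.5 * shift**2)
    return float(np.mean((x > threshold) * weights))


estimate = is_failure_probability(3.0, shift=3.0)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(f"IS estimate {estimate:.6f}, exact {exact:.6f}")
```

With the same budget of 100,000 samples, naive Monte Carlo would observe only about 135 failures on average, whereas roughly half of the shifted samples land in the failure region.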
Submitted 14 October, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Structured Mechanical Models for Robot Learning and Control
Authors:
Jayesh K. Gupta,
Kunal Menda,
Zachary Manchester,
Mykel J. Kochenderfer
Abstract:
Model-based methods are the dominant paradigm for controlling robotic systems, though their efficacy depends heavily on the accuracy of the model used. Deep neural networks have been used to learn models of robot dynamics from data, but they suffer from data inefficiency and difficulty incorporating prior knowledge. We introduce Structured Mechanical Models, a flexible model class for mechanical systems that is data-efficient, easily amenable to prior knowledge, and easily usable with model-based control techniques. The goal of this work is to demonstrate the benefits of using Structured Mechanical Models in lieu of black-box neural networks when modeling robot dynamics. We demonstrate that they generalize better from limited data and yield more reliable model-based controllers on a variety of simulated robotic domains.
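Structured models of mechanical systems typically build on the manipulator form of Lagrangian dynamics, $M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = F$, rather than mapping states to accelerations with an unconstrained network. As a hedged illustration only, the sketch below computes accelerations from that form for a hypothetical point-mass pendulum with analytic terms; in a structured learned model, terms such as the mass matrix or generalized forces would instead come from parameterized functions, and this code is not the paper's parameterization.

```python
import numpy as np


def forward_dynamics(q, qd, F, mass_matrix, coriolis, gravity):
    """Solve M(q) qdd = F - C(q, qd) qd - G(q) for the acceleration qdd."""
    rhs = F - coriolis(q, qd) @ qd - gravity(q)
    return np.linalg.solve(mass_matrix(q), rhs)


# Hypothetical pendulum: point mass m on a massless rod of length l.
m, l, g = 1.0, 2.0, 9.81


def mass_matrix(q):
    return np.array([[m * l**2]])


def coriolis(q, qd):
    return np.zeros((1, 1))  # a single joint has no velocity coupling


def gravity(q):
    return np.array([m * g * l * np.sin(q[0])])


q, qd, torque = np.array([0.3]), np.array([0.0]), np.array([0.5])
qdd = forward_dynamics(q, qd, torque, mass_matrix, coriolis, gravity)
print(qdd)
```

Because the mass matrix is symmetric positive definite by construction, the solve is always well posed; that built-in physical structure is the kind of prior knowledge a black-box network would have to learn from data.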
Submitted 21 April, 2020;
originally announced April 2020.
-
Scalable Autonomous Vehicle Safety Validation through Dynamic Programming and Scene Decomposition
Authors:
Anthony Corso,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over failures of an autonomous policy using approximate dynamic programming. Knowledge of this distribution allows for the efficient discovery of many failure examples. To address the problem of scalability, we decompose complex driving scenarios into subproblems consisting of only the ego vehicle and one other vehicle. These subproblems can be solved with approximate dynamic programming and their solutions are recombined to approximate the solution to the full scenario. We apply our approach to a simple two-vehicle scenario to demonstrate the technique as well as a more complex five-vehicle scenario to demonstrate scalability. In both experiments, we observed an increase in the number of failures discovered compared to baseline approaches.
Submitted 26 June, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
The Adaptive Stress Testing Formulation
Authors:
Mark Koren,
Anthony Corso,
Mykel J. Kochenderfer
Abstract:
Validation is a key challenge in the search for safe autonomy. Simulations are often either too simple to provide robust validation, or too complex to tractably compute. Therefore, approximate validation methods are needed to tractably find failures without unsafe simplifications. This paper presents the theory behind one such black-box approach: adaptive stress testing (AST). We also provide three examples of validation problems formulated to work with AST.
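In AST as it is commonly presented, the simulator exposes the likelihood of each sampled disturbance and a failure check, and the reward steers the search toward likely failures. A minimal sketch of such a reward follows, assuming log-likelihood rewards at intermediate steps, zero reward when a failure is reached, and a large penalty (here a hypothetical `alpha` plus a scaled miss distance `beta * d`) if the episode ends without failure. The constants and the distance heuristic are problem-specific assumptions.

```python
import math


def ast_reward(log_likelihood, is_failure, is_terminal, miss_distance,
               alpha=1e4, beta=1e3):
    """Per-step reward in the usual AST style (constants are hypothetical).

    - failure reached: 0
    - episode over without failure: large penalty, larger when farther from failure
    - otherwise: log-likelihood of the sampled disturbance
    """
    if is_failure:
        return 0.0
    if is_terminal:
        return -alpha - beta * miss_distance
    return log_likelihood


# A likely disturbance (p = 0.5) earns more reward than an unlikely one (p = 0.01).
likely = ast_reward(math.log(0.5), False, False, 0.0)
unlikely = ast_reward(math.log(0.01), False, False, 0.0)
print(likely > unlikely)  # True
```

Summing these rewards over an episode gives the trajectory's log-probability when it ends in failure, so maximizing return favors the most likely failure.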
Submitted 8 April, 2020;
originally announced April 2020.
-
Adaptive Stress Testing without Domain Heuristics using Go-Explore
Authors:
Mark Koren,
Mykel J. Kochenderfer
Abstract:
Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that guide it away from failures. For example, some approaches give rewards for taking more-likely actions, because we want to find more-likely failures. However, the agent may then learn to only take likely actions, and may not be able to find a failure at all. Consequently, the problem becomes a hard-exploration problem, where rewards do not aid exploration. A new algorithm, go-explore (GE), has recently set new records on benchmarks from the hard-exploration field. We apply GE to adaptive stress testing (AST), one example of an RL-based falsification approach that provides a way to search for the most-likely failure scenario. We simulate a scenario where an autonomous vehicle drives while a pedestrian is crossing the road. We demonstrate that GE is able to find failures without domain-specific heuristics, such as the distance between the car and the pedestrian, on scenarios that other RL techniques are unable to solve. Furthermore, inspired by the robustification phase of GE, we demonstrate that the backwards algorithm (BA) improves the failures found by other RL techniques.
Submitted 18 June, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Validation of Image-Based Neural Network Controllers through Adaptive Stress Testing
Authors:
Kyle D. Julian,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
Neural networks have become state-of-the-art for computer vision problems because of their ability to efficiently model complex functions from large amounts of data. While neural networks can be shown to perform well empirically for a variety of tasks, their performance is difficult to guarantee. Neural network verification tools have been developed that can certify robustness with respect to a given input image; however, for neural network systems used in closed-loop controllers, robustness with respect to individual images does not address multi-step properties of the neural network controller and its environment. Furthermore, neural network systems interacting in the physical world and using natural images are operating in a black-box environment, making formal verification intractable. This work combines the adaptive stress testing (AST) framework with neural network verification tools to search for the most likely sequence of image disturbances that cause the neural network controlled system to reach a failure. An autonomous aircraft taxi application is presented, and results show that the AST method finds failures with more likely image disturbances than baseline methods. Further analysis of AST results revealed an explainable cause of the failure, giving insight into the problematic scenarios that should be addressed.
Submitted 4 March, 2020;
originally announced March 2020.
-
Guaranteeing Safety for Neural Network-Based Aircraft Collision Avoidance Systems
Authors:
Kyle D. Julian,
Mykel J. Kochenderfer
Abstract:
The decision logic for the ACAS X family of aircraft collision avoidance systems is represented as a large numeric table. Due to storage constraints of certified avionics hardware, neural networks have been suggested as a way to significantly compress the data while still preserving performance in terms of safety. However, neural networks are complex continuous functions with outputs that are difficult to predict. Because simulations evaluate only a finite number of encounters, simulations are not sufficient to guarantee that the neural network will perform correctly in all possible situations. We propose a method to provide safety guarantees when using a neural network collision avoidance system. The neural network outputs are bounded using neural network verification tools like Reluplex and Reluval, and a reachability method determines all possible ways aircraft encounters will resolve using neural network advisories and assuming bounded aircraft dynamics. Experiments with systems inspired by ACAS X show that neural networks giving either horizontal or vertical maneuvers can be proven safe. We explore how relaxing the bounds on aircraft dynamics can lead to potentially unsafe encounters and demonstrate how neural network controllers can be modified to guarantee safety through online costs or lowering alerting cost. The reachability method is flexible and can incorporate uncertainties such as pilot delay and sensor error. These results suggest a method for certifying neural network collision avoidance systems for use in real aircraft.
Submitted 5 May, 2020; v1 submitted 15 December, 2019;
originally announced December 2019.
-
Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
Authors:
Anthony Corso,
Peter Du,
Katherine Driggs-Campbell,
Mykel J. Kochenderfer
Abstract:
Determining possible failure scenarios is a critical step in the evaluation of autonomous vehicle systems. Real-world vehicle testing is commonly employed for autonomous vehicle validation, but the costs and time requirements are high. Consequently, simulation-driven methods such as Adaptive Stress Testing (AST) have been proposed to aid in validation. AST formulates the problem of finding the most likely failure scenarios as a Markov decision process, which can be solved using reinforcement learning. In practice, AST tends to find scenarios where failure is unavoidable and tends to repeatedly discover the same types of failures of a system. This work addresses these issues by encoding domain-relevant information into the search procedure. With this modification, the AST method discovers a larger and more expressive subset of the failure space when compared to the original AST formulation. We show that our approach is able to identify useful failure scenarios of an autonomous vehicle policy.
Submitted 6 August, 2019; v1 submitted 2 August, 2019;
originally announced August 2019.
-
Rethinking System Health Management
Authors:
Edward Balaban,
Stephen B. Johnson,
Mykel J. Kochenderfer
Abstract:
Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system's operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.
Submitted 10 March, 2019;
originally announced March 2019.
-
Verifying Aircraft Collision Avoidance Neural Networks Through Linear Approximations of Safe Regions
Authors:
Kyle D. Julian,
Shivam Sharma,
Jean-Baptiste Jeannin,
Mykel J. Kochenderfer
Abstract:
The next generation of aircraft collision avoidance systems frames the problem as a Markov decision process and uses dynamic programming to optimize the alerting logic. The resulting system uses a large lookup table to determine advisories given to pilots, but these tables can grow very large. To enable the system to operate on limited hardware, prior work investigated compressing the table using a deep neural network. However, ensuring that the neural network reliably issues safe advisories is important for certification. This work defines linearized regions where each advisory can be safely provided, allowing Reluplex, a neural network verification tool, to check if unsafe advisories are ever issued. A notional collision avoidance policy is generated and used to train a neural network representation. The neural networks are checked for unsafe advisories, resulting in the discovery of thousands of unsafe counterexamples.
Submitted 2 March, 2019;
originally announced March 2019.
-
A Reachability Method for Verifying Dynamical Systems with Deep Neural Network Controllers
Authors:
Kyle D. Julian,
Mykel J. Kochenderfer
Abstract:
Deep neural networks can be trained to be efficient and effective controllers for dynamical systems; however, their behavior is complex, and guarantees about it are difficult to obtain. This work presents a general approach for providing guarantees for deep neural network controllers over multiple time steps using a combination of reachability methods and open-source neural network verification tools. By bounding the system dynamics and neural network outputs, the set of reachable states can be over-approximated to guarantee that the system will never reach states outside that set. The method is demonstrated on the mountain car problem as well as an aircraft collision avoidance problem. Results show that this approach can provide neural network guarantees given a bounded dynamic model.
Submitted 3 June, 2019; v1 submitted 1 March, 2019;
originally announced March 2019.
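As a rough illustration of the over-approximation idea in the reachability entry above, the sketch below propagates an axis-aligned box of states through hypothetical linear dynamics $x_{k+1} = A x_k + B u_k$ using interval arithmetic. It assumes the controller's output has already been bounded to a fixed box, standing in for the bounds that neural network verification tools provide in the abstract. The dynamics, bounds, and function names are illustrative assumptions, not the authors' method or tooling.

```python
# Hypothetical sketch: interval (box) over-approximation of reachable sets for
# x_{k+1} = A x_k + B u_k, assuming the controller output u is known to lie in
# a box [u_lo, u_hi] (e.g. bounds obtained from a neural network verifier).

def affine_bounds(M, lo, hi):
    """Tight per-component bounds of M @ z over the box lo <= z <= hi."""
    out_lo, out_hi = [], []
    for row in M:
        l = h = 0.0
        for m, zl, zh in zip(row, lo, hi):
            # A nonnegative coefficient is minimized at the lower bound,
            # a negative coefficient at the upper bound.
            if m >= 0:
                l += m * zl
                h += m * zh
            else:
                l += m * zh
                h += m * zl
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def reachable_boxes(A, B, x_lo, x_hi, u_lo, u_hi, steps):
    """Return one box per time step that contains every reachable state."""
    lo, hi = list(x_lo), list(x_hi)
    boxes = [(lo, hi)]
    # The control box is fixed, so its contribution is the same at every step.
    bu_lo, bu_hi = affine_bounds(B, u_lo, u_hi)
    for _ in range(steps):
        ax_lo, ax_hi = affine_bounds(A, lo, hi)
        lo = [a + b for a, b in zip(ax_lo, bu_lo)]
        hi = [a + b for a, b in zip(ax_hi, bu_hi)]
        boxes.append((lo, hi))
    return boxes

def always_safe(boxes, safe_lo, safe_hi):
    """True if every over-approximated box lies inside the safe box."""
    return all(
        sl <= l and h <= sh
        for lo, hi in boxes
        for l, h, sl, sh in zip(lo, hi, safe_lo, safe_hi)
    )

# Usage: a double integrator with time step 0.1 and |u| <= 1.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.005], [0.1]]
boxes = reachable_boxes(A, B, [-0.1, -0.1], [0.1, 0.1], [-1.0], [1.0], steps=5)
print(always_safe(boxes, [-1.0, -1.0], [1.0, 1.0]))  # True
```

Because each box contains every state the true system could reach, a "True" result is a sound guarantee for this model; a "False" result only means the over-approximation grew too large, not that an unsafe state is actually reachable.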
-
A General Framework for Structured Learning of Mechanical Systems
Authors:
Jayesh K. Gupta,
Kunal Menda,
Zachary Manchester,
Mykel J. Kochenderfer
Abstract:
Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.
Submitted 1 March, 2019; v1 submitted 22 February, 2019;
originally announced February 2019.
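For the structured-learning entry above, the equations below sketch how forward dynamics follow once a Lagrangian and generalized forces are available. They assume the common mechanical form $L(q,\dot q) = \tfrac12\,\dot q^\top M(q)\,\dot q - V(q)$, in which a mass matrix $M(q)$, a potential $V(q)$, and a generalized force $F(q,\dot q,u)$ could each be represented by a network; this particular split is an illustrative assumption, not necessarily the paper's parameterization.

```latex
\begin{aligned}
&\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = F(q,\dot q,u),
  \qquad L(q,\dot q) = \tfrac12\,\dot q^\top M(q)\,\dot q - V(q), \\
&M(q)\,\ddot q + \dot M(q)\,\dot q - \tfrac12\,\nabla_q\!\big(\dot q^\top M(q)\,\dot q\big) + \nabla_q V(q) = F(q,\dot q,u),
  \qquad \dot M(q) = \textstyle\sum_k \frac{\partial M}{\partial q_k}\,\dot q_k, \\
&\ddot q = M(q)^{-1}\Big[F(q,\dot q,u) - \dot M(q)\,\dot q + \tfrac12\,\nabla_q\!\big(\dot q^\top M(q)\,\dot q\big) - \nabla_q V(q)\Big].
\end{aligned}
```

The last line is what a model-based controller or simulator would integrate. A common design choice in this kind of model is to parameterize $M(q)$ through a lower-triangular factor with a positive diagonal so that it stays positive definite and the inverse always exists; the abstract does not say whether the paper does this.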
-
Using Neural Networks to Generate Information Maps for Mobile Sensors
Authors:
Louis Dressel,
Mykel J. Kochenderfer
Abstract:
Target localization is a critical task for mobile sensors and has many applications. However, generating informative trajectories for these sensors is a challenging research problem. A common method uses information maps that estimate the value of taking measurements from any point in the sensor state space. These information maps are used to generate trajectories; for example, a trajectory might be designed so that its distribution of measurements matches the distribution of the information map. Regardless of the trajectory generation method, regenerating information maps as new observations are made is critical, but computing these maps in real time can be challenging. We propose using convolutional neural networks to generate information maps from a target estimate and sensor model in real time. Simulations show that maps are rendered accurately while computation time is reduced by orders of magnitude.
Submitted 26 September, 2018;
originally announced September 2018.
-
On the Optimality of Ergodic Trajectories for Information Gathering Tasks
Authors:
Louis Dressel,
Mykel J. Kochenderfer
Abstract:
Recently, ergodic control has been suggested as a means to guide mobile sensors for information gathering tasks. In ergodic control, a mobile sensor follows a trajectory that is ergodic with respect to some information density distribution. A trajectory is ergodic if the time spent in a state-space region is proportional to the information density of that region. Although ergodic control has shown promising experimental results, there is little understanding of why it works or when it is optimal. In this paper, we study a problem class for which optimal information gathering trajectories are ergodic. This class relies on a submodularity assumption for repeated measurements from the same state: information available in a region is assumed to decay linearly with the time spent there. This assumption informs selection of the horizon used in ergodic trajectory generation. We support our claims with a set of experiments that demonstrate the link between ergodicity, optimal information gathering, and submodularity.
Submitted 20 August, 2018;
originally announced August 2018.
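The ergodicity definition in the entry above can be made concrete on a discretized state space by comparing the fraction of trajectory time steps spent in each cell with the normalized information density of that cell. The sketch below uses a simple L1 gap over grid cells as a hypothetical stand-in; ergodic control in the literature typically uses a Fourier-coefficient-based metric instead, and nothing here reproduces the paper's submodularity analysis. The cell indices and density values are made up for illustration.

```python
from collections import Counter

def time_fractions(trajectory_cells, n_cells):
    """Fraction of time steps the trajectory spends in each cell 0..n_cells-1."""
    counts = Counter(trajectory_cells)  # missing cells count as 0
    total = len(trajectory_cells)
    return [counts[i] / total for i in range(n_cells)]

def ergodic_gap(trajectory_cells, density):
    """L1 distance between time fractions and the normalized information density.

    A gap of 0 means the time spent in each cell is exactly proportional to its
    information density, i.e. the trajectory is ergodic on this grid.
    """
    mass = sum(density)
    target = [d / mass for d in density]
    spent = time_fractions(trajectory_cells, len(density))
    return sum(abs(s - t) for s, t in zip(spent, target))

density = [1.0, 2.0, 1.0]                   # hypothetical information density per cell
print(ergodic_gap([0, 1, 1, 2], density))   # 0.0: time is proportional to density
print(ergodic_gap([0, 0, 0, 0], density))   # 1.5: all time spent in one cell
```

A grid-based gap is easy to read but depends on the cell size; spectral metrics avoid choosing a grid at the cost of truncating a Fourier basis.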
-
Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning
Authors:
Patrick Slade,
Zachary N. Sunberg,
Mykel J. Kochenderfer
Abstract:
Real-world autonomous systems operate under uncertainty about both their pose and dynamics. Autonomous control systems must simultaneously perform estimation and control tasks to maintain robustness to changing dynamics or modeling errors. However, information gathering actions often conflict with optimal actions for reaching control objectives, requiring a trade-off between exploration and exploitation. The specific problem setting considered here is discrete-time nonlinear systems with process noise, input constraints, and parameter uncertainty. This article frames this problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search with an unscented Kalman filter to account for process noise and parameter uncertainty. This method is compared with certainty-equivalent model predictive control and a tree search method that approximates the QMDP solution, providing insight into when information gathering is useful. Discrete-time simulations characterize performance over a range of process noise levels and bounds on unknown parameters. An offline optimization method is used to select the Monte Carlo tree search parameters without hand-tuning. In lieu of recursive feasibility guarantees, a probabilistic bounding heuristic is offered that increases the probability of keeping the state within a desired region.
Submitted 31 July, 2018;
originally announced August 2018.
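The entry above pairs Monte Carlo tree search with an unscented Kalman filter. The core of a UKF is the unscented transform, sketched below with NumPy: a small set of sigma points is pushed through a nonlinear function and re-weighted to approximate the transformed mean and covariance. The sketch assumes the unknown parameter is appended to the state vector, which is one common way a UKF handles parameter uncertainty; the basic symmetric sigma-point scheme, the `kappa` value, and the toy dynamics are illustrative choices, not details taken from the paper.

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=1.0):
    """Approximate the mean and covariance of f(x) for x ~ N(mean, cov).

    Uses the basic symmetric set of 2n + 1 sigma points with spread parameter kappa.
    """
    n = mean.shape[0]
    # Columns of a square root of (n + kappa) * cov give the sigma-point offsets.
    S = np.linalg.cholesky((n + kappa) * cov)
    sigmas = [mean] + [mean + S[:, i] for i in range(n)] + [mean - S[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    ys = np.array([f(s) for s in sigmas])
    y_mean = weights @ ys
    dev = ys - y_mean
    y_cov = (weights[:, None] * dev).T @ dev  # sum_i w_i * dev_i dev_i^T
    return y_mean, y_cov

# Hypothetical augmented state [x, theta]: theta is an unknown gain modeled as a
# constant state, so a full UKF measurement update (not shown) could refine it.
dt = 0.1

def step(z):
    x, theta = z
    return np.array([x + dt * theta * x, theta])

mean = np.array([1.0, 0.5])
cov = np.diag([0.04, 0.01])
m_next, P_next = unscented_transform(step, mean, cov)
print(m_next)
print(P_next)
```

For a linear map the transform reproduces the exact mean and covariance; for the product `theta * x` above it captures the correlation that builds up between the state and the unknown parameter, which is what later lets measurements of `x` inform the estimate of `theta`.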