Search | arXiv e-print repository

Model Identification Adaptive Control with $ρ$-POMDP Planning

Authors: Michelle Ho, Arec Jamgochian, Mykel J. Kochenderfer

Abstract: Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($ρ$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.

Submitted 22 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

Comments: Accepted to CoDIT 2025

arXiv:2503.22660 [pdf, ps, other]

Verifying Nonlinear Neural Feedback Systems using Polyhedral Enclosures

Authors: Samuel I. Akinwande, Chelsea Sidrane, Mykel J. Kochenderfer, Clark Barrett

Abstract: As dynamical systems equipped with neural network controllers (neural feedback systems) become increasingly prevalent, it is critical to develop methods to ensure their safe operation. Verifying safety requires extending control theoretic analysis methods to these systems. Although existing techniques can efficiently handle linear neural feedback systems, relatively few scalable methods address the nonlinear case. We propose a novel algorithm for forward reachability analysis of nonlinear neural feedback systems. The approach leverages the structure of the nonlinear transition functions of the systems to compute tight polyhedral enclosures (i.e., abstractions). These enclosures, combined with the neural controller, are then encoded as a mixed-integer linear program (MILP). Optimizing this MILP yields a sound over-approximation of the forward-reachable set. We evaluate our algorithm on representative benchmarks and demonstrate an order of magnitude improvement over the current state of the art.

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2501.16625 [pdf, other]

An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models

Authors: Alexandros E. Tzikas, Mykel J. Kochenderfer

Abstract: We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models. Our approach only requires input-output data from the system and first-order information from the model with respect to the parameters. Our algorithm consists of two modules. First, we formulate the problem of system identification from a Bayesian perspective and use a linear Gaussian model approximation to iteratively optimize the model's parameters. In each iteration, we propose to use the input-output data to tune the covariance of the linear Gaussian model. This statistically calibrates the approach. Secondly, we define a Gaussian-based uncertainty measure for the model parameters, which we can then minimize with respect to the next selected input. We test our method with linear and nonlinear dynamics.

Submitted 30 March, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

Comments: Submitted to the IEEE CDC

ACM Class: G.3; I.6

arXiv:2412.06220 [pdf, other]

Discrete-Time Distribution Steering using Monte Carlo Tree Search

Authors: Alexandros E. Tzikas, Liam A. Kruse, Mansur Arief, Mykel J. Kochenderfer, Stephen Boyd

Abstract: Optimal control problems with state distribution constraints have attracted interest for their expressivity, but solutions rely on linear approximations. We approach the problem of driving the state of a dynamical system in distribution from a sequential decision-making perspective. We formulate the optimal control problem as an appropriate Markov decision process (MDP), where the actions correspond to the state-feedback control policies. We then solve the MDP using Monte Carlo tree search (MCTS). This renders our method suitable for any dynamics model. A key component of our approach is a novel, easy to compute, distance metric in the distribution space that allows our algorithm to guide the distribution of the state. We experimentally test our algorithm under both linear and nonlinear dynamics.

Submitted 9 December, 2024; originally announced December 2024.

Comments: Submitted to the IEEE Robotics and Automation Letters for possible publication

ACM Class: I.2.9; G.3

arXiv:2411.07971 [pdf, other]

doi 10.1109/CBMS61543.2024.00040

Optimal Control of Mechanical Ventilators with Learned Respiratory Dynamics

Authors: Isaac Ronald Ward, Dylan M. Asmar, Mansur Arief, Jana Krystofova Mike, Mykel J. Kochenderfer

Abstract: Deciding on appropriate mechanical ventilator management strategies significantly impacts the health outcomes for patients with respiratory diseases. Acute Respiratory Distress Syndrome (ARDS) is one such disease that requires careful ventilator operation to be effectively treated. In this work, we frame the management of ventilators for patients with ARDS as a sequential decision making problem using the Markov decision process framework. We implement and compare controllers based on clinical guidelines contained in the ARDSnet protocol, optimal control theory, and learned latent dynamics represented as neural networks. The Pulse Physiology Engine's respiratory dynamics simulator is used to establish a repeatable benchmark, gather simulated data, and quantitatively compare these controllers. We score performance in terms of measured improvement in established ARDS health markers (pertaining to improved respiratory rate, oxygenation, and vital signs). Our results demonstrate that techniques leveraging neural networks and optimal control can automatically discover effective ventilation management strategies without access to explicit ventilator management procedures or guidelines (such as those defined in the ARDSnet protocol).

Submitted 12 November, 2024; originally announced November 2024.

Comments: 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), 7 pages, 3 figures

arXiv:2410.16282 [pdf, other]

Optimal Ground Station Selection for Low-Earth Orbiting Satellites

Authors: Duncan Eddy, Michelle Ho, Mykel J. Kochenderfer

Abstract: This paper presents a solution to the problem of optimal ground station selection for low-Earth orbiting (LEO) space missions that enables mission operators to precisely design their ground segment performance and costs. Space mission operators are increasingly turning to Ground-Station-as-a-Service (GSaaS) providers to supply the terrestrial communications segment to reduce costs and increase network size. However, this approach leads to a new challenge of selecting the optimal service providers and station locations for a given mission. We consider the problem of ground station selection as an optimization problem and present a general solution framework that allows mission designers to set their overall optimization objective and constrain key mission performance variables such as total data downlink, total mission cost, recurring operational cost, and maximum communications time-gap. We solve the problem using integer programming (IP). To address computational scaling challenges, we introduce a surrogate optimization approach where the optimal station selection is determined based on solving the problem over a reduced time domain. Two different IP formulations are evaluated using randomized selections of LEO satellites of varying constellation sizes. We consider the networks of the commercial GSaaS providers Atlas Space Operations, Amazon Web Services (AWS) Ground Station, Azure Orbital Ground Station, Kongsberg Satellite Services (KSAT), Leaf Space, and Viasat Real-Time Earth. We compare our results against standard operational practices of integrating with one or two primary ground station providers.

Submitted 1 March, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

Comments: 13 pages, 3 tables, 4 figures, presented at IEEE Aeroconf 2025

arXiv:2409.13088 [pdf, other]

Informative Input Design for Dynamic Mode Decomposition

Authors: Joshua Ott, Mykel J. Kochenderfer, Stephen Boyd

Abstract: Efficiently estimating system dynamics from data is essential for minimizing data collection costs and improving model performance. This work addresses the challenge of designing future control inputs to maximize information gain, thereby improving the efficiency of the system identification process. We propose an approach that integrates informative input design into the Dynamic Mode Decomposition with control (DMDc) framework, which is well-suited for high-dimensional systems. By formulating an approximate convex optimization problem that minimizes the trace of the estimation error covariance matrix, we are able to efficiently reduce uncertainty in the model parameters while respecting constraints on the system states and control inputs. This method outperforms traditional techniques like Pseudo-Random Binary Sequences (PRBS) and orthogonal multisines, which do not adapt to the current system model and often gather redundant information. We validate our approach using aircraft and fluid dynamics simulations to demonstrate the practical applicability and effectiveness of our method. Our results show that strategically planning control inputs based on the current model enhances the accuracy of system identification while requiring less data. Furthermore, we provide our implementation and simulation interfaces as an open-source software package, facilitating further research development and use by industry practitioners.

Submitted 28 April, 2025; v1 submitted 19 September, 2024; originally announced September 2024.

Comments: Accepted to L4DC 2025
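
The abstract above describes choosing inputs that minimize the trace of the estimation error covariance. As a rough illustration of that criterion only, the sketch below applies it to a toy two-parameter linear regression and picks the next input greedily from a finite candidate set. This is not the paper's convex DMDc formulation: the greedy search, the function names (`trace_inv_2x2`, `add_outer`, `pick_input`), and the unit-noise assumption are all hypothetical choices made for the example.

```python
def trace_inv_2x2(m):
    """Trace of the inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    # inverse = (1/det) * [[d, -b], [-c, a]], so its trace is (a + d) / det
    return (a + d) / det


def add_outer(m, x):
    """Return m + x x^T: the information matrix after measuring at input x
    (assumes unit measurement noise, an assumption of this sketch)."""
    return [[m[i][j] + x[i] * x[j] for j in range(2)] for i in range(2)]


def pick_input(info, candidates):
    """Greedy choice: the candidate input whose measurement would leave the
    smallest trace of the parameter covariance (inverse information matrix)."""
    return min(candidates, key=lambda x: trace_inv_2x2(add_outer(info, x)))


# The first parameter is already well identified (information 10), the second
# is not (information 1), so the input that excites the second is preferred.
info = [[10.0, 0.0], [0.0, 1.0]]
candidates = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]
best = pick_input(info, candidates)
```

A fixed excitation signal such as PRBS would not take `info` into account, which is the contrast the abstract draws with model-aware input design.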

arXiv:2409.08097 [pdf, other]

Optimizing Falsification for Learning-Based Control Systems: A Multi-Fidelity Bayesian Approach

Authors: Zahra Shahrooei, Mykel J. Kochenderfer, Ali Baheri

Abstract: Testing controllers in safety-critical systems is vital for ensuring their safety and preventing failures. In this paper, we address the falsification problem within learning-based closed-loop control systems through simulation. This problem involves the identification of counterexamples that violate system safety requirements and can be formulated as an optimization task based on these requirements. Using full-fidelity simulator data in this optimization problem can be computationally expensive. To improve efficiency, we propose a multi-fidelity Bayesian optimization falsification framework that harnesses simulators with varying levels of accuracy. Our proposed framework can transition between different simulators and establish meaningful relationships between them. Through multi-fidelity Bayesian optimization, we determine both the optimal system input likely to be a counterexample and the appropriate fidelity level for assessment. We evaluated our approach across various Gym environments, each featuring different levels of fidelity. Our experiments demonstrate that multi-fidelity Bayesian optimization is more computationally efficient than full-fidelity Bayesian optimization and other baseline methods in detecting counterexamples. A Python implementation of the algorithm is available at https://github.com/SAILRIT/MFBO_Falsification.

Submitted 12 September, 2024; originally announced September 2024.

Comments: 13 pages, 9 figures

arXiv:2408.13847 [pdf, other]

Watercraft as Overwater Ambulance Exchange Points to Enhance Aeromedical Evacuation

Authors: Mahdi Al-Husseini, Kyle H. Wray, Mykel J. Kochenderfer

Abstract: Ambulance exchange points are preidentified sites where patients are transferred between evacuation platforms while en route to enhanced medical care. We propose a new capability for maritime medical evacuation, which involves co-opting underway watercraft as overwater ambulance exchange points to transfer patients between medical evacuation aircraft. We partner with the United States Army's 25th Combat Aviation Brigade to demonstrate the use of an Army watercraft as an overwater ambulance exchange point. A manikin is transferred between two HH-60 Medical Evacuation Black Hawk helicopters conducting hoist operations over Army Logistics Support Vessel 3, which is traveling south of Honolulu, Hawaii. The demonstration is enabled by a decision support system for dispatching aircraft, hoist stabilization technology, commercial satellite internet, military geospatial infrastructure applications, and digital medical documentation tools, the benefits of which are all discussed. Three extensions of the overwater ambulance exchange point are introduced and civilian applications are considered.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2406.14761 [pdf, other]

Diffusion-Based Failure Sampling for Evaluating Safety-Critical Autonomous Systems

Authors: Harrison Delecki, Marc R. Schlichting, Mansur Arief, Anthony Corso, Marcell Vazquez-Chanlatte, Mykel J. Kochenderfer

Abstract: Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.

Submitted 20 May, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Appears in IEEE International Conference on Engineering Reliable Autonomous Systems (ERAS) 2025

arXiv:2401.10949 [pdf, ps, other]

The Synergy Between Optimal Transport Theory and Multi-Agent Reinforcement Learning

Authors: Ali Baheri, Mykel J. Kochenderfer

Abstract: This paper explores the integration of optimal transport (OT) theory with multi-agent reinforcement learning (MARL). This integration uses OT to handle distributions and transportation problems to enhance the efficiency, coordination, and adaptability of MARL. There are five key areas where OT can impact MARL: (1) policy alignment, where OT's Wasserstein metric is used to align divergent agent strategies towards unified goals; (2) distributed resource management, employing OT to optimize resource allocation among agents; (3) addressing non-stationarity, using OT to adapt to dynamic environmental shifts; (4) scalable multi-agent learning, harnessing OT for decomposing large-scale learning objectives into manageable tasks; and (5) enhancing energy efficiency, applying OT principles to develop sustainable MARL systems. This paper articulates how the synergy between OT and MARL can address scalability issues, optimize resource distribution, align agent policies in cooperative environments, and ensure adaptability in dynamically changing conditions.

Submitted 24 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2309.12474 [pdf, other]

SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning

Authors: Marc R. Schlichting, Nina V. Boord, Anthony L. Corso, Mykel J. Kochenderfer

Abstract: Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulations can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. As a result, our approach achieves a significant speedup, up to 18 times faster compared to traditional methods that solely rely on a high-fidelity simulator.

Submitted 30 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Accepted for ITSC 2023

arXiv:2305.06111 [pdf, ps, other]

Joint Falsification and Fidelity Settings Optimization for Validation of Safety-Critical Systems: A Theoretical Analysis

Authors: Ali Baheri, Mykel J. Kochenderfer

Abstract: Safety validation is a crucial component in the development and deployment of autonomous systems, such as self-driving vehicles and robotic systems. Ensuring safe operation necessitates extensive testing and verification of control policies, typically conducted in simulation environments. High-fidelity simulators accurately model real-world dynamics but entail high computational costs, limiting their scalability for exhaustive testing. Conversely, low-fidelity simulators offer efficiency but may not capture the intricacies of high-fidelity simulators, potentially yielding false conclusions. We propose a joint falsification and fidelity optimization framework for safety validation of autonomous systems. Our mathematical formulation combines counterexample searches with simulator fidelity improvement, facilitating more efficient exploration of the critical environmental configurations challenging the control system. Our contributions encompass a set of theorems addressing counterexample sensitivity analysis, sample complexity, convergence, the interplay between the outer and inner optimization loops, and regret bound analysis. The proposed joint optimization approach enables a more targeted and efficient testing process, optimizes the use of available computational resources, and enhances confidence in autonomous system safety validation.

Submitted 10 May, 2023; originally announced May 2023.

Comments: Submitted to the 20th International Conference on Quantitative Evaluation of Systems (QEST 2023)

arXiv:2304.09352 [pdf, other]

Optimizing Carbon Storage Operations for Long-Term Safety

Authors: Yizheng Wang, Markus Zechner, Gege Wen, Anthony Louis Corso, John Michael Mern, Mykel J. Kochenderfer, Jef Karel Caers

Abstract: To combat global warming and mitigate the risks associated with climate change, carbon capture and storage (CCS) has emerged as a crucial technology. However, safely sequestering CO2 in geological formations for long-term storage presents several challenges. In this study, we address these issues by modeling the decision-making process for carbon storage operations as a partially observable Markov decision process (POMDP). We solve the POMDP using belief state planning to optimize injector and monitoring well locations, with the goal of maximizing stored CO2 while maintaining safety. Empirical results in simulation demonstrate that our approach is effective in ensuring safe long-term carbon storage operations. We showcase the flexibility of our approach by introducing three different monitoring strategies and examining their impact on decision quality. Additionally, we introduce a neural network surrogate model for the POMDP decision-making process to handle the complex dynamics of the multi-phase flow. We also investigate the effects of different fidelity levels of the surrogate model on decision qualities.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2212.14118 [pdf, other]

Falsification of Learning-Based Controllers through Multi-Fidelity Bayesian Optimization

Authors: Zahra Shahrooei, Mykel J. Kochenderfer, Ali Baheri

Abstract: Simulation-based falsification is a practical testing method to increase confidence that the system will meet safety requirements. Because full-fidelity simulations can be computationally demanding, we investigate the use of simulators with different levels of fidelity. As a first step, we express the overall safety specification in terms of environmental parameters and structure this safety specification as an optimization problem. We propose a multi-fidelity falsification framework using Bayesian optimization, which is able to determine at which level of fidelity we should conduct a safety evaluation in addition to finding possible instances from the environment that cause the system to fail. This method allows us to automatically switch between inexpensive, inaccurate information from a low-fidelity simulator and expensive, accurate information from a high-fidelity simulator in a cost-effective way. Our experiments on various environments in simulation demonstrate that multi-fidelity Bayesian optimization has falsification performance comparable to single-fidelity Bayesian optimization but with much lower cost.

Submitted 28 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: 7 pages, 8 figures, Accepted for the 2023 European Control Conference (ECC)

arXiv:2210.05015 [pdf, other]

doi 10.1613/jair.1.14525

Optimality Guarantees for Particle Belief Approximation of POMDPs

Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.

Submitted 19 October, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Journal ref: Journal of Artificial Intelligence Research, 77, 1591-1636 (2023)
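
The abstract above says the particle filter belief transition serves as the generative model for an MDP solver, at a cost of $\mathcal{O}(C)$ per step. The following is a minimal sketch of what one such generative step might look like, under assumptions that do not come from the listing: a toy 1D system with additive Gaussian noise, a state reward of `-abs(s)`, and hypothetical names (`pb_mdp_step`, `gaussian_pdf`, `proc_std`, `obs_std`). It is meant to convey the idea of likelihood-weighted particle propagation, not to reproduce the paper's algorithm.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as the observation model here."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def pb_mdp_step(particles, weights, action, rng, proc_std=0.1, obs_std=0.5):
    """One generative step on a weighted particle belief (toy 1D model).

    Assumed model: s' = s + a + process noise, o = s' + observation noise.
    Returns (next_particles, next_weights, belief_reward).
    """
    # Belief reward: weight-averaged state reward under the current belief.
    reward = sum(w * -abs(s) for w, s in zip(weights, particles)) / sum(weights)
    # Propagate all C particles through the transition model: O(C) work.
    next_particles = [s + action + rng.gauss(0.0, proc_std) for s in particles]
    # Generate one observation from a particle drawn according to the weights.
    idx = rng.choices(range(len(particles)), weights=weights, k=1)[0]
    obs = next_particles[idx] + rng.gauss(0.0, obs_std)
    # Reweight every particle by the observation density, then normalize.
    raw = [w * gaussian_pdf(obs, sp, obs_std)
           for w, sp in zip(weights, next_particles)]
    total = sum(raw)
    if total == 0.0:
        # Fall back to uniform weights if all likelihoods underflow.
        next_weights = [1.0 / len(raw)] * len(raw)
    else:
        next_weights = [r / total for r in raw]
    return next_particles, next_weights, reward


rng = random.Random(0)
C = 50
particles = [rng.gauss(0.0, 1.0) for _ in range(C)]
weights = [1.0 / C] * C
next_particles, next_weights, reward = pb_mdp_step(particles, weights, 0.2, rng)
```

A sampling-based MDP solver could call `pb_mdp_step` wherever it would normally sample a state transition, treating the pair `(particles, weights)` as the MDP state.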

arXiv:2209.14076 [pdf, other]

Backward Reachability Analysis of Neural Feedback Loops: Techniques for Linear and Nonlinear Systems

Authors: Nicholas Rober, Sydney M. Katz, Chelsea Sidrane, Esen Yel, Michael Everett, Mykel J. Kochenderfer, Jonathan P. How

Abstract: As neural networks (NNs) become more prevalent in safety-critical applications such as control of vehicles, there is a growing need to certify that systems with NN components are safe. This paper presents a set of backward reachability approaches for safety certification of neural feedback loops (NFLs), i.e., closed-loop systems with NN control policies. While backward reachability strategies have been developed for systems without NN components, the nonlinearities in NN activation functions and general noninvertibility of NN weight matrices make backward reachability for NFLs a challenging problem. To avoid the difficulties associated with propagating sets backward through NNs, we introduce a framework that leverages standard forward NN analysis tools to efficiently find over-approximations to backprojection (BP) sets, i.e., sets of states for which an NN policy will lead a system to a given target set. We present frameworks for calculating BP over-approximations for both linear and nonlinear systems with control policies represented by feedforward NNs and propose computationally efficient strategies. We use numerical results from a variety of models to showcase the proposed algorithms, including a demonstration of safety certification for a 6D system.

Submitted 21 November, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: 17 pages, 15 figures. Journal extension of arXiv:2204.08319

arXiv:2204.14250 [pdf, other]

Collision Risk and Operational Impact of Speed Change Advisories as Aircraft Collision Avoidance Maneuvers

Authors: Sydney M. Katz, Luis E. Alvarez, Michael Owen, Samuel Wu, Marc Brittain, Anshuman Das, Mykel J. Kochenderfer

Abstract: Aircraft collision avoidance systems have long been a key factor in keeping our airspace safe. Over the past decade, the FAA has supported the development of a new family of collision avoidance systems called the Airborne Collision Avoidance System X (ACAS X), which model the collision avoidance problem as a Markov decision process (MDP). Variants of ACAS X have been created for both manned (ACAS Xa) and unmanned aircraft (ACAS Xu and ACAS sXu). The variants primarily differ in the types of collision avoidance maneuvers they issue. For example, ACAS Xa issues vertical collision avoidance advisories, while ACAS Xu and ACAS sXu allow for horizontal advisories due to reduced aircraft performance capabilities. Currently, a new variant of ACAS X, called ACAS Xr, is being developed to provide collision avoidance capability to rotorcraft and Advanced Air Mobility (AAM) vehicles. Due to the desire to minimize deviation from the prescribed flight path of these aircraft, speed adjustments have been proposed as a potential collision avoidance maneuver for aircraft using ACAS Xr. In this work, we investigate the effect of speed change advisories on the safety and operational efficiency of collision avoidance systems. We develop an MDP-based collision avoidance logic that issues speed advisories and compare its performance to that of horizontal and vertical logics through Monte Carlo simulation on existing airspace encounter models. Our results show that while speed advisories are able to reduce collision risk, they are neither as safe nor as efficient as their horizontal and vertical counterparts.

Submitted 29 April, 2022; originally announced April 2022.

Comments: 10 pages, 6 figures, presented at the 2022 AIAA Aviation Forum

arXiv:2203.16633 [pdf, other]

Model Predictive Optimized Path Integral Strategies

Authors: Dylan M. Asmar, Ransalu Senanayake, Shawn Manuel, Mykel J. Kochenderfer

Abstract: We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across controls in the control sequence. This reformulation allows for the implementation of adaptive importance sampling (AIS) algorithms into the original importance sampling step while still maintaining the benefits of MPPI such as working with arbitrary system dynamics and cost functions. The benefit of optimizing the proposal distribution by integrating AIS at each control step is demonstrated in simulated environments including controlling multiple cars around a track. The new algorithm is more sample efficient than MPPI, achieving better performance with fewer samples. This performance disparity grows as the dimension of the action space increases. Results from simulations suggest the new algorithm can be used as an anytime algorithm, increasing the value of control at each iteration versus relying on a large set of samples.

Submitted 1 March, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: Repository: https://github.com/sisl/MPOPIS. Accepted to ICRA 2023

ACM Class: I.2.8; I.2.9

arXiv:2112.03911 [pdf, ps, other]

Dyadic Sex Composition and Task Classification Using fNIRS Hyperscanning Data

Authors: Liam A. Kruse, Allan L. Reiss, Mykel J. Kochenderfer, Stephanie Balters

Abstract: Hyperscanning with functional near-infrared spectroscopy (fNIRS) is an emerging neuroimaging application that measures the nuanced neural signatures underlying social interactions. Researchers have assessed the effect of sex and task type (e.g., cooperation versus competition) on inter-brain coherence during human-to-human interactions. However, no work has yet used deep learning-based approaches to extract insights into sex and task-based differences in an fNIRS hyperscanning context. This work proposes a convolutional neural network-based approach to dyadic sex composition and task classification for an extensive hyperscanning dataset with $N = 222$ participants. Inter-brain signal similarity computed using dynamic time warping is used as the input data. The proposed approach achieves a maximum classification accuracy of greater than $80$ percent, thereby providing a new avenue for exploring and understanding complex brain behavior.

Submitted 6 December, 2021; originally announced December 2021.

Comments: 20th IEEE International Conference on Machine Learning and Applications

arXiv:2108.01220 [pdf, ps, other]

OVERT: An Algorithm for Safety Verification of Neural Network Control Policies for Nonlinear Systems

Authors: Chelsea Sidrane, Amir Maleki, Ahmed Irfan, Mykel J. Kochenderfer

Abstract: Deep learning methods can be used to produce control policies, but certifying their safety is challenging. The resulting networks are nonlinear and often very large. In response to this challenge, we present OVERT: a sound algorithm for safety verification of nonlinear discrete-time closed-loop dynamical systems with neural network control policies. The novelty of OVERT lies in combining ideas from the classical formal methods literature with ideas from the newer neural network verification literature. The central concept of OVERT is to abstract nonlinear functions with a set of optimally tight piecewise linear bounds. Such piecewise linear bounds are designed for seamless integration into ReLU neural network verification tools. OVERT can be used to prove bounded-time safety properties by either computing reachable sets or solving feasibility queries directly. We demonstrate safety verification on several classical benchmark examples. OVERT compares favorably to existing methods both in computation time and in tightness of the reachable set.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 44 pages, under review

MSC Class: 68Q60 (Primary) 68T07; 37N35 (Secondary) ACM Class: I.2.6; I.2.8; D.2.4

Journal ref: Journal of Machine Learning Research 23 (2022) 1-45

arXiv:2010.10618 [pdf, other]

Runtime Safety Assurance Using Reinforcement Learning

Authors: Christopher Lazarus, James G. Lopez, Mykel J. Kochenderfer

Abstract: The airworthiness and safety of a non-pedigreed autopilot must be verified, but the cost to formally do so can be prohibitive. We can bypass formal verification of non-pedigreed components by incorporating Runtime Safety Assurance (RTSA) as a mechanism to ensure safety. RTSA consists of a meta-controller that observes the inputs and outputs of a non-pedigreed component and verifies formally specified behavior as the system operates. When the system is triggered, a verified recovery controller is deployed. Recovery controllers are designed to be safe but very likely disruptive to the operational objective of the system, and thus RTSA systems must balance safety and efficiency. The objective of this paper is to design a meta-controller capable of identifying unsafe situations with high accuracy. The high-dimensional and nonlinear dynamics in which modern controllers are deployed, along with the black-box nature of the nominal controllers, make this a difficult problem. Current approaches rely heavily on domain expertise and human engineering. We frame the design of RTSA with the Markov decision process (MDP) framework and use reinforcement learning (RL) to solve it. Our learned meta-controller consistently exhibits superior performance in our experiments compared to our baseline, human-engineered approach.

Submitted 20 October, 2020; originally announced October 2020.

Journal ref: 2020 IEEE/AIAA 39th Digital Avionics Systems Conference (DASC)

arXiv:2008.08446 [pdf, other]

A Maximum Independent Set Method for Scheduling Earth Observing Satellite Constellations

Authors: Duncan Eddy, Mykel J. Kochenderfer

Abstract: Operating Earth observing satellites requires efficient planning methods that coordinate activities of multiple spacecraft. The satellite task planning problem entails selecting actions that best satisfy mission objectives for autonomous execution. Task scheduling is often performed by human operators assisted by heuristic or rule-based planning tools. This approach does not efficiently scale to multiple assets as heuristics frequently fail to properly coordinate actions of multiple vehicles over long horizons. Additionally, the problem becomes more difficult to solve for large constellations as the complexity of the problem scales exponentially in the number of requested observations and linearly in the number of spacecraft. It is expected that new commercial optical and radar imaging constellations will require automated planning methods to meet stated responsiveness and throughput objectives. This paper introduces a new approach for solving the satellite scheduling problem by generating an infeasibility-based graph representation of the problem and finding a maximal independent set of vertices for the graph. The approach is tested on scenarios of up to 10,000 requested imaging locations for the Skysat constellation of optical satellites as well as simulated constellations of up to 24 satellites. Performance is compared with contemporary graph-traversal and mixed-integer linear programming approaches. Empirical results demonstrate improvements in both the solution time and the number of scheduled collections beyond baseline methods. For large problems, the maximum independent set approach is able to find a feasible schedule with 8% more collections in 75% less time.

Submitted 15 August, 2020; originally announced August 2020.

arXiv:2006.11615 [pdf, other]

Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM

Authors: Kunal Menda, Jean de Becdelièvre, Jayesh K. Gupta, Ilan Kroo, Mykel J. Kochenderfer, Zachary Manchester

Abstract: System identification is a key step for model-based control, estimator design, and output prediction. This work considers the offline identification of partially observed nonlinear systems. We empirically show that the certainty-equivalent approximation to expectation-maximization can be a reliable and scalable approach for high-dimensional deterministic systems, which are common in robotics. We formulate certainty-equivalent expectation-maximization as block coordinate-ascent, and provide an efficient implementation. The algorithm is tested on a simulated system of coupled Lorenz attractors, demonstrating its ability to identify high-dimensional systems that can be intractable for particle-based approaches. Our approach is also used to identify the dynamics of an aerobatic helicopter. By augmenting the state with unobserved fluid states, a model is learned that predicts the acceleration of the helicopter better than state-of-the-art approaches. The codebase for this work is available at https://github.com/sisl/CEEM.

Submitted 20 June, 2020; originally announced June 2020.

Comments: First three authors contributed equally. Accepted at ICML 2020. Website: https://sites.google.com/stanford.edu/ceem/

arXiv:2006.08832 [pdf, other]

A Taxonomy and Review of Algorithms for Modeling and Predicting Human Driver Behavior

Authors: Kyle Brown, Katherine Driggs-Campbell, Mykel J. Kochenderfer

Abstract: We present a review and taxonomy of 200 models from the literature on driver behavior modeling. We begin by introducing a mathematical framework for describing the dynamics of interactive multi-agent traffic. Based on the partially observable stochastic game, this framework provides a basis for discussing different driver modeling techniques. Our taxonomy is constructed around the core modeling tasks of state estimation, intention estimation, trait estimation, and motion prediction, and also discusses the auxiliary tasks of risk estimation, anomaly detection, behavior imitation and microscopic traffic simulation. Existing driver models are categorized based on the specific tasks they address and key attributes of their approach.

Submitted 28 November, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2005.02979 [pdf, ps, other]

doi 10.1613/jair.1.12716

A Survey of Algorithms for Black-Box Safety Validation of Cyber-Physical Systems

Authors: Anthony Corso, Robert J. Moss, Mark Koren, Ritchie Lee, Mykel J. Kochenderfer

Abstract: Autonomous cyber-physical systems (CPS) can improve safety and efficiency for safety-critical applications, but require rigorous testing before deployment. The complexity of these systems often precludes the use of formal verification, and real-world testing can be too dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a black box operating in a simulated environment. Safety validation tasks include finding disturbances in the environment that cause the system to fail (falsification), finding the most-likely failure, and estimating the probability that the system fails. Motivated by the prevalence of safety-critical artificial intelligence, this work provides a survey of state-of-the-art safety validation techniques for CPS with a focus on applied algorithms and their modifications for the safety validation problem. We present and discuss algorithms in the domains of optimization, path planning, reinforcement learning, and importance sampling. Problem decomposition techniques are presented to help scale algorithms to large state spaces, which are common for CPS. A brief overview of safety-critical applications is given, including autonomous vehicles and aircraft collision avoidance systems. Finally, we present a survey of existing academic and commercially available safety validation tools.

Submitted 14 October, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

Journal ref: Journal of Artificial Intelligence Research, vol. 72, p. 377-428, 2021

arXiv:2004.10301 [pdf, other]

Structured Mechanical Models for Robot Learning and Control

Authors: Jayesh K. Gupta, Kunal Menda, Zachary Manchester, Mykel J. Kochenderfer

Abstract: Model-based methods are the dominant paradigm for controlling robotic systems, though their efficacy depends heavily on the accuracy of the model used. Deep neural networks have been used to learn models of robot dynamics from data, but they suffer from data-inefficiency and the difficulty of incorporating prior knowledge. We introduce Structured Mechanical Models, a flexible model class for mechanical systems that is data-efficient, easily amenable to prior knowledge, and easily usable with model-based control techniques. The goal of this work is to demonstrate the benefits of using Structured Mechanical Models in lieu of black-box neural networks when modeling robot dynamics. We demonstrate that they generalize better from limited data and yield more reliable model-based controllers on a variety of simulated robotic domains.

Submitted 21 April, 2020; originally announced April 2020.

Comments: First two authors contributed equally. Accepted at L4DC2020. Source code and videos at https://sites.google.com/stanford.edu/smm/

arXiv:2004.06801 [pdf, other]

Scalable Autonomous Vehicle Safety Validation through Dynamic Programming and Scene Decomposition

Authors: Anthony Corso, Ritchie Lee, Mykel J. Kochenderfer

Abstract: An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over failures of an autonomous policy using approximate dynamic programming. Knowledge of this distribution allows for the efficient discovery of many failure examples. To address the problem of scalability, we decompose complex driving scenarios into subproblems consisting of only the ego vehicle and one other vehicle. These subproblems can be solved with approximate dynamic programming and their solutions are recombined to approximate the solution to the full scenario. We apply our approach to a simple two-vehicle scenario to demonstrate the technique as well as a more complex five-vehicle scenario to demonstrate scalability. In both experiments, we observed an increase in the number of failures discovered compared to baseline approaches.

Submitted 26 June, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

arXiv:2004.04293 [pdf, other]

The Adaptive Stress Testing Formulation

Authors: Mark Koren, Anthony Corso, Mykel J. Kochenderfer

Abstract: Validation is a key challenge in the search for safe autonomy. Simulations are often either too simple to provide robust validation, or too complex to tractably compute. Therefore, approximate validation methods are needed to tractably find failures without unsafe simplifications. This paper presents the theory behind one such black-box approach: adaptive stress testing (AST). We also provide three examples of validation problems formulated to work with AST.

Submitted 8 April, 2020; originally announced April 2020.

Comments: Presented at the Workshop on Robust Autonomy at RSS 2019

arXiv:2004.04292 [pdf, other]

Adaptive Stress Testing without Domain Heuristics using Go-Explore

Authors: Mark Koren, Mykel J. Kochenderfer

Abstract: Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that guide it away from failures. For example, some approaches give rewards for taking more-likely actions, because we want to find more-likely failures. However, the agent may then learn to only take likely actions, and may not be able to find a failure at all. Consequently, the problem becomes a hard-exploration problem, where rewards do not aid exploration. A new algorithm, go-explore (GE), has recently set new records on benchmarks from the hard-exploration field. We apply GE to adaptive stress testing (AST), one example of an RL-based falsification approach that provides a way to search for the most-likely failure scenario. We simulate a scenario where an autonomous vehicle drives while a pedestrian is crossing the road. We demonstrate that GE is able to find failures without domain-specific heuristics, such as the distance between the car and the pedestrian, on scenarios that other RL techniques are unable to solve. Furthermore, inspired by the robustification phase of GE, we demonstrate that the backwards algorithm (BA) improves the failures found by other RL techniques.

Submitted 18 June, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: Accepted to ITSC 2020

arXiv:2003.02381 [pdf, other]

Validation of Image-Based Neural Network Controllers through Adaptive Stress Testing

Authors: Kyle D. Julian, Ritchie Lee, Mykel J. Kochenderfer

Abstract: Neural networks have become state-of-the-art for computer vision problems because of their ability to efficiently model complex functions from large amounts of data. While neural networks can be shown to perform well empirically for a variety of tasks, their performance is difficult to guarantee. Neural network verification tools have been developed that can certify robustness with respect to a given input image; however, for neural network systems used in closed-loop controllers, robustness with respect to individual images does not address multi-step properties of the neural network controller and its environment. Furthermore, neural network systems interacting in the physical world and using natural images are operating in a black-box environment, making formal verification intractable. This work combines the adaptive stress testing (AST) framework with neural network verification tools to search for the most likely sequence of image disturbances that cause the neural network controlled system to reach a failure. An autonomous aircraft taxi application is presented, and results show that the AST method finds failures with more likely image disturbances than baseline methods. Further analysis of AST results revealed an explainable cause of the failure, giving insight into the problematic scenarios that should be addressed.

Submitted 4 March, 2020; originally announced March 2020.

Comments: 7 pages, 6 figures

arXiv:1912.07084 [pdf, other]

doi 10.1109/DASC43569.2019.9081748

Guaranteeing Safety for Neural Network-Based Aircraft Collision Avoidance Systems

Authors: Kyle D. Julian, Mykel J. Kochenderfer

Abstract: The decision logic for the ACAS X family of aircraft collision avoidance systems is represented as a large numeric table. Due to storage constraints of certified avionics hardware, neural networks have been suggested as a way to significantly compress the data while still preserving performance in terms of safety. However, neural networks are complex continuous functions with outputs that are difficult to predict. Because simulations evaluate only a finite number of encounters, simulations are not sufficient to guarantee that the neural network will perform correctly in all possible situations. We propose a method to provide safety guarantees when using a neural network collision avoidance system. The neural network outputs are bounded using neural network verification tools like Reluplex and Reluval, and a reachability method determines all possible ways aircraft encounters will resolve using neural network advisories and assuming bounded aircraft dynamics. Experiments with systems inspired by ACAS X show that neural networks giving either horizontal or vertical maneuvers can be proven safe. We explore how relaxing the bounds on aircraft dynamics can lead to potentially unsafe encounters and demonstrate how neural network controllers can be modified to guarantee safety through online costs or lowering alerting cost. The reachability method is flexible and can incorporate uncertainties such as pilot delay and sensor error. These results suggest a method for certifying neural network collision avoidance systems for use in real aircraft.

Submitted 5 May, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

Comments: 10 pages, 11 figures, presented at the 2019 AIAA Digital Avionics Systems Conference (DASC)

Journal ref: IEEE/AIAA 38th Digital Avionics Systems Conference (DASC). 2019

arXiv:1908.01046 [pdf, other]

Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation

Authors: Anthony Corso, Peter Du, Katherine Driggs-Campbell, Mykel J. Kochenderfer

Abstract: Determining possible failure scenarios is a critical step in the evaluation of autonomous vehicle systems. Real-world vehicle testing is commonly employed for autonomous vehicle validation, but the costs and time requirements are high. Consequently, simulation-driven methods such as Adaptive Stress Testing (AST) have been proposed to aid in validation. AST formulates the problem of finding the most likely failure scenarios as a Markov decision process, which can be solved using reinforcement learning. In practice, AST tends to find scenarios where failure is unavoidable and tends to repeatedly discover the same types of failures of a system. This work addresses these issues by encoding domain-relevant information into the search procedure. With this modification, the AST method discovers a larger and more expressive subset of the failure space when compared to the original AST formulation. We show that our approach is able to identify useful failure scenarios of an autonomous vehicle policy.

Submitted 6 August, 2019; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: Appears in IEEE ITSC 2019

arXiv:1903.03948 [pdf, other]

Rethinking System Health Management

Authors: Edward Balaban, Stephen B. Johnson, Mykel J. Kochenderfer

Abstract: Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system's operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.

Submitted 10 March, 2019; originally announced March 2019.

Comments: Published in the proceedings of the 2018 AAAI Fall Symposium on Integrating Planning, Diagnosis, and Causal Reasoning

arXiv:1903.00762

Verifying Aircraft Collision Avoidance Neural Networks Through Linear Approximations of Safe Regions

Authors: Kyle D. Julian, Shivam Sharma, Jean-Baptiste Jeannin, Mykel J. Kochenderfer

Abstract: The next generation of aircraft collision avoidance systems frames the problem as a Markov decision process and uses dynamic programming to optimize the alerting logic. The resulting system uses a large lookup table to determine advisories given to pilots, but these tables can grow very large. To enable the system to operate on limited hardware, prior work investigated compressing the table using a deep neural network. However, ensuring that the neural network reliably issues safe advisories is important for certification. This work defines linearized regions where each advisory can be safely provided, allowing Reluplex, a neural network verification tool, to check if unsafe advisories are ever issued. A notional collision avoidance policy is generated and used to train a neural network representation. The neural networks are checked for unsafe advisories, resulting in the discovery of thousands of unsafe counterexamples.

Submitted 2 March, 2019; originally announced March 2019.

arXiv:1903.00520

A Reachability Method for Verifying Dynamical Systems with Deep Neural Network Controllers

Authors: Kyle D. Julian, Mykel J. Kochenderfer

Abstract: Deep neural networks can be trained to be efficient and effective controllers for dynamical systems; however, the mechanics of deep neural networks are complex and difficult to guarantee. This work presents a general approach for providing guarantees for deep neural network controllers over multiple time steps using a combination of reachability methods and open source neural network verification tools. By bounding the system dynamics and neural network outputs, the set of reachable states can be over-approximated to provide a guarantee that the system will never reach states outside the set. The method is demonstrated on the mountain car problem as well as an aircraft collision avoidance problem. Results show that this approach can provide neural network guarantees given a bounded dynamic model.

Submitted 3 June, 2019; v1 submitted 1 March, 2019; originally announced March 2019.
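
Editorial illustration for the entry above: the over-approximation idea in the abstract (bound the controller output, then bound the next state) can be sketched with plain interval arithmetic. This is not the paper's method, which relies on neural network verification tools; interval propagation is swapped in here because it is simple and still sound. The toy network, the scalar dynamics x' = a*x + b*u, and all function names are hypothetical.

```python
def affine_bounds(lo, hi, weights, bias):
    """Elementwise interval bounds of W x + b for x in the box [lo, hi]."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        # A positive weight takes its minimum at lo[i], a negative one at hi[i].
        out_lo.append(b + sum(w * (lo[i] if w >= 0 else hi[i]) for i, w in enumerate(row)))
        out_hi.append(b + sum(w * (hi[i] if w >= 0 else lo[i]) for i, w in enumerate(row)))
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to interval endpoints."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def controller_bounds(lo, hi, layers):
    """Propagate a box through a ReLU network (no ReLU after the last layer)."""
    for idx, (weights, bias) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, weights, bias)
        if idx < len(layers) - 1:
            lo, hi = relu_bounds(lo, hi)
    return lo, hi


def step_bounds(x_lo, x_hi, layers, a=1.0, b=0.1):
    """One-step over-approximation for the assumed dynamics x' = a*x + b*u."""
    assert a >= 0 and b >= 0, "this sketch assumes non-negative coefficients"
    u_lo, u_hi = controller_bounds([x_lo], [x_hi], layers)
    # x and u are bounded independently: sound, but looser than the true set.
    return a * x_lo + b * u_lo[0], a * x_hi + b * u_hi[0]


# Hypothetical toy controller implementing u = -relu(x) + relu(-x) = -x.
layers = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer with two ReLU units
    ([[-1.0, 1.0]], [0.0]),         # linear output layer
]
next_lo, next_hi = step_bounds(-1.0, 1.0, layers)
```

For this toy controller the true next state is 0.9*x, so the exact reachable set from [-1, 1] is [-0.9, 0.9]. The interval bound is wider but contains it, which is the sense in which such a set gives a guarantee.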

arXiv:1902.08705

A General Framework for Structured Learning of Mechanical Systems

Authors: Jayesh K. Gupta, Kunal Menda, Zachary Manchester, Mykel J. Kochenderfer

Abstract: Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data-efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.

Submitted 1 March, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: 10 pages, 7 figures. First two authors contributed equally. Submitted to IROS/RA-L. Code at https://github.com/sisl/mechamodlearn/

arXiv:1809.10012

Using Neural Networks to Generate Information Maps for Mobile Sensors

Authors: Louis Dressel, Mykel J. Kochenderfer

Abstract: Target localization is a critical task for mobile sensors and has many applications. However, generating informative trajectories for these sensors is a challenging research problem. A common method uses information maps that estimate the value of taking measurements from any point in the sensor state space. These information maps are used to generate trajectories; for example, a trajectory might be designed so its distribution of measurements matches the distribution of the information map. Regardless of the trajectory generation method, generating information maps as new observations are made is critical. However, it can be challenging to compute these maps in real-time. We propose using convolutional neural networks to generate information maps from a target estimate and sensor model in real-time. Simulations show that maps are accurately rendered while offering orders of magnitude reduction in computation time.

Submitted 26 September, 2018; originally announced September 2018.

Comments: Accepted to the 2018 IEEE Conference on Decision and Control (CDC)

arXiv:1808.06652

On the Optimality of Ergodic Trajectories for Information Gathering Tasks

Authors: Louis Dressel, Mykel J. Kochenderfer

Abstract: Recently, ergodic control has been suggested as a means to guide mobile sensors for information gathering tasks. In ergodic control, a mobile sensor follows a trajectory that is ergodic with respect to some information density distribution. A trajectory is ergodic if time spent in a state space region is proportional to the information density of the region. Although ergodic control has shown promising experimental results, there is little understanding of why it works or when it is optimal. In this paper, we study a problem class under which optimal information gathering trajectories are ergodic. This class relies on a submodularity assumption for repeated measurements from the same state. It is assumed that information available in a region decays linearly with time spent there. This assumption informs selection of the horizon used in ergodic trajectory generation. We support our claims with a set of experiments that demonstrate the link between ergodicity, optimal information gathering, and submodularity.

Submitted 20 August, 2018; originally announced August 2018.

Comments: Presented at 2018 American Control Conference (ACC)
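
Editorial illustration for the entry above: the abstract's definition of ergodicity (time spent in a region proportional to that region's information density) can be checked on a discretized state space. The sketch below is a simplification, not the metric used in the paper: it compares the fraction of time steps a trajectory spends in each region with a normalized density using total variation distance. The region indexing, the example densities, and the function names are hypothetical.

```python
from collections import Counter


def time_fractions(trajectory, num_regions):
    """Fraction of time steps the trajectory spends in each region index."""
    counts = Counter(trajectory)
    total = len(trajectory)
    return [counts.get(region, 0) / total for region in range(num_regions)]


def ergodicity_gap(trajectory, density):
    """Total variation distance between time fractions and a normalized density.

    A gap of 0 means time spent in each region is exactly proportional to
    its information density, i.e. the discretized trajectory is ergodic.
    """
    fractions = time_fractions(trajectory, len(density))
    return 0.5 * sum(abs(f - d) for f, d in zip(fractions, density))


# Hypothetical information density over three regions (sums to 1).
density = [0.5, 0.25, 0.25]

# Visits regions in proportion 4:2:2, matching the density.
ergodic_trajectory = [0, 0, 1, 2, 0, 1, 0, 2]

# Visits every region equally often, ignoring the density.
uniform_trajectory = [0, 1, 2, 0, 1, 2]

gap_ergodic = ergodicity_gap(ergodic_trajectory, density)
gap_uniform = ergodicity_gap(uniform_trajectory, density)
```

Under these assumptions the first trajectory has zero gap, while the uniform sweep under-samples the high-density region and so has a positive gap.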

arXiv:1808.00888

Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning

Authors: Patrick Slade, Zachary N. Sunberg, Mykel J. Kochenderfer

Abstract: Real-world autonomous systems operate under uncertainty about both their pose and dynamics. Autonomous control systems must simultaneously perform estimation and control tasks to maintain robustness to changing dynamics or modeling errors. However, information gathering actions often conflict with optimal actions for reaching control objectives, requiring a trade-off between exploration and exploitation. The specific problem setting considered here is for discrete-time nonlinear systems with process noise, input constraints, and parameter uncertainty. This article frames this problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search with an unscented Kalman filter to account for process noise and parameter uncertainty. This method is compared with certainty equivalent model predictive control and a tree search method that approximates the QMDP solution, providing insight into when information gathering is useful. Discrete-time simulations characterize performance over a range of process noise and bounds on unknown parameters. An offline optimization method is used to select the Monte Carlo tree search parameters without hand-tuning. In lieu of recursive feasibility guarantees, a probabilistic bounding heuristic is offered that increases the probability of keeping the state within a desired region.

Submitted 31 July, 2018; originally announced August 2018.

Comments: 10 pages, 6 figures. arXiv admin note: text overlap with arXiv:1707.09055

Showing 1–40 of 40 results for author: Kochenderfer, M J