-
DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces
Authors:
Romeo Valentin,
Sydney M. Katz,
Vincent Vanhoucke,
Mykel J. Kochenderfer
Abstract:
Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings, however, requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the spa…
▽ More
Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings, however, requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this structure is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and thousands of dimensions. We demonstrate the efficacy of DB-KSVD by disentangling embeddings of the Gemma-2-2B model and evaluating on six metrics from the SAEBench benchmark, where we achieve competitive results when compared to established approaches based on SAEs. By matching SAE performance with an entirely different optimization approach, our results suggest that (i) SAEs do find strong solutions to the dictionary learning problem and (ii) that traditional optimization approaches can be scaled to the required problem sizes, offering a promising avenue for further research. We provide an implementation of DB-KSVD at https://github.com/RomeoV/KSVD.jl.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Scalable Importance Sampling in High Dimensions with Low-Rank Mixture Proposals
Authors:
Liam A. Kruse,
Marc R. Schlichting,
Mykel J. Kochenderfer
Abstract:
Importance sampling is a Monte Carlo technique for efficiently estimating the likelihood of rare events by biasing the sampling distribution towards the rare event of interest. By drawing weighted samples from a learned proposal distribution, importance sampling allows for more sample-efficient estimation of rare events or tails of distributions. A common choice of proposal density is a Gaussian m…
▽ More
Importance sampling is a Monte Carlo technique for efficiently estimating the likelihood of rare events by biasing the sampling distribution towards the rare event of interest. By drawing weighted samples from a learned proposal distribution, importance sampling allows for more sample-efficient estimation of rare events or tails of distributions. A common choice of proposal density is a Gaussian mixture model (GMM). However, estimating full-rank GMM covariance matrices in high dimensions is a challenging task due to numerical instabilities. In this work, we propose using mixtures of probabilistic principal component analyzers (MPPCA) as the parametric proposal density for importance sampling methods. MPPCA models are a type of low-rank mixture model that can be fit quickly using expectation-maximization, even in high-dimensional spaces. We validate our method on three simulated systems, demonstrating consistent gains in sample efficiency and quality of failure distribution characterization.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Failure Probability Estimation for Black-Box Autonomous Systems using State-Dependent Importance Sampling Proposals
Authors:
Harrison Delecki,
Sydney M. Katz,
Mykel J. Kochenderfer
Abstract:
Estimating the probability of failure is a critical step in developing safety-critical autonomous systems. Direct estimation methods such as Monte Carlo sampling are often impractical due to the rarity of failures in these systems. Existing importance sampling approaches do not scale to sequential decision-making systems with large state spaces and long horizons. We propose an adaptive importance…
▽ More
Estimating the probability of failure is a critical step in developing safety-critical autonomous systems. Direct estimation methods such as Monte Carlo sampling are often impractical due to the rarity of failures in these systems. Existing importance sampling approaches do not scale to sequential decision-making systems with large state spaces and long horizons. We propose an adaptive importance sampling algorithm to address these limitations. Our method minimizes the forward Kullback-Leibler divergence between a state-dependent proposal distribution and a relaxed form of the optimal importance sampling distribution. Our method uses Markov score ascent methods to estimate this objective. We evaluate our approach on four sequential systems and show that it provides more accurate failure probability estimates than baseline Monte Carlo and importance sampling techniques. This work is open sourced.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
Authors:
Alexandros E. Tzikas,
Licio Romao,
Mert Pilanci,
Alessandro Abate,
Mykel J. Kochenderfer
Abstract:
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literatur…
▽ More
Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literature due to its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show convergence of the distribution associated with these iterates to the underlying target distribution in Wasserstein distance. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
Model-based Validation as Probabilistic Inference
Authors:
Harrison Delecki,
Anthony Corso,
Mykel J. Kochenderfer
Abstract:
Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents…
▽ More
Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents the distribution over failure trajectories using rollouts of system dynamics and computes trajectory gradients using automatic differentiation. Our approach is demonstrated in an inverted pendulum control system, an autonomous vehicle driving scenario, and a partially observable lunar lander. Sampling is performed using an off-the-shelf implementation of Hamiltonian Monte Carlo with multiple chains to capture multimodality and gradient smoothing for safe trajectories. In all experiments, we observed improvements in sample efficiency and parameter space coverage compared to black-box baseline approaches. This work is open sourced.
△ Less
Submitted 16 May, 2023;
originally announced May 2023.
-
Bayesian Safety Validation for Failure Probability Estimation of Black-Box Systems
Authors:
Robert J. Moss,
Mykel J. Kochenderfer,
Maxime Gariel,
Arthur Dubois
Abstract:
Estimating the probability of failure is an important step in the certification of safety-critical systems. Efficient estimation methods are often needed due to the challenges posed by high-dimensional input spaces, risky test scenarios, and computationally expensive simulators. This work frames the problem of black-box safety validation as a Bayesian optimization problem and introduces a method t…
▽ More
Estimating the probability of failure is an important step in the certification of safety-critical systems. Efficient estimation methods are often needed due to the challenges posed by high-dimensional input spaces, risky test scenarios, and computationally expensive simulators. This work frames the problem of black-box safety validation as a Bayesian optimization problem and introduces a method that iteratively fits a probabilistic surrogate model to efficiently predict failures. The algorithm is designed to search for failures, compute the most-likely failure, and estimate the failure probability over an operating domain using importance sampling. We introduce three acquisition functions that aim to reduce uncertainty by covering the design space, optimize the analytically derived failure boundaries, and sample the predicted failure regions. Results show this Bayesian safety validation approach provides a more accurate estimate of failure probability with orders of magnitude fewer samples and performs well across various safety validation metrics. We demonstrate this approach on three test problems, a stochastic decision making system, and a neural network-based runway detection system. This work is open sourced (https://github.com/sisl/BayesianSafetyValidation.jl) and currently being used to supplement the FAA certification process of the machine learning components for an autonomous cargo aircraft.
△ Less
Submitted 29 June, 2024; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Optimality Guarantees for Particle Belief Approximation of POMDPs
Authors:
Michael H. Lim,
Tyler J. Becker,
Mykel J. Kochenderfer,
Claire J. Tomlin,
Zachary N. Sunberg
Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood w…
▽ More
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
△ Less
Submitted 19 October, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models
Authors:
Phil Chen,
Masha Itkina,
Ransalu Senanayake,
Mykel J. Kochenderfer
Abstract:
Many applications of generative models rely on the marginalization of their high-dimensional output probability distributions. Normalization functions that yield sparse probability distributions can make exact marginalization more computationally tractable. However, sparse normalization functions usually require alternative loss functions for training since the log-likelihood is undefined for spar…
▽ More
Many applications of generative models rely on the marginalization of their high-dimensional output probability distributions. Normalization functions that yield sparse probability distributions can make exact marginalization more computationally tractable. However, sparse normalization functions usually require alternative loss functions for training since the log-likelihood is undefined for sparse probability distributions. Furthermore, many sparse normalization functions often collapse the multimodality of distributions. In this work, we present $\textit{ev-softmax}$, a sparse normalization function that preserves the multimodality of probability distributions. We derive its properties, including its gradient in closed-form, and introduce a continuous family of approximations to $\textit{ev-softmax}$ that have full support and can be trained with probabilistic loss functions such as negative log-likelihood and Kullback-Leibler divergence. We evaluate our method on a variety of generative models, including variational autoencoders and auto-regressive architectures. Our method outperforms existing dense and sparse normalization techniques in distributional accuracy. We demonstrate that $\textit{ev-softmax}$ successfully reduces the dimensionality of probability distributions while maintaining multimodality.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm
Authors:
Mark Koren,
Ahmed Nassar,
Mykel J. Kochenderfer
Abstract:
Validating the safety of autonomous systems generally requires the use of high-fidelity simulators that adequately capture the variability of real-world scenarios. However, it is generally not feasible to exhaustively search the space of simulation scenarios for failures. Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system. AST wit…
▽ More
Validating the safety of autonomous systems generally requires the use of high-fidelity simulators that adequately capture the variability of real-world scenarios. However, it is generally not feasible to exhaustively search the space of simulation scenarios for failures. Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system. AST with a deep reinforcement learning solver has been shown to be effective in finding failures across a range of different systems. This approach generally involves running many simulations, which can be very expensive when using a high-fidelity simulator. To improve efficiency, we present a method that first finds failures in a low-fidelity simulator. It then uses the backward algorithm, which trains a deep neural network policy using a single expert demonstration, to adapt the low-fidelity failures to high-fidelity. We have created a series of autonomous vehicle validation case studies that represent some of the ways low-fidelity and high-fidelity simulators can differ, such as time discretization. We demonstrate in a variety of case studies that this new AST approach is able to find failures with significantly fewer high-fidelity simulation steps than are needed when just running AST directly in high-fidelity. As a proof of concept, we also demonstrate AST on NVIDIA's DriveSim simulator, an industry state-of-the-art high-fidelity simulator for finding failures in autonomous vehicles.
△ Less
Submitted 27 July, 2021;
originally announced July 2021.
-
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
Authors:
Andrea Zanette,
Alessandro Lazaric,
Mykel J. Kochenderfer,
Emma Brunskill
Abstract:
There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks. Typically these assumptions are stronger than what is needed to find good solutions in the batch setting. In this work, we show how under a more sta…
▽ More
There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions to enable exploration by conventional exploration frameworks. Typically these assumptions are stronger than what is needed to find good solutions in the batch setting. In this work, we show how under a more standard notion of low inherent Bellman error, typically employed in least-square value iteration-style algorithms, we can provide strong PAC guarantees on learning a near optimal value function provided that the linear space is sufficiently "explorable". We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function, which is revealed only once learning has completed. If this reward function is also estimated from the samples gathered during pure exploration, our results also provide same-order PAC guarantees on the performance of the resulting policy for this setting.
△ Less
Submitted 21 October, 2020; v1 submitted 18 August, 2020;
originally announced August 2020.
-
Directional Primitives for Uncertainty-Aware Motion Estimation in Urban Environments
Authors:
Ransalu Senanayake,
Maneekwan Toyungyernsub,
Mingyu Wang,
Mykel J. Kochenderfer,
Mac Schwager
Abstract:
We can use driving data collected over a long period of time to extract rich information about how vehicles behave in different areas of the roads. In this paper, we introduce the concept of directional primitives, which is a representation of prior information of road networks. Specifically, we represent the uncertainty of directions using a mixture of von Mises distributions and associated speed…
▽ More
We can use driving data collected over a long period of time to extract rich information about how vehicles behave in different areas of the roads. In this paper, we introduce the concept of directional primitives, which is a representation of prior information of road networks. Specifically, we represent the uncertainty of directions using a mixture of von Mises distributions and associated speeds using gamma distributions. These location-dependent primitives can be combined with motion information of surrounding vehicles to predict their future behavior in the form of probability distributions. Experiments conducted on highways, intersections, and roundabouts in the Carla simulator, as well as real-world urban driving datasets, indicate that primitives lead to better uncertainty-aware motion estimation.
△ Less
Submitted 30 June, 2020;
originally announced July 2020.
-
Scalable Identification of Partially Observed Systems with Certainty-Equivalent EM
Authors:
Kunal Menda,
Jean de Becdelièvre,
Jayesh K. Gupta,
Ilan Kroo,
Mykel J. Kochenderfer,
Zachary Manchester
Abstract:
System identification is a key step for model-based control, estimator design, and output prediction. This work considers the offline identification of partially observed nonlinear systems. We empirically show that the certainty-equivalent approximation to expectation-maximization can be a reliable and scalable approach for high-dimensional deterministic systems, which are common in robotics. We f…
▽ More
System identification is a key step for model-based control, estimator design, and output prediction. This work considers the offline identification of partially observed nonlinear systems. We empirically show that the certainty-equivalent approximation to expectation-maximization can be a reliable and scalable approach for high-dimensional deterministic systems, which are common in robotics. We formulate certainty-equivalent expectation-maximization as block coordinate-ascent, and provide an efficient implementation. The algorithm is tested on a simulated system of coupled Lorenz attractors, demonstrating its ability to identify high-dimensional systems that can be intractable for particle-based approaches. Our approach is also used to identify the dynamics of an aerobatic helicopter. By augmenting the state with unobserved fluid states, a model is learned that predicts the acceleration of the helicopter better than state-of-the-art approaches. The codebase for this work is available at https://github.com/sisl/CEEM.
△ Less
Submitted 20 June, 2020;
originally announced June 2020.
-
Towards Recurrent Autoregressive Flow Models
Authors:
John Mern,
Peter Morales,
Mykel J. Kochenderfer
Abstract:
Stochastic processes generated by non-stationary distributions are difficult to represent with conventional models such as Gaussian processes. This work presents Recurrent Autoregressive Flows as a method toward general stochastic process modeling with normalizing flows. The proposed method defines a conditional distribution for each variable in a sequential process by conditioning the parameters…
▽ More
Stochastic processes generated by non-stationary distributions are difficult to represent with conventional models such as Gaussian processes. This work presents Recurrent Autoregressive Flows as a method toward general stochastic process modeling with normalizing flows. The proposed method defines a conditional distribution for each variable in a sequential process by conditioning the parameters of a normalizing flow with recurrent neural connections. Complex conditional relationships are learned through the recurrent network parameters. In this work, we present an initial design for a recurrent flow cell and a method to train the model to match observed empirical distributions. We demonstrate the effectiveness of this class of models through a series of experiments in which models are trained on three complex stochastic processes. We highlight the shortcomings of our current formulation and suggest some potential solutions.
△ Less
Submitted 17 June, 2020;
originally announced June 2020.
-
A Survey of Algorithms for Black-Box Safety Validation of Cyber-Physical Systems
Authors:
Anthony Corso,
Robert J. Moss,
Mark Koren,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
Autonomous cyber-physical systems (CPS) can improve safety and efficiency for safety-critical applications, but require rigorous testing before deployment. The complexity of these systems often precludes the use of formal verification and real-world testing can be too dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a blac…
▽ More
Autonomous cyber-physical systems (CPS) can improve safety and efficiency for safety-critical applications, but require rigorous testing before deployment. The complexity of these systems often precludes the use of formal verification and real-world testing can be too dangerous during development. Therefore, simulation-based techniques have been developed that treat the system under test as a black box operating in a simulated environment. Safety validation tasks include finding disturbances in the environment that cause the system to fail (falsification), finding the most-likely failure, and estimating the probability that the system fails. Motivated by the prevalence of safety-critical artificial intelligence, this work provides a survey of state-of-the-art safety validation techniques for CPS with a focus on applied algorithms and their modifications for the safety validation problem. We present and discuss algorithms in the domains of optimization, path planning, reinforcement learning, and importance sampling. Problem decomposition techniques are presented to help scale algorithms to large state spaces, which are common for CPS. A brief overview of safety-critical applications is given, including autonomous vehicles and aircraft collision avoidance systems. Finally, we present a survey of existing academic and commercially available safety validation tools.
△ Less
Submitted 14 October, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Interpretable Safety Validation for Autonomous Vehicles
Authors:
Anthony Corso,
Mykel J. Kochenderfer
Abstract:
An open problem for autonomous driving is how to validate the safety of an autonomous vehicle in simulation. Automated testing procedures can find failures of an autonomous system but these failures may be difficult to interpret due to their high dimensionality and may be so unlikely as to not be important. This work describes an approach for finding interpretable failures of an autonomous system.…
▽ More
An open problem for autonomous driving is how to validate the safety of an autonomous vehicle in simulation. Automated testing procedures can find failures of an autonomous system but these failures may be difficult to interpret due to their high dimensionality and may be so unlikely as to not be important. This work describes an approach for finding interpretable failures of an autonomous system. The failures are described by signal temporal logic expressions that can be understood by a human, and are optimized to produce failures that have high likelihood. Our methodology is demonstrated for the safety validation of an autonomous vehicle in the context of an unprotected left turn and a crosswalk with a pedestrian. Compared to a baseline importance sampling approach, our methodology finds more failures with higher likelihood while retaining interpretability.
△ Less
Submitted 26 June, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Scalable Autonomous Vehicle Safety Validation through Dynamic Programming and Scene Decomposition
Authors:
Anthony Corso,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over f…
▽ More
An open question in autonomous driving is how best to use simulation to validate the safety of autonomous vehicles. Existing techniques rely on simulated rollouts, which can be inefficient for finding rare failure events, while other techniques are designed to only discover a single failure. In this work, we present a new safety validation approach that attempts to estimate the distribution over failures of an autonomous policy using approximate dynamic programming. Knowledge of this distribution allows for the efficient discovery of many failure examples. To address the problem of scalability, we decompose complex driving scenarios into subproblems consisting of only the ego vehicle and one other vehicle. These subproblems can be solved with approximate dynamic programming and their solutions are recombined to approximate the solution to the full scenario. We apply our approach to a simple two-vehicle scenario to demonstrate the technique as well as a more complex five-vehicle scenario to demonstrate scalability. In both experiments, we observed an increase in the number of failures discovered compared to baseline approaches.
△ Less
Submitted 26 June, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
The Adaptive Stress Testing Formulation
Authors:
Mark Koren,
Anthony Corso,
Mykel J. Kochenderfer
Abstract:
Validation is a key challenge in the search for safe autonomy. Simulations are often either too simple to provide robust validation, or too complex to tractably compute. Therefore, approximate validation methods are needed to tractably find failures without unsafe simplifications. This paper presents the theory behind one such black-box approach: adaptive stress testing (AST). We also provide thre…
▽ More
Validation is a key challenge in the search for safe autonomy. Simulations are often either too simple to provide robust validation, or too complex to tractably compute. Therefore, approximate validation methods are needed to tractably find failures without unsafe simplifications. This paper presents the theory behind one such black-box approach: adaptive stress testing (AST). We also provide three examples of validation problems formulated to work with AST.
△ Less
Submitted 8 April, 2020;
originally announced April 2020.
-
Adaptive Stress Testing without Domain Heuristics using Go-Explore
Authors:
Mark Koren,
Mykel J. Kochenderfer
Abstract:
Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that gui…
▽ More
Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that guide it away from failures. For example, some approaches give rewards for taking more-likely actions, because we want to find more-likely failures. However, the agent may then learn to only take likely actions, and may not be able to find a failure at all. Consequently, the problem becomes a hard-exploration problem, where rewards do not aid exploration. A new algorithm, go-explore (GE), has recently set new records on benchmarks from the hard-exploration field. We apply GE to adaptive stress testing (AST), one example of an RL-based falsification approach that provides a way to search for the most-likely failure scenario. We simulate a scenario where an autonomous vehicle drives while a pedestrian is crossing the road. We demonstrate that GE is able to find failures without domain-specific heuristics, such as the distance between the car and the pedestrian, on scenarios that other RL techniques are unable to solve. Furthermore, inspired by the robustification phase of GE, we demonstrate that the backwards algorithm (BA) improves the failures found by other RL techniques.
△ Less
Submitted 18 June, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Exchangeable Input Representations for Reinforcement Learning
Authors:
John Mern,
Dorsa Sadigh,
Mykel J. Kochenderfer
Abstract:
Poor sample efficiency is a major limitation of deep reinforcement learning in many domains. This work presents an attention-based method to project neural network inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects. We also show…
▽ More
Poor sample efficiency is a major limitation of deep reinforcement learning in many domains. This work presents an attention-based method to project neural network inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects. We also show that our method is able to represent inputs over variable numbers of objects. Our experiments demonstrate improvements in sample efficiency for policy gradient methods on a variety of tasks. We show that our representation allows us to solve problems that are otherwise intractable when using naïve approaches.
△ Less
Submitted 19 March, 2020;
originally announced March 2020.
-
Monte-Carlo Tree Search for Policy Optimization
Authors:
Xiaobai Ma,
Katherine Driggs-Campbell,
Zongzhang Zhang,
Mykel J. Kochenderfer
Abstract:
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization…
▽ More
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
△ Less
Submitted 23 December, 2019;
originally announced December 2019.
-
Parameter-Conditioned Sequential Generative Modeling of Fluid Flows
Authors:
Jeremy Morton,
Freddie D. Witherden,
Mykel J. Kochenderfer
Abstract:
The computational cost associated with simulating fluid flows can make it infeasible to run many simulations across multiple flow conditions. Building upon concepts from generative modeling, we introduce a new method for learning neural network models capable of performing efficient parameterized simulations of fluid flows. Evaluated on their ability to simulate both two-dimensional and three-dime…
▽ More
The computational cost associated with simulating fluid flows can make it infeasible to run many simulations across multiple flow conditions. Building upon concepts from generative modeling, we introduce a new method for learning neural network models capable of performing efficient parameterized simulations of fluid flows. Evaluated on their ability to simulate both two-dimensional and three-dimensional fluid flows, trained models are shown to capture local and global properties of the flow fields at a wide array of flow conditions. Furthermore, flow simulations generated by the trained models are shown to be orders of magnitude faster than the corresponding computational fluid dynamics simulations.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation
Authors:
Anthony Corso,
Peter Du,
Katherine Driggs-Campbell,
Mykel J. Kochenderfer
Abstract:
Determining possible failure scenarios is a critical step in the evaluation of autonomous vehicle systems. Real-world vehicle testing is commonly employed for autonomous vehicle validation, but the costs and time requirements are high. Consequently, simulation-driven methods such as Adaptive Stress Testing (AST) have been proposed to aid in validation. AST formulates the problem of finding the mos…
▽ More
Determining possible failure scenarios is a critical step in the evaluation of autonomous vehicle systems. Real-world vehicle testing is commonly employed for autonomous vehicle validation, but the costs and time requirements are high. Consequently, simulation-driven methods such as Adaptive Stress Testing (AST) have been proposed to aid in validation. AST formulates the problem of finding the most likely failure scenarios as a Markov decision process, which can be solved using reinforcement learning. In practice, AST tends to find scenarios where failure is unavoidable and tends to repeatedly discover the same types of failures of a system. This work addresses these issues by encoding domain relevant information into the search procedure. With this modification, the AST method discovers a larger and more expressive subset of the failure space when compared to the original AST formulation. We show that our approach is able to identify useful failure scenarios of an autonomous vehicle policy.
△ Less
Submitted 6 August, 2019; v1 submitted 2 August, 2019;
originally announced August 2019.
-
Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning
Authors:
Ross E. Allen,
Jayesh K. Gupta,
Jaime Pena,
Yutai Zhou,
Javona White Bear,
Mykel J. Kochenderfer
Abstract:
This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function. We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. The health-informed credit assignment is then extended to a multi-agent variant of the proximal policy optimization algori…
▽ More
This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function. We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. The health-informed credit assignment is then extended to a multi-agent variant of the proximal policy optimization algorithm and demonstrated on particle and multiwalker robot environments that have characteristics such as system health, risk-taking, semi-expendable agents, continuous action spaces, and partial observability. We show significant improvement in learning performance compared to policy gradient methods that do not perform multi-agent credit assignment.
△ Less
Submitted 4 January, 2021; v1 submitted 2 August, 2019;
originally announced August 2019.
-
Algorithms for Verifying Deep Neural Networks
Authors:
Changliu Liu,
Tomer Arnon,
Christopher Lazarus,
Christopher Strong,
Clark Barrett,
Mykel J. Kochenderfer
Abstract:
Deep neural networks are widely used for nonlinear function approximation with applications ranging from computer vision to control. Although these networks involve the composition of simple arithmetic operations, it can be very challenging to verify whether a particular network satisfies certain input-output properties. This article surveys methods that have emerged recently for soundly verifying…
▽ More
Deep neural networks are widely used for nonlinear function approximation with applications ranging from computer vision to control. Although these networks involve the composition of simple arithmetic operations, it can be very challenging to verify whether a particular network satisfies certain input-output properties. This article surveys methods that have emerged recently for soundly verifying such properties. These methods borrow insights from reachability analysis, optimization, and search. We discuss fundamental differences and connections between existing algorithms. In addition, we provide pedagogical implementations of existing methods and compare them on a set of benchmark problems.
△ Less
Submitted 15 October, 2020; v1 submitted 15 March, 2019;
originally announced March 2019.
-
Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning
Authors:
Xiaobai Ma,
Katherine Driggs-Campbell,
Mykel J. Kochenderfer
Abstract:
To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the gam…
▽ More
To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the game, Robust Adversarial Reinforcement Learning and Neural Fictitious Self Play, and compares performance on an autonomous driving scenario. We extend the game formulation to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances that lead to better overall performance. The resulting robust policy exhibits improved driving efficiency while effectively reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods.
△ Less
Submitted 8 March, 2019;
originally announced March 2019.
-
Deep Variational Koopman Models: Inferring Koopman Observations for Uncertainty-Aware Dynamics Modeling and Control
Authors:
Jeremy Morton,
Freddie D Witherden,
Mykel J Kochenderfer
Abstract:
Koopman theory asserts that a nonlinear dynamical system can be mapped to a linear system, where the Koopman operator advances observations of the state forward in time. However, the observable functions that map states to observations are generally unknown. We introduce the Deep Variational Koopman (DVK) model, a method for inferring distributions over observations that can be propagated linearly…
▽ More
Koopman theory asserts that a nonlinear dynamical system can be mapped to a linear system, where the Koopman operator advances observations of the state forward in time. However, the observable functions that map states to observations are generally unknown. We introduce the Deep Variational Koopman (DVK) model, a method for inferring distributions over observations that can be propagated linearly in time. By sampling from the inferred distributions, we obtain a distribution over dynamical models, which in turn provides a distribution over possible outcomes as a modeled system advances in time. Experiments show that the DVK model is effective at long-term prediction for a variety of dynamical systems. Furthermore, we describe how to incorporate the learned models into a control framework, and demonstrate that accounting for the uncertainty present in the distribution over dynamical models enables more effective control.
△ Less
Submitted 1 June, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Adaptive Stress Testing for Autonomous Vehicles
Authors:
Mark Koren,
Saud Alsaif,
Ritchie Lee,
Mykel J. Kochenderfer
Abstract:
This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle's environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to fin…
▽ More
This paper presents a method for testing the decision making systems of autonomous vehicles. Our approach involves perturbing stochastic elements in the vehicle's environment until the vehicle is involved in a collision. Instead of applying direct Monte Carlo sampling to find collision scenarios, we formulate the problem as a Markov decision process and use reinforcement learning algorithms to find the most likely failure scenarios. This paper presents Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (DRL) solutions that can scale to large environments. We show that DRL can find more likely failure scenarios than MCTS with fewer calls to the simulator. A simulation scenario involving a vehicle approaching a crosswalk is used to validate the framework. Our proposed approach is very general and can be easily applied to other scenarios given the appropriate models of the vehicle and the environment.
△ Less
Submitted 5 February, 2019;
originally announced February 2019.
-
Robust Super-Level Set Estimation using Gaussian Processes
Authors:
Andrea Zanette,
Junzi Zhang,
Mykel J. Kochenderfer
Abstract:
This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicte…
▽ More
This paper focuses on the problem of determining as large a region as possible where a function exceeds a given threshold with high probability. We assume that we only have access to a noise-corrupted version of the function and that function evaluations are costly. To select the next query point, we propose maximizing the expected volume of the domain identified as above the threshold as predicted by a Gaussian process, robustified by a variance term. We also give asymptotic guarantees on the exploration effect of the algorithm, regardless of the prior misspecification. We show by various numerical examples that our approach also outperforms existing techniques in the literature in practice.
△ Less
Submitted 25 November, 2018;
originally announced November 2018.
-
Deep Neural Network Compression for Aircraft Collision Avoidance Systems
Authors:
Kyle D. Julian,
Mykel J. Kochenderfer,
Michael P. Owen
Abstract:
One approach to designing decision making logic for an aircraft collision avoidance system frames the problem as a Markov decision process and optimizes the system using dynamic programming. The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avo…
▽ More
One approach to designing decision making logic for an aircraft collision avoidance system frames the problem as a Markov decision process and optimizes the system using dynamic programming. The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avoidance systems for manned and unmanned aircraft, but the high dimensionality of the state space leads to very large tables. To improve storage efficiency, a deep neural network is used to approximate the table. With the use of an asymmetric loss function and a gradient descent algorithm, the parameters for this network can be trained to provide accurate estimates of table values while preserving the relative preferences of the possible advisories for each state. By training multiple networks to represent subtables, the network also decreases the required runtime for computing the collision avoidance advisory. Simulation studies show that the network improves the safety and efficiency of the collision avoidance system. Because only the network parameters need to be stored, the required storage space is reduced by a factor of 1000, enabling the collision avoidance system to operate using current avionics systems.
△ Less
Submitted 9 October, 2018;
originally announced October 2018.
-
Amortized Inference Regularization
Authors:
Rui Shu,
Hung H. Bui,
Shengjia Zhao,
Mykel J. Kochenderfer,
Stefano Ermon
Abstract:
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior app…
▽ More
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests to prefer an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly-expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
△ Less
Submitted 9 January, 2019; v1 submitted 22 May, 2018;
originally announced May 2018.
-
Burn-In Demonstrations for Multi-Modal Imitation Learning
Authors:
Alex Kuefler,
Mykel J. Kochenderfer
Abstract:
Recent work on imitation learning has generated policies that reproduce expert behavior from multi-modal data. However, past approaches have focused only on recreating a small number of distinct, expert maneuvers, or have relied on supervised learning techniques that produce unstable policies. This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over a…
▽ More
Recent work on imitation learning has generated policies that reproduce expert behavior from multi-modal data. However, past approaches have focused only on recreating a small number of distinct, expert maneuvers, or have relied on supervised learning techniques that produce unstable policies. This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time. Our approach involves reformulating the typical imitation learning setting to include "burn-in demonstrations" upon which policies are conditioned at test time. We demonstrate that our approach outperforms standard InfoGAIL in maximizing the mutual information between predicted and unseen style labels in road scene simulations, and we show that our method leads to policies that imitate expert autonomous driving systems over long time horizons.
△ Less
Submitted 13 October, 2017;
originally announced October 2017.
-
Towards Proving the Adversarial Robustness of Deep Neural Networks
Authors:
Guy Katz,
Clark Barrett,
David L. Dill,
Kyle Julian,
Mykel J. Kochenderfer
Abstract:
Autonomous vehicles are highly complex systems, required to function reliably in a wide variety of situations. Manually crafting software controllers for these vehicles is difficult, but there has been some success in using deep neural networks generated using machine-learning. However, deep neural networks are opaque to human engineers, rendering their correctness very difficult to prove manually…
▽ More
Autonomous vehicles are highly complex systems, required to function reliably in a wide variety of situations. Manually crafting software controllers for these vehicles is difficult, but there has been some success in using deep neural networks generated using machine-learning. However, deep neural networks are opaque to human engineers, rendering their correctness very difficult to prove manually; and existing automated techniques, which were not designed to operate on neural networks, fail to scale to large systems. This paper focuses on proving the adversarial robustness of deep neural networks, i.e. proving that small perturbations to a correctly-classified input to the network cannot cause it to be misclassified. We describe some of our recent and ongoing work on verifying the adversarial robustness of networks, and discuss some of the open questions we have encountered and how they might be addressed.
△ Less
Submitted 8 September, 2017;
originally announced September 2017.
-
Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior
Authors:
Jeremy Morton,
Mykel J. Kochenderfer
Abstract:
In this work, we propose a method for learning driver models that account for variables that cannot be observed directly. When trained on a synthetic dataset, our models are able to learn encodings for vehicle trajectories that distinguish between four distinct classes of driver behavior. Such encodings are learned without any knowledge of the number of driver classes or any objective that directl…
▽ More
In this work, we propose a method for learning driver models that account for variables that cannot be observed directly. When trained on a synthetic dataset, our models are able to learn encodings for vehicle trajectories that distinguish between four distinct classes of driver behavior. Such encodings are learned without any knowledge of the number of driver classes or any objective that directly requires the models to learn encodings for each class. We show that driving policies trained with knowledge of latent variables are more effective than baseline methods at imitating the driver behavior that they are trained to replicate. Furthermore, we demonstrate that the actions chosen by our policy are heavily influenced by the latent variable settings that are provided to them.
△ Less
Submitted 18 April, 2017;
originally announced April 2017.