Disturbance Observers for Robust Backup Control Barrier Functions

Authors: David E. J. van Wijk, Ersin Das, Anil Alan, Samuel Coogan, Tamas G. Molnar, Joel W. Burdick, Manoranjan Majji, Kerianne L. Hobbs

Abstract: Designing safe controllers is crucial and notoriously challenging for input-constrained safety-critical control systems. Backup control barrier functions offer an approach for the construction of safe controllers online by considering the flow of the system under a backup controller. However, in the presence of model uncertainties, the flow cannot be accurately computed, making this method insufficient for safety assurance. To tackle this shortcoming, we integrate backup control barrier functions with a disturbance observer and estimate the flow under a reconstruction of the disturbance while refining this estimate over time. We prove that the controllers resulting from the proposed Disturbance Observer Backup Control Barrier Function (DO-bCBF) approach guarantee safety, are robust to unknown disturbances, and satisfy input constraints.

Submitted 19 March, 2025; originally announced March 2025.

Comments: Submitted to IEEE Control Systems Letters (L-CSS). 6 pages, 4 figures

arXiv:2502.13406 [pdf, other]

Generative Predictive Control: Flow Matching Policies for Dynamic and Difficult-to-Demonstrate Tasks

Authors: Vince Kurtz, Joel W. Burdick

Abstract: Generative control policies have recently unlocked major progress in robotics. These methods produce action sequences via diffusion or flow matching, with training data provided by demonstrations. But existing methods come with two key limitations: they require expert demonstrations, which can be difficult to obtain, and they are limited to relatively slow, quasi-static tasks. In this paper, we leverage a tight connection between sampling-based predictive control and generative modeling to address each of these issues. In particular, we introduce generative predictive control, a supervised learning framework for tasks with fast dynamics that are easy to simulate but difficult to demonstrate. We then show how trained flow-matching policies can be warm-started at inference time, maintaining temporal consistency and enabling high-frequency feedback. We believe that generative predictive control offers a complementary approach to existing behavior cloning methods, and hope that it paves the way toward generalist policies that extend beyond quasi-static demonstration-oriented tasks.

Submitted 1 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

arXiv:2411.17277 [pdf, other]

Minimizing Conservatism in Safety-Critical Control for Input-Delayed Systems via Adaptive Delay Estimation

Authors: Yitaek Kim, Ersin Das, Jeeseop Kim, Aaron D. Ames, Joel W. Burdick, Christoffer Sloth

Abstract: Input delays affect systems such as teleoperation and wirelessly autonomous connected vehicles, and may lead to safety violations. One promising way to ensure safety in the presence of delay is to employ control barrier functions (CBFs), and extensions thereof that account for uncertainty: delay adaptive CBFs (DaCBFs). This paper proposes an online adaptive safety control framework for reducing the conservatism of DaCBFs. The main idea is to reduce the maximum delay estimation error bound so that the state prediction error bound is monotonically non-increasing. To this end, we first leverage the estimation error bound of a disturbance observer to bound the state prediction error. Second, we design two nonlinear programs to update the maximum delay estimation error bound satisfying the prediction error bound, and subsequently update the maximum state prediction error bound used in DaCBFs. The proposed method ensures the maximum state prediction error bound is monotonically non-increasing, yielding less conservatism in DaCBFs. We verify the proposed method in an automated connected truck application, showing that the proposed method reduces the conservatism of DaCBFs.

Submitted 26 November, 2024; originally announced November 2024.

Comments: This paper has been submitted to ECC 2025 for possible publication

arXiv:2411.17079 [pdf, other]

Zero-Order Control Barrier Functions for Sampled-Data Systems with State and Input Dependent Safety Constraints

Authors: Xiao Tan, Ersin Das, Aaron D. Ames, Joel W. Burdick

Abstract: We propose a novel zero-order control barrier function (ZOCBF) for sampled-data systems to ensure system safety. Our formulation generalizes conventional control barrier functions and straightforwardly handles safety constraints with high-relative degrees or those that explicitly depend on both system states and inputs. The proposed ZOCBF condition does not require any differentiation operation. Instead, it involves computing the difference of the ZOCBF values at two consecutive sampling instants. We propose three numerical approaches to enforce the ZOCBF condition, tailored to different problem settings and available computational resources. We demonstrate the effectiveness of our approach through a collision avoidance example and a rollover prevention example on uneven terrains.

Submitted 8 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

Comments: To present at ACC 2025

arXiv:2410.01939 [pdf, other]

Equality Constrained Diffusion for Direct Trajectory Optimization

Authors: Vince Kurtz, Joel W. Burdick

Abstract: The recent success of diffusion-based generative models in image and natural language processing has ignited interest in diffusion-based trajectory optimization for nonlinear control systems. Existing methods cannot, however, handle the nonlinear equality constraints necessary for direct trajectory optimization. As a result, diffusion-based trajectory optimizers are currently limited to shooting methods, where the nonlinear dynamics are enforced by forward rollouts. This precludes many of the benefits enjoyed by direct methods, including flexible state constraints, reduced numerical sensitivity, and easy initial guess specification. In this paper, we present a method for diffusion-based optimization with equality constraints. This allows us to perform direct trajectory optimization, enforcing dynamic feasibility with constraints rather than rollouts. To the best of our knowledge, this is the first diffusion-based optimization algorithm that supports the general nonlinear equality constraints required for direct trajectory optimization.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.10802 [pdf, other]

Bayesian Optimal Experimental Design for Robot Kinematic Calibration

Authors: Ersin Das, Thomas Touma, Joel W. Burdick

Abstract: This paper develops a Bayesian optimal experimental design for robot kinematic calibration on ${\mathbb{S}^3 \!\times\! \mathbb{R}^3}$. Our method builds upon a Gaussian process approach that incorporates a geometry-aware kernel based on Riemannian Matérn kernels over ${\mathbb{S}^3}$. To learn the forward kinematics errors via Bayesian optimization with a Gaussian process, we define a geodesic distance-based objective function. Pointwise values of this function are sampled via noisy measurements taken using fiducial markers on the end-effector using a camera and computed pose with the nominal kinematics. The corrected Denavit-Hartenberg parameters are obtained using an efficient quadratic program that operates on the collected data sets. The effectiveness of the proposed method is demonstrated via simulations and calibration experiments on NASA's ocean world lander autonomy testbed (OWLAT).

Submitted 4 March, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

Comments: ICRA 2025

arXiv:2409.05792 [pdf, other]

Supervised Learning for Stochastic Optimal Control

Authors: Vince Kurtz, Joel W. Burdick

Abstract: Supervised machine learning is powerful. In recent years, it has enabled massive breakthroughs in computer vision and natural language processing. But leveraging these advances for optimal control has proved difficult. Data is a key limiting factor. Without access to the optimal policy, value function, or demonstrations, how can we fit a policy? In this paper, we show how to automatically generate supervised learning data for a class of continuous-time nonlinear stochastic optimal control problems. In particular, applying the Feynman-Kac theorem to a linear reparameterization of the Hamilton-Jacobi-Bellman PDE allows us to sample the value function by simulating a stochastic process. Hardware accelerators like GPUs could rapidly generate a large amount of this training data. With this data in hand, stochastic optimal control becomes supervised learning.

Submitted 9 September, 2024; originally announced September 2024.

Comments: CDC 2024

arXiv:2405.15100 [pdf, other]

Mobile Robot Sensory Coverage in 2-D Environments: An Optimization Approach with Efficiency Bounds

Authors: E. Fourney, J. W. Burdick, E. D. Rimon

Abstract: This paper considers three related mobile robot multi-target sensory coverage and inspection planning problems in 2-D environments. In the first problem, a mobile robot must find the shortest path to observe multiple targets with a limited range sensor in an obstacle free environment. In the second problem, the mobile robot must efficiently observe multiple targets while taking advantage of multi-target views in an obstacle free environment. The third problem considers multi-target sensory coverage in the presence of obstacles that obstruct sensor views of the targets. We show how all three problems can be formulated in a MINLP optimization framework. Because exact solutions to these problems are NP-hard, we introduce polynomial time approximation algorithms for each problem. These algorithms combine polynomial-time methods that approximate the optimal target sensing order with efficient convex optimization methods that incorporate the constraints posed by the robot sensor footprint and obstacles in the environment. Importantly, we develop bounds that limit the gap between the exact and approximate solutions. Algorithms for all problems are fully implemented and illustrated with examples. Beyond the utility of our algorithms, the bounds derived in the paper contribute to the theory of optimal coverage planning algorithms.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.18972 [pdf, other]

Risk-Aware Robotics: Tail Risk Measures in Planning, Control, and Verification

Authors: Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Lars Lindemann, Margaret P. Chapman, George J. Pappas, Aaron D. Ames, Joel W. Burdick

Abstract: The need for a systematic approach to risk assessment has increased in recent years due to the ubiquity of autonomous systems that alter our day-to-day experiences and their need for safety, e.g., for self-driving vehicles, mobile service robots, and bipedal robots. These systems are expected to function safely in unpredictable environments and interact seamlessly with humans, whose behavior is notably challenging to forecast. We present a survey of risk-aware methodologies for autonomous systems. We adopt a contemporary risk-aware approach to mitigate rare and detrimental outcomes by advocating the use of tail risk measures, a concept borrowed from financial literature. This survey will introduce these measures and explain their relevance in the context of robotic systems for planning, control, and verification applications.

Submitted 9 September, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.08916 [pdf, ps, other]

doi 10.1109/LCSYS.2024.3416239

Rollover Prevention for Mobile Robots with Control Barrier Functions: Differentiator-Based Adaptation and Projection-to-State Safety

Authors: Ersin Das, Aaron D. Ames, Joel W. Burdick

Abstract: This paper develops rollover prevention guarantees for mobile robots using control barrier function (CBF) theory, and demonstrates the method experimentally. We consider a safety measure based on a zero moment point condition through the lens of CBFs. However, these conditions depend on time-varying and noisy parameters. To address this issue, we present a differentiator-based safety-critical controller that estimates these parameters and pairs Input-to-State Stable (ISS) differentiator dynamics with CBFs to achieve rigorous safety guarantees. Additionally, to ensure safety in the presence of disturbances, we utilize a time-varying extension of Projection-to-State Safety (PSSf). The effectiveness of the proposed method is demonstrated via experiments on a tracked robot with a rollover potential on steep slopes.

Submitted 15 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03215 [pdf, other]

A Safety-Critical Framework for UGVs in Complex Environments: A Data-Driven Discrepancy-Aware Approach

Authors: Skylar X. Wei, Lu Gan, Joel W. Burdick

Abstract: This work presents a novel data-driven multi-layered planning and control framework for the safe navigation of a class of unmanned ground vehicles (UGVs) in the presence of unknown stationary obstacles and additive modeling uncertainties. The foundation of this framework is a novel robust model predictive planner, designed to generate optimal collision-free trajectories given an occupancy grid map, and a paired ancillary controller, augmented to provide robustness against model uncertainties extracted from learning data. To tackle modeling discrepancies, we identify both matched (input discrepancies) and unmatched model residuals between the true and the nominal reduced-order models using closed-loop tracking errors as training data. Utilizing conformal prediction, we extract probabilistic upper bounds for the unknown model residuals, which serve to construct a robustifying ancillary controller. Further, we also determine maximum tracking discrepancies, also known as the robust control invariance tube, under the augmented policy, formulating them as collision buffers. Employing a LiDAR-based occupancy map to characterize the environment, we construct a discrepancy-aware cost map that incorporates these collision buffers. This map is then integrated into a sampling-based model predictive path planner that generates optimal and safe trajectories that can be robustly tracked by the augmented ancillary controller in the presence of model mismatches. The effectiveness of the framework is experimentally validated for autonomous high-speed trajectory tracking in a cluttered environment with four different vehicle-terrain configurations. We also showcase the framework's versatility by reformulating it as a driver-assist program, providing collision avoidance corrections based on user joystick commands.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.01881 [pdf, other]

Robust Control Barrier Functions using Uncertainty Estimation with Application to Mobile Robots

Authors: Ersin Das, Joel W. Burdick

Abstract: This paper proposes a safety-critical control design approach for nonlinear control affine systems in the presence of matched and unmatched uncertainties. Our constructive framework couples control barrier function (CBF) theory with a new uncertainty estimator to ensure robust safety. We use the estimated uncertainty, along with a derived upper bound on the estimation error, for synthesizing CBFs and safety-critical controllers via a quadratic program-based feedback control law that rigorously ensures robust safety while improving disturbance rejection performance. We extend the method to higher-order CBFs (HOCBFs) to achieve safety under unmatched uncertainty, which may cause relative degree differences with respect to control input and disturbances. We assume the relative degree difference is at most one, resulting in a second-order cone constraint. We demonstrate the proposed robust HOCBF method through a simulation of an uncertain elastic actuator control problem and experimentally validate the efficacy of our robust CBF framework on a tracked robot with slope-induced matched and unmatched perturbations.

Submitted 30 January, 2025; v1 submitted 3 January, 2024; originally announced January 2024.

arXiv:2310.05865 [pdf, other]

A Learning-Based Framework for Safe Human-Robot Collaboration with Multiple Backup Control Barrier Functions

Authors: Neil C. Janwani, Ersin Daş, Thomas Touma, Skylar X. Wei, Tamas G. Molnar, Joel W. Burdick

Abstract: Ensuring robot safety in complex environments is a difficult task due to actuation limits, such as torque bounds. This paper presents a safety-critical control framework that leverages learning-based switching between multiple backup controllers to formally guarantee safety under bounded control inputs while satisfying driver intention. By leveraging backup controllers designed to uphold safety and input constraints, backup control barrier functions (BCBFs) construct implicitly defined control invariance sets via a feasible quadratic program (QP). However, BCBF performance largely depends on the design and conservativeness of the chosen backup controller, especially in our setting of human-driven vehicles in complex, e.g., off-road, conditions. While conservativeness can be reduced by using multiple backup controllers, determining when to switch is an open problem. Consequently, we develop a broadcast scheme that estimates driver intention and integrates BCBFs with multiple backup strategies for human-robot interaction. An LSTM classifier uses data inputs from the robot, human, and safety algorithms to continually choose a backup controller in real-time. We demonstrate our method's efficacy on a dual-track robot in obstacle avoidance scenarios. Our framework guarantees robot safety while adhering to driver intention.

Submitted 7 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted to the International Conference on Robotics and Automation 2024

arXiv:2309.08766 [pdf, other]

The Fractal Hand-II: Reviving a Classic Mechanism for Contemporary Grasping Challenges

Authors: Malcolm G. A. Tisdale, Joel W. Burdick

Abstract: This paper, and its companion, propose a new fractal robotic gripper, drawing inspiration from the century-old Fractal Vise. The unusual synergistic properties allow it to passively conform to diverse objects using only one actuator. Designed to be easily integrated with prevailing parallel jaw grippers, it alleviates the complexities tied to perception and grasp planning, especially when dealing with unpredictable object poses and geometries. We build on the foundational principles of the Fractal Vise to a broader class of gripping mechanisms, and also address the limitations that had led to its obscurity. Two Fractal Fingers, coupled by a closing actuator, can form an adaptive and synergistic Fractal Hand. We articulate a design methodology for low cost, easy to fabricate, large workspace, and compliant Fractal Fingers. The companion paper delves into the kinematics and grasping properties of a specific class of Fractal Fingers and Hands.

Submitted 15 September, 2023; originally announced September 2023.

Comments: This paper is prepared for ICRA 2024

arXiv:2304.08538 [pdf, other]

Robust Control Barrier Functions with Uncertainty Estimation

Authors: Ersin Daş, Skylar X. Wei, Joel W. Burdick

Abstract: This paper proposes a safety controller for control-affine nonlinear systems with unmodelled dynamics and disturbances to improve closed-loop robustness. Uncertainty estimation-based control barrier functions (CBFs) are utilized to ensure robust safety in the presence of model uncertainties, which may depend on control input and states. We present a new uncertainty/disturbance estimator with theoretical upper bounds on estimation error and estimated outputs, which are used to ensure robust safety by formulating a convex optimization problem using a high-order CBF. The possibly unsafe nominal feedback controller is augmented with the proposed estimator in two frameworks (1) an uncertainty compensator and (2) a robustifying reformulation of CBF constraint with respect to the estimator outputs. The former scheme ensures safety with performance improvement by adaptively rejecting the matched uncertainty. The second method uses uncertainty estimation to robustify higher-order CBFs for safety-critical control. The proposed methods are demonstrated in simulations of an uncertain adaptive cruise control problem and a multirotor obstacle avoidance situation.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.03658 [pdf, ps, other]

An Active Learning Based Robot Kinematic Calibration Framework Using Gaussian Processes

Authors: Ersin Daş, Joel W. Burdick

Abstract: Future NASA lander missions to icy moons will require completely automated, accurate, and data efficient calibration methods for the robot manipulator arms that sample icy terrains in the lander's vicinity. To support this need, this paper presents a Gaussian Process (GP) approach to the classical manipulator kinematic calibration process. Instead of identifying a corrected set of Denavit-Hartenberg kinematic parameters, a set of GPs models the residual kinematic error of the arm over the workspace. More importantly, this modeling framework allows a Gaussian Process Upper Confidence Bound (GP-UCB) algorithm to efficiently and adaptively select the calibration's measurement points so as to minimize the number of experiments, and therefore minimize the time needed for recalibration. The method is demonstrated in simulation on a simple 2-DOF arm, a 6 DOF arm whose geometry is a candidate for a future NASA mission, and a 7 DOF Barrett WAM arm.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.01614 [pdf, other]

doi 10.55417/fr.2024006

STEP: Stochastic Traversability Evaluation and Planning for Risk-Aware Off-road Navigation; Results from the DARPA Subterranean Challenge

Authors: Anushri Dixit, David D. Fan, Kyohei Otsu, Sharmita Dey, Ali-Akbar Agha-Mohammadi, Joel W. Burdick

Abstract: Although autonomy has gained widespread usage in structured and controlled environments, robotic autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, rubble, and other post-disaster sites pose unique and challenging problems for autonomous navigation. Based on our participation in the DARPA Subterranean Challenge, we propose an approach to improve autonomous traversal of robots in subterranean environments that are perceptually degraded and completely unknown through a traversability and planning framework called STEP (Stochastic Traversability Evaluation and Planning). We present 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC), 4) fast recovery behaviors to account for unexpected scenarios that may cause failure, and 5) risk-based gait adaptation for quadrupedal robots. We illustrate and validate extensive results from our experiments on wheeled and legged robotic platforms in field studies at the Valentine Cave, CA (cave environment), Kentucky Underground, KY (mine environment), and Louisville Mega Cavern, KY (final competition site for the DARPA Subterranean Challenge with tunnel, urban, and cave environments).

Submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.02828

Journal ref: Field Robotics, 4, 2024, 182-210

arXiv:2302.13687 [pdf, other]

FRoGGeR: Fast Robust Grasp Generation via the Min-Weight Metric

Authors: Albert H. Li, Preston Culbertson, Joel W. Burdick, Aaron D. Ames

Abstract: Many approaches to grasp synthesis optimize analytic quality metrics that measure grasp robustness based on finger placements and local surface geometry. However, generating feasible dexterous grasps by optimizing these metrics is slow, often taking minutes. To address this issue, this paper presents FRoGGeR: a method that quickly generates robust precision grasps using the min-weight metric, a novel, almost-everywhere differentiable approximation of the classical epsilon grasp metric. The min-weight metric is simple and interpretable, provides a reasonable measure of grasp robustness, and admits numerically efficient gradients for smooth optimization. We leverage these properties to rapidly synthesize collision-free robust grasps - typically in less than a second. FRoGGeR can refine the candidate grasps generated by other methods (heuristic, data-driven, etc.) and is compatible with many object representations (SDFs, meshes, etc.). We study FRoGGeR's performance on over 40 objects drawn from the YCB dataset, outperforming a competitive baseline in computation time, feasibility rate of grasp synthesis, and picking success in simulation. We conclude that FRoGGeR is fast: it has a median synthesis time of 0.834s over hundreds of experiments.

Submitted 24 July, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at IROS 2023. The arXiv version contains the appendix, which does not appear in the conference version

arXiv:2212.06253 [pdf, other]

Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data

Authors: Prithvi Akella, Skylar X. Wei, Joel W. Burdick, Aaron D. Ames

Abstract: Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2212.00278 [pdf, other]

Adaptive Conformal Prediction for Motion Planning among Dynamic Agents

Authors: Anushri Dixit, Lars Lindemann, Skylar Wei, Matthew Cleaveland, George J. Pappas, Joel W. Burdick

Abstract: This paper proposes an algorithm for motion planning among dynamic agents using adaptive conformal prediction. We consider a deterministic control system and use trajectory predictors to predict the dynamic agents' future motion, which is assumed to follow an unknown distribution. We then leverage ideas from adaptive conformal prediction to dynamically quantify prediction uncertainty from an online data stream. Particularly, we provide an online algorithm that uses delayed agent observations to obtain uncertainty sets for multistep-ahead predictions with probabilistic coverage. These uncertainty sets are used within a model predictive controller to safely navigate among dynamic agents. While most existing data-driven prediction approaches quantify prediction uncertainty heuristically, we quantify the true prediction uncertainty in a distribution-free, adaptive manner that even allows us to capture changes in prediction quality and the agents' motion. We empirically evaluate our algorithm on a simulation case study where a drone avoids a flying frisbee.

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2204.09833 [pdf, other]

Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification

Authors: Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick, Aaron D. Ames

Abstract: The dramatic increase of autonomous systems subject to variable environments has given rise to the pressing need to consider risk in both the synthesis and verification of policies for these systems. This paper aims to address a few problems regarding risk-aware verification and policy synthesis, by first developing a sample-based method to bound the risk measure evaluation of a random variable whose distribution is unknown. These bounds permit us to generate high-confidence verification statements for a large class of robotic systems. Second, we develop a sample-based method to determine solutions to non-convex optimization problems that outperform a large fraction of the decision space of possible solutions. Both sample-based approaches then permit us to rapidly synthesize risk-aware policies that are guaranteed to achieve a minimum level of system performance. To showcase our approach in simulation, we verify a cooperative multi-agent system and develop a risk-aware controller that outperforms the system's baseline controller. We also mention how our approach can be extended to account for any $g$-entropic risk measure - the subset of coherent risk measures on which we focus.

Submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.09596 [pdf, other]

doi 10.1016/j.artint.2023.104018

Risk-Averse Receding Horizon Motion Planning for Obstacle Avoidance using Coherent Risk Measures

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: This paper studies the problem of risk-averse receding horizon motion planning for agents with uncertain dynamics, in the presence of stochastic, dynamic obstacles. We propose a model predictive control (MPC) scheme that formulates the obstacle avoidance constraint using coherent risk measures. To handle disturbances, or process noise, in the state dynamics, the state constraints are tightened in a risk-aware manner to provide a disturbance feedback policy. We also propose a waypoint following algorithm that uses the proposed MPC scheme for discrete distributions and prove its risk-sensitive recursive feasibility while guaranteeing finite-time task completion. We further investigate some commonly used coherent risk metrics, namely, conditional value-at-risk (CVaR), entropic value-at-risk (EVaR), and g-entropic risk measures, and propose a tractable incorporation within MPC. We illustrate our framework via simulation studies.

Submitted 28 September, 2023; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Accepted to Artificial Intelligence Journal, Special Issue on Risk-aware Autonomous Systems: Theory and Practice. arXiv admin note: text overlap with arXiv:2011.11211

Journal ref: Artificial Intelligence, 325, 2023, 104018

arXiv:2203.14913 [pdf, other]

Moving Obstacle Avoidance: a Data-Driven Risk-Aware Approach

Authors: Skylar X. Wei, Anushri Dixit, Shashank Tomar, Joel W. Burdick

Abstract: This paper proposes a new structured method for a moving agent to predict the paths of dynamically moving obstacles and avoid them using a risk-aware model predictive control (MPC) scheme. Given noisy measurements of the a priori unknown obstacle trajectory, a bootstrapping technique predicts a set of obstacle trajectories. The bootstrapped predictions are incorporated in the MPC optimization using a risk-aware methodology so as to provide probabilistic guarantees on obstacle avoidance. We validate our methods using simulations of a 3-dimensional multi-rotor drone that avoids various moving obstacles, such as a thrown ball and a frisbee with air drag.

Submitted 25 March, 2022; originally announced March 2022.

Comments: This is prepared for IEEE Control Systems Letters (L-CSS) 2022

arXiv:2203.12062 [pdf, other]

Distributionally Robust Model Predictive Control with Total Variation Distance

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: This paper studies the problem of distributionally robust model predictive control (MPC) using total variation distance ambiguity sets. For a discrete-time linear system with additive disturbances, we provide a conditional value-at-risk reformulation of the MPC optimization problem that is distributionally robust in the expected cost and chance constraints. The distributionally robust chance constraint is over-approximated as a simpler, tightened chance constraint that reduces the computational burden. Numerical experiments support our results on probabilistic guarantees and computational efficiency.

Submitted 24 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted to LCSS

arXiv:2110.10341 [pdf, other]

Quadrotor Trajectory Tracking with Learned Dynamics: Joint Koopman-based Learning of System Models and Function Dictionaries

Authors: Carl Folkestad, Skylar X. Wei, Joel W. Burdick

Abstract: Nonlinear dynamical effects are crucial to the operation of many agile robotic systems. Koopman-based model learning methods can capture these nonlinear dynamical system effects in higher dimensional lifted bilinear models that are amenable to optimal control. However, standard methods that lift the system state using a fixed function dictionary before model learning result in high dimensional models that are intractable for real time control. This paper presents a novel method that jointly learns a function dictionary and lifted bilinear model purely from data by incorporating the Koopman model in a neural network architecture. Nonlinear MPC design utilizing the learned model can be performed readily. We experimentally realized this method on a multirotor drone for agile trajectory tracking at low altitudes where the aerodynamic ground effect influences the system's behavior. Experimental results demonstrate that the learning-based controller achieves similar performance as a nonlinear MPC based on a nominal dynamics model in medium altitude. However, our learning-based system can reliably track trajectories in near-ground flight regimes while the nominal controller crashes due to unmodeled dynamical effects that are captured by our method.

Submitted 19 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:2105.08036

arXiv:2105.08036 [pdf, other]

Koopman NMPC: Koopman-based Learning and Nonlinear Model Predictive Control of Control-affine Systems

Authors: Carl Folkestad, Joel W. Burdick

Abstract: Koopman-based learning methods can potentially be practical and powerful tools for dynamical robotic systems. However, common methods to construct Koopman representations seek to learn lifted linear models that cannot capture nonlinear actuation effects inherent in many robotic systems. This paper presents a learning and control methodology that is a first step towards overcoming this limitation. Using the Koopman canonical transform, control-affine dynamics can be expressed by a lifted bilinear model. The learned model is used for nonlinear model predictive control (NMPC) design where the bilinear structure can be exploited to improve computational efficiency. The benefits for control-affine dynamics compared to existing Koopman-based methods are highlighted through an example of a simulated planar quadrotor. Prediction error is greatly reduced and closed loop performance similar to NMPC with full model knowledge is achieved.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2103.14727 [pdf, other]

Risk-Averse Stochastic Shortest Path Planning

Authors: Mohamadreza Ahmadi, Anushri Dixit, Joel W. Burdick, Aaron D. Ames

Abstract: We consider the stochastic shortest path planning problem in MDPs, i.e., the problem of designing policies that ensure reaching a goal state from a given initial state with minimum accrued cost. In order to account for rare but important realizations of the system, we consider a nested dynamic coherent risk total cost functional rather than the conventional risk-neutral total expected cost. Under some assumptions, we show that optimal, stationary, Markovian policies exist and can be found via a special Bellman's equation. We propose a computational technique based on difference convex programs (DCPs) to find the associated value functions and therefore the risk-averse policies. A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures.

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2103.03388 [pdf, other]

Limits of Probabilistic Safety Guarantees when Considering Human Uncertainty

Authors: Richard Cheng, Richard M. Murray, Joel W. Burdick

Abstract: When autonomous robots interact with humans, such as during autonomous driving, explicit safety guarantees are crucial in order to avoid potentially life-threatening accidents. Many data-driven methods have explored learning probabilistic bounds over human agents' trajectories (i.e. confidence tubes that contain trajectories with probability $δ$), which can then be used to guarantee safety with probability $1-δ$. However, almost all existing works consider $δ\geq 0.001$. The purpose of this paper is to argue that (1) in safety-critical applications, it is necessary to provide safety guarantees with $δ< 10^{-8}$, and (2) current learning-based methods are ill-equipped to compute accurate confidence bounds at such low $δ$. Using human driving data (from the highD dataset), as well as synthetically generated data, we show that current uncertainty models use inaccurate distributional assumptions to describe human behavior and/or require infeasible amounts of data to accurately learn confidence bounds for $δ\leq 10^{-8}$. These two issues result in unreliable confidence bounds, which can have dangerous implications if deployed on safety-critical systems.

Submitted 24 March, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: ICRA 2021

arXiv:2102.09119 [pdf, ps, other]

Learning Invariant Representation of Tasks for Robust Surgical State Estimation

Authors: Yidan Qin, Max Allan, Yisong Yue, Joel W. Burdick, Mahdi Azizian

Abstract: Surgical state estimators in robot-assisted surgery (RAS) - especially those trained via learning techniques - rely heavily on datasets that capture surgeon actions in laboratory or real-world surgical tasks. Real-world RAS datasets are costly to acquire, are obtained from multiple surgeons who may use different surgical strategies, and are recorded under uncontrolled conditions in highly complex environments. The combination of high diversity and limited data calls for new learning methods that are robust and invariant to operating conditions and surgical techniques. We propose StiseNet, a Surgical Task Invariance State Estimation Network with an invariance induction framework that minimizes the effects of variations in surgical technique and operating environments inherent to RAS datasets. StiseNet's adversarial architecture learns to separate nuisance factors from information needed for surgical state estimation. StiseNet is shown to outperform state-of-the-art state estimation methods on three datasets (including a new real-world RAS dataset: HERNIA-20).

Submitted 17 February, 2021; originally announced February 2021.

Comments: Accepted to IEEE Robotics & Automation Letters

arXiv:2011.11211 [pdf, other]

Risk-Sensitive Motion Planning using Entropic Value-at-Risk

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: We consider the problem of risk-sensitive motion planning in the presence of randomly moving obstacles. To this end, we adopt a model predictive control (MPC) scheme and pose the obstacle avoidance constraint in the MPC problem as a distributionally robust constraint with a KL divergence ambiguity set. This constraint is the dual representation of the Entropic Value-at-Risk (EVaR). Building upon this viewpoint, we propose an algorithm to follow waypoints and discuss its feasibility and completion in finite time. We compare the policies obtained using EVaR with those obtained using another common coherent risk measure, Conditional Value-at-Risk (CVaR), via numerical experiments for a 2D system. We also implement the waypoint following algorithm on a 3D quadcopter simulation.

Submitted 10 April, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to 2021 European Control Conference (ECC)

Journal ref: European Control Conference (ECC) 2021

arXiv:2011.04812 [pdf, other]

doi 10.1109/ICRA48506.2021.9560840

ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

Authors: Kejun Li, Maegan Tucker, Erdem Bıyık, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames

Abstract: Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the Region of Interest Active Learning (ROIAL) framework, which actively learns each user's underlying utility function over a region of interest that ensures safety and comfort. ROIAL learns from ordinal and preference feedback, which are more reliable feedback mechanisms than absolute numerical scores. The algorithm's performance is evaluated both in simulation and experimentally for three non-disabled subjects walking inside of a lower-body exoskeleton. ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters. The algorithm discovers both commonalities and discrepancies across users' gait preferences and identifies the gait parameters that most influenced user feedback. These results demonstrate the feasibility of recovering gait utility landscapes from limited human trials.

Submitted 30 March, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 6 pages + 1 page of references; 7 figures; To Appear at ICRA 2021

arXiv:2009.11937 [pdf, ps, other]

daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery

Authors: Yidan Qin, Seyedshams Feyzabadi, Max Allan, Joel W. Burdick, Mahdi Azizian

Abstract: This paper presents a technique to concurrently and jointly predict the future trajectories of surgical instruments and the future state(s) of surgical subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such predictions are a necessary first step towards shared control and supervised autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing or ultrasound scanning, often have distinguishable tool kinematics and visual features, and can be described as a series of fine-grained states with transition schematics. We propose daVinciNet - an end-to-end dual-task model for robot motion and surgical state predictions. daVinciNet performs concurrent end-effector trajectory and surgical state predictions using features extracted from multiple data streams, including robot kinematics, endoscopic vision, and system events. We evaluate our proposed model on an extended Robotic Intra-Operative Ultrasound (RIOUS+) imaging dataset collected on a da Vinci Xi surgical system and the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our model achieves up to 93.85% short-term (0.5s) and 82.11% long-term (2s) state prediction accuracy, as well as 1.07mm short-term and 5.62mm long-term trajectory prediction error.

Submitted 24 September, 2020; originally announced September 2020.

Comments: Accepted to IROS 2020

arXiv:2004.05273 [pdf, other]

Safe Multi-Agent Interaction through Robust Control Barrier Functions with Learned Uncertainties

Authors: Richard Cheng, Mohammad Javad Khojasteh, Aaron D. Ames, Joel W. Burdick

Abstract: Robots operating in real world settings must navigate and maintain safety while interacting with many heterogeneous agents and obstacles. Multi-Agent Control Barrier Functions (CBF) have emerged as a computationally efficient tool to guarantee safety in multi-agent environments, but they assume perfect knowledge of both the robot dynamics and other agents' dynamics. While knowledge of the robot's dynamics might be reasonably well known, the heterogeneity of agents in real-world environments means there will always be considerable uncertainty in our prediction of other agents' dynamics. This work aims to learn high-confidence bounds for these dynamic uncertainties using Matrix-Variate Gaussian Process models, and incorporates them into a robust multi-agent CBF framework. We transform the resulting min-max robust CBF into a quadratic program, which can be efficiently solved in real time. We verify via simulation results that the nominal multi-agent CBF is often violated during agent interactions, whereas our robust formulation maintains safety with a much higher probability and adapts to learned uncertainties.

Submitted 22 September, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Journal ref: 59th IEEE Conference on Decision and Control (CDC 2020)

arXiv:2004.01708 [pdf, other]

Episodic Koopman Learning of Nonlinear Robot Dynamics with Application to Fast Multirotor Landing

Authors: Carl Folkestad, Daniel Pastor, Joel W. Burdick

Abstract: This paper presents a novel episodic method to learn a robot's nonlinear dynamics model and an increasingly optimal control sequence for a set of tasks. The method is based on the Koopman operator approach to nonlinear dynamical systems analysis, which models the flow of observables in a function space, rather than a flow in a state space. Practically, this method estimates a nonlinear diffeomorphism that lifts the dynamics to a higher dimensional space where they are linear. Efficient Model Predictive Control methods can then be applied to the lifted model. This approach allows for real time implementation in on-board hardware, with rigorous incorporation of both input and state constraints during learning. We demonstrate the method in a real-time implementation of fast multirotor landing, where the nonlinear ground effect is learned and used to improve landing speed and quality.

Submitted 3 April, 2020; originally announced April 2020.

Comments: Accepted to the International Conference on Robotics and Automation 2020 (ICRA). arXiv admin note: text overlap with arXiv:1911.08751

arXiv:2003.09267 [pdf, other]

Barrier Functions for Multiagent-POMDPs with DTL Specifications

Authors: Mohamadreza Ahmadi, Andrew Singletary, Joel W. Burdick, Aaron D. Ames

Abstract: Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework to represent heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety-shield ensuring satisfaction of high-level specifications in terms of linear distribution temporal logic (LDTL). To this end, we use sufficient and necessary conditions for the invariance of a given set based on discrete-time barrier functions (DTBFs) and formulate sufficient conditions for finite time DTBF to study finite time convergence to a set. We then show that different LDTL mission/safety specifications can be cast as a set of invariance or finite time reachability problems. We demonstrate that the proposed method for safety-shield synthesis can be implemented online by a sequence of one-step greedy algorithms. We demonstrate the efficacy of the proposed method using experiments involving a team of robots.

Submitted 18 March, 2020; originally announced March 2020.

Comments: arXiv admin note: text overlap with arXiv:1903.07823

arXiv:2003.06495 [pdf, other]

Human Preference-Based Learning for High-dimensional Optimization of Exoskeleton Walking Gaits

Authors: Maegan Tucker, Myra Cheng, Ellen Novoseller, Richard Cheng, Yisong Yue, Joel W. Burdick, Aaron D. Ames

Abstract: Optimizing lower-body exoskeleton walking gaits for user comfort requires understanding users' preferences over a high-dimensional gait parameter space. However, existing preference-based learning methods have only explored low-dimensional domains due to computational limitations. To learn user preferences in high dimensions, this work presents LineCoSpar, a human-in-the-loop preference-based framework that enables optimization over many parameters by iteratively exploring one-dimensional subspaces. Additionally, this work identifies gait attributes that characterize broader preferences across users. In simulations and human trials, we empirically verify that LineCoSpar is a sample-efficient approach for high-dimensional preference optimization. Our analysis of the experimental data reveals a correspondence between human preferences and objective measures of dynamicity, while also highlighting differences in the utility functions underlying individual users' gait preferences. This result has implications for exoskeleton gait synthesis, an active field with applications to clinical use and patient rehabilitation.

Submitted 8 August, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

Comments: 8 pages, 9 figures, 2 tables, to appear at IROS 2020

arXiv:2002.02921 [pdf, other]

Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

Authors: Yidan Qin, Sahba Aghajani Pedram, Seyedshams Feyzabadi, Max Allan, A. Jonathan McLeod, Joel W. Burdick, Mahdi Azizian

Abstract: Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that have occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, which improves on the state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.

Submitted 7 February, 2020; originally announced February 2020.

Comments: Accepted to ICRA 2020

arXiv:2001.07679 [pdf, other]

Stochastic Finite State Control of POMDPs with LTL Specifications

Authors: Mohamadreza Ahmadi, Rangoli Sharan, Joel W. Burdick

Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty and imperfect sensing, e.g., robot manipulation and self-driving cars. However, optimal control of POMDPs is notoriously intractable. This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs such that the probability of satisfying a set of high-level specifications in terms of linear temporal logic (LTL) formulae is maximized. We begin by casting the latter problem into an optimization and use relaxations based on the Poisson equation and McCormick envelopes. Then, we propose a stochastic bounded policy iteration algorithm, leading to a controlled growth in sFSC size and an anytime algorithm, where the performance of the controller improves with successive iterations, but can be stopped by the user based on time or memory considerations. We illustrate the proposed method by a robot navigation case study.

Submitted 21 January, 2020; originally announced January 2020.

arXiv:1909.10209 [pdf, other]

Energy-Efficient Motion Planning for Multi-Modal Hybrid Locomotion

Authors: H. J. Terry Suh, Xiaobin Xiong, Andrew Singletary, Aaron D. Ames, Joel W. Burdick

Abstract: Hybrid locomotion, which combines multiple modalities of locomotion within a single robot, enables robots to carry out complex tasks in diverse environments. This paper presents a novel method for planning multi-modal locomotion trajectories using approximate dynamic programming. We formulate this problem as a shortest-path search through a state-space graph, where the edge cost is assigned as optimal transport cost along each segment. This cost is approximated from batches of offline trajectory optimizations, which allows the complex effects of vehicle under-actuation and dynamic constraints to be approximately captured in a tractable way. Our method is illustrated on a hybrid double-integrator, an amphibious robot, and a flying-driving drone, showing the practicality of the approach.

Submitted 4 August, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: Accepted to International Conference on Intelligent Robots and Systems (IROS) 2020

arXiv:1908.01289 [pdf, other]

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Authors: Ellen R. Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel W. Burdick

Abstract: In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present DUELING POSTERIOR SAMPLING (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the preference feedback. As preference feedback is provided on trajectories rather than individual state-action pairs, we develop a Bayesian approach for the credit assignment problem, translating preferences to a posterior distribution over state-action reward models. We prove an asymptotic Bayesian no-regret rate for DPS with a Bayesian linear regression credit assignment model. This is the first regret guarantee for preference-based RL to our knowledge. We also discuss possible avenues for extending the proof methodology to other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines.

Submitted 29 June, 2020; v1 submitted 4 August, 2019; originally announced August 2019.

Comments: To appear in Conference on Uncertainty in Artificial Intelligence (UAI), 2020. 9 pages before references and appendix; 51 pages total; 7 figures; 4 tables. This replacement incorporates reviewer comments, and in comparison to version 1, extends the theoretical and empirical analyses and adds mathematical detail. Code: https://github.com/ernovoseller/DuelingPosteriorSampling

arXiv:1905.05380 [pdf, other]

Control Regularization for Reduced Variance Reinforcement Learning

Authors: Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Abstract: Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.

Submitted 13 May, 2019; originally announced May 2019.

Comments: Appearing in ICML 2019

arXiv:1903.08792 [pdf, other]

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Authors: Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick

Abstract: Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.

Submitted 20 March, 2019; originally announced March 2019.

Comments: Published in AAAI 2019

arXiv:1903.07823 [pdf, other]

Safe Policy Synthesis in Multi-Agent POMDPs via Discrete-Time Barrier Functions

Authors: Mohamadreza Ahmadi, Andrew Singletary, Joel W. Burdick, Aaron D. Ames

Abstract: A multi-agent partially observable Markov decision process (MPOMDP) is a modeling paradigm used for high-level planning of heterogeneous autonomous agents subject to uncertainty and partial observation. Despite their modeling efficiency, MPOMDPs have not received significant attention in safety-critical settings. In this paper, we use barrier functions to design policies for MPOMDPs that ensure safety. Notably, our method does not rely on discretization of the belief space, or finite memory. To this end, we formulate sufficient and necessary conditions for the safety of a given set based on discrete-time barrier functions (DTBFs) and we demonstrate that our formulation also allows for Boolean compositions of DTBFs for representing more complicated safe sets. We show that the proposed method can be implemented online by a sequence of one-step greedy algorithms as a standalone safe controller or as a safety-filter given a nominal planning policy. We illustrate the efficiency of the proposed methodology based on DTBFs using a high-fidelity simulation of heterogeneous robots.

Submitted 12 September, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: 8 pages and 4 figures

arXiv:1806.07555 [pdf, other]

Stagewise Safe Bayesian Optimization with Gaussian Processes

Authors: Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue

Abstract: Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function with absolute feedback or preference feedback subject to unknown safety constraints. We develop an efficient safe Bayesian optimization algorithm, StageOpt, that separates safe region expansion and utility function maximization into two distinct stages. Compared to existing approaches which interleave between expansion and optimization, we show that StageOpt is more efficient and naturally applicable to a broader class of problems. We provide theoretical guarantees for both the satisfaction of safety constraints and convergence to the optimal utility value. We evaluate StageOpt both on a variety of synthetic experiments and in clinical practice. We demonstrate that StageOpt is more effective than existing safe optimization approaches, and is able to safely and effectively optimize spinal cord stimulation therapy in our clinical experiments.

Submitted 26 January, 2020; v1 submitted 20 June, 2018; originally announced June 2018.

Comments: International Conference on Machine Learning (ICML) 2018

arXiv:1711.07894 [pdf, other]

Quantifying Performance of Bipedal Standing with Multi-channel EMG

Authors: Yanan Sui, Kun ho Kim, Joel W. Burdick

Abstract: Spinal cord stimulation has enabled humans with motor complete spinal cord injury (SCI) to independently stand and recover some lost autonomic function. Quantifying the quality of bipedal standing under spinal stimulation is important for spinal rehabilitation therapies and for new strategies that seek to combine spinal stimulation and rehabilitative robots (such as exoskeletons) in real-time feedback. To study the potential for automated electromyography (EMG) analysis in SCI, we evaluated the standing quality of paralyzed patients undergoing electrical spinal cord stimulation using both video and multi-channel surface EMG recordings during spinal stimulation therapy sessions. The quality of standing under different stimulation settings was quantified manually by experienced clinicians. By correlating features of the recorded EMG activity with the expert evaluations, we show that multi-channel EMG recording can provide accurate, fast, and robust estimation for the quality of bipedal standing in spinally stimulated SCI patients. Moreover, our analysis shows that the total number of EMG channels needed to effectively predict standing quality can be reduced while maintaining high estimation accuracy, which provides more flexibility for rehabilitation robotic systems to incorporate EMG recordings.

Submitted 21 November, 2017; originally announced November 2017.

Journal ref: IROS 2017

arXiv:1710.03592 [pdf, ps, other]

Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis

Authors: Kun Li, Joel W. Burdick

Abstract: This work handles the inverse reinforcement learning (IRL) problem where only a small number of demonstrations are available from a demonstrator for each high-dimensional task, insufficient to estimate an accurate reward function. Observing that each demonstrator has an inherent reward for each state and the task-specific behaviors mainly depend on a small number of key states, we propose a meta IRL algorithm that first models the reward function for each task as a distribution conditioned on a baseline reward function shared by all tasks and dependent only on the demonstrator, and then finds the most likely reward function in the distribution that explains the task-specific behaviors. We test the method in a simulated environment on path planning tasks with limited demonstrations, and show that the accuracy of the learned reward function is significantly improved. We also apply the method to analyze the motion of a patient under rehabilitation.

Submitted 12 October, 2017; v1 submitted 7 October, 2017; originally announced October 2017.

Comments: arXiv admin note: text overlap with arXiv:1707.09394

arXiv:1708.07738 [pdf, ps, other]

A Function Approximation Method for Model-based High-Dimensional Inverse Reinforcement Learning

Authors: Kun Li, Joel W. Burdick

Abstract: This work handles the inverse reinforcement learning problem in high-dimensional state spaces, which relies on an efficient solution of model-based high-dimensional reinforcement learning problems. To solve the computationally expensive reinforcement learning problems, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function based on the observed human actions for inverse reinforcement learning problems. The time complexity of the proposed method is linearly proportional to the cardinality of the action set, thus it can handle high-dimensional, even continuous, state spaces efficiently. We test the proposed method in a simulated environment to show its accuracy, and on three clinical tasks to show how it can be used to evaluate a doctor's proficiency.

Submitted 23 August, 2017; originally announced August 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1707.09394

arXiv:1707.09394 [pdf, ps, other]

Inverse Reinforcement Learning in Large State Spaces via Function Approximation

Authors: Kun Li, Joel W. Burdick

Abstract: This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed motion. The time complexity of the proposed method is linearly proportional to the cardinality of the action set, thus it can handle large state spaces efficiently. We test the proposed method in a simulated environment, and show that it is more accurate than existing methods and significantly better in scalability. We also show that the proposed method can extend many existing methods to high-dimensional state spaces. We then apply the method to evaluating the effect of rehabilitative stimulations on patients with spinal cord injuries based on the observed patient motions.

Submitted 13 August, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

Comments: Experiment updated

arXiv:1707.09393 [pdf, ps, other]

Online Inverse Reinforcement Learning via Bellman Gradient Iteration

Authors: Kun Li, Joel W. Burdick

Abstract: This paper develops an online inverse reinforcement learning algorithm aimed at efficiently recovering a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space in reward estimation, this work assumes that each observed action implies a change of the Q-value distribution, and relates the change to the reward function via the gradient of the Q-value with respect to the reward function parameters. The gradients are computed with a novel Bellman Gradient Iteration method that allows the reward function to be updated whenever a new observation is available. The method's convergence to a local optimum is proved. This work tests the proposed method in two simulated environments, and evaluates the algorithm's performance under a linear reward function and a non-linear reward function. The results show that the proposed algorithm requires only limited computation time and storage space, but achieves increasing accuracy as the number of observations grows. We also present a potential application to robot cleaners at home.

Submitted 28 July, 2017; originally announced July 2017.

Comments: The code and video are available at https://github.com/mestoking/BellmanGradientIteration/ . arXiv admin note: substantial text overlap with arXiv:1707.07767

arXiv:1707.07767 [pdf, ps, other]

Bellman Gradient Iteration for Inverse Reinforcement Learning

Authors: Kun Li, Yanan Sui, Joel W. Burdick

Abstract: This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These methods allow us to build a differentiable relation between the Q-value and the reward function and learn an approximately optimal reward function with gradient methods. We test the proposed method in two simulated environments by evaluating the accuracy of different approximations and comparing the proposed method with existing solutions. The results show that even with a linear reward function, the proposed method has accuracy comparable to that of the state-of-the-art method adopting a non-linear reward function, and the proposed method is more flexible because it is defined on observed actions instead of trajectories.

Submitted 24 July, 2017; originally announced July 2017.

Showing 1–50 of 61 results for author: Burdick, J W