Search | arXiv e-print repository

Do No Harm: A Counterfactual Approach to Safe Reinforcement Learning

Authors: Sean Vaskov, Wilko Schwarting, Chris L. Baker

Abstract: Reinforcement Learning (RL) for control has become increasingly popular due to its ability to learn rich feedback policies that take into account uncertainty and complex representations of the environment. When considering safety constraints, constrained optimization approaches, where agents are penalized for constraint violations, are commonly used. In such methods, if agents are initialized in, or must visit, states where constraint violation might be inevitable, it is unclear how much they should be penalized. We address this challenge by formulating a constraint on the counterfactual harm of the learned policy compared to a default, safe policy. In a philosophical sense this formulation only penalizes the learner for constraint violations that it caused; in a practical sense it maintains feasibility of the optimal control problem. We present simulation studies on a rover with uncertain road friction and a tractor-trailer parking environment that demonstrate our constraint formulation enables agents to learn safer policies than contemporary constrained RL methods.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2106.09127 [pdf, other]

Planning on a (Risk) Budget: Safe Non-Conservative Planning in Probabilistic Dynamic Environments

Authors: Hung-Jui Huang, Kai-Chi Huang, Michal Čáp, Yibiao Zhao, Ying Nian Wu, Chris L. Baker

Abstract: Planning in environments with other agents whose future actions are uncertain often requires compromise between safety and performance. Here our goal is to design efficient planning algorithms with guaranteed bounds on the probability of safety violation, which nonetheless achieve non-conservative performance. To quantify a system's risk, we define a natural criterion called interval risk bounds (IRBs), which provide a parametric upper bound on the probability of safety violation over a given time interval or task. We present a novel receding horizon algorithm, and prove that it can satisfy a desired IRB. Our algorithm maintains a dynamic risk budget which constrains the allowable risk at each iteration, and guarantees recursive feasibility by requiring a safe set to be reachable by a contingency plan within the budget. We empirically demonstrate that our algorithm is both safer and less conservative than strong baselines in two simulated autonomous driving experiments in scenarios involving collision avoidance with other vehicles, and additionally demonstrate our algorithm running on an autonomous class 8 truck.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 9 pages, 5 figures, International Conference on Robotics and Automation 2021

arXiv:2105.06979 [pdf, other]

doi 10.3847/2041-8213/ac089b

The Radius of PSR J0740+6620 from NICER and XMM-Newton Data

Authors: M. C. Miller, F. K. Lamb, A. J. Dittmann, S. Bogdanov, Z. Arzoumanian, K. C. Gendreau, S. Guillot, W. C. G. Ho, J. M. Lattimer, M. Loewenstein, S. M. Morsink, P. S. Ray, M. T. Wolff, C. L. Baker, T. Cazeau, S. Manthripragada, C. B. Markwardt, T. Okajima, S. Pollard, I. Cognard, H. T. Cromartie, E. Fonseca, L. Guillemot, M. Kerr, A. Parthasarathy, et al. (3 additional authors not shown)

Abstract: PSR J0740$+$6620 has a gravitational mass of $2.08\pm 0.07~M_\odot$, which is the highest reliably determined mass of any neutron star. As a result, a measurement of its radius will provide unique insight into the properties of neutron star core matter at high densities. Here we report a radius measurement based on fits of rotating hot spot patterns to Neutron Star Interior Composition Explorer (NICER) and X-ray Multi-Mirror (XMM-Newton) X-ray observations. We find that the equatorial circumferential radius of PSR J0740$+$6620 is $13.7^{+2.6}_{-1.5}$ km (68%). We apply our measurement, combined with the previous NICER mass and radius measurement of PSR J0030$+$0451, the masses of two other $\sim 2~M_\odot$ pulsars, and the tidal deformability constraints from two gravitational wave events, to three different frameworks for equation of state modeling, and find consistent results at $\sim 1.5-3$ times nuclear saturation density. For a given framework, when all measurements are included the radius of a $1.4~M_\odot$ neutron star is known to $\pm 4$% (68% credibility) and the radius of a $2.08~M_\odot$ neutron star is known to $\pm 5$%. The full radius range that spans the $\pm 1\sigma$ credible intervals of all the radius estimates in the three frameworks is $12.45\pm 0.65$ km for a $1.4~M_\odot$ neutron star and $12.35\pm 0.75$ km for a $2.08~M_\odot$ neutron star.

Submitted 14 May, 2021; originally announced May 2021.

Comments: 49 pages, 16 figures, submitted to The Astrophysical Journal Letters

arXiv:1912.06787 [pdf, other]

doi 10.15607/RSS.2020.XVI.069

PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

Authors: Dicong Qiu, Yibiao Zhao, Chris L. Baker

Abstract: Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) formally model the problem of planning under world state uncertainty, but POMDPs with continuous actions and nonlinear dynamics suitable for robotics applications are challenging to solve. In this paper, we present an efficient differential dynamic programming (DDP) algorithm for belief space planning in POMDPs with uncertainty over a discrete latent state, and continuous states, actions, observations, and nonlinear dynamics. This representation allows planning of dynamic trajectories which are sensitive to structured uncertainty over discrete latent world states. We develop dynamic programming techniques to optimize a contingency plan over a tree of possible observations and belief space trajectories, and also derive a hierarchical version of the algorithm. Our method is applicable to problems with uncertainty over the cost or reward function (e.g., the configuration of goals or obstacles), uncertainty over the dynamics (e.g., the dynamical mode of a hybrid system), and uncertainty about interactions, where other agents' behavior is conditioned on latent intentions. Benchmarks show that our algorithm outperforms popular heuristic approaches to planning under uncertainty, and results from an autonomous lane changing task demonstrate that our algorithm can synthesize robust interactive trajectories.

Submitted 14 December, 2019; originally announced December 2019.

Comments: 16 pages, 6 figures, preprint

Journal ref: Robotics: Science and Systems, 2020. 69.1-69.10

arXiv:1912.05702 [pdf, other]

doi 10.3847/2041-8213/ab481c

A NICER View of PSR J0030+0451: Millisecond Pulsar Parameter Estimation

Authors: Thomas E. Riley, Anna L. Watts, Slavko Bogdanov, Paul S. Ray, Renee M. Ludlam, Sebastien Guillot, Zaven Arzoumanian, Charles L. Baker, Anna V. Bilous, Deepto Chakrabarty, Keith C. Gendreau, Alice K. Harding, Wynn C. G. Ho, James M. Lattimer, Sharon M. Morsink, Tod E. Strohmayer

Abstract: We report on Bayesian parameter estimation of the mass and equatorial radius of the millisecond pulsar PSR J0030$+$0451, conditional on pulse-profile modeling of Neutron Star Interior Composition Explorer (NICER) X-ray spectral-timing event data. We perform relativistic ray-tracing of thermal emission from hot regions of the pulsar's surface. We assume two distinct hot regions based on two clear pulsed components in the phase-folded pulse-profile data; we explore a number of forms (morphologies and topologies) for each hot region, inferring their parameters in addition to the stellar mass and radius. For the family of models considered, the evidence (prior predictive probability of the data) strongly favors a model that permits both hot regions to be located in the same rotational hemisphere. Models wherein both hot regions are assumed to be simply-connected circular single-temperature spots, in particular those where the spots are assumed to be reflection-symmetric with respect to the stellar origin, are strongly disfavored. For the inferred configuration, one hot region subtends an angular extent of only a few degrees (in spherical coordinates with origin at the stellar center) and we are insensitive to other structural details; the second hot region is far more azimuthally extended in the form of a narrow arc, thus requiring a larger number of parameters to describe.
The inferred mass $M$ and equatorial radius $R_\mathrm{eq}$ are, respectively, $1.34_{-0.16}^{+0.15}$ M$_{\odot}$ and $12.71_{-1.19}^{+1.14}$ km, whilst the compactness $GM/R_\mathrm{eq}c^2 = 0.156_{-0.010}^{+0.008}$ is more tightly constrained; the credible interval bounds reported here are approximately the $16\%$ and $84\%$ quantiles in marginal posterior mass.

Submitted 11 December, 2019; originally announced December 2019.

Comments: Appears in ApJ Letters Focus Issue on NICER Constraints on the Dense Matter Equation of State; 76 pages, 24 figures, 7 tables, 8 figure sets (available in the online journal or from the authors)

Journal ref: ApJL, 887, L21 (2019)

arXiv:1602.03924 [pdf, other]

Modeling Human Ad Hoc Coordination

Authors: Peter M. Krafft, Chris L. Baker, Alex Pentland, Joshua B. Tenenbaum

Abstract: Whether in groups of humans or groups of computer agents, collaboration is most effective between individuals who have the ability to coordinate on a joint strategy for collective action. However, in general a rational actor will only intend to coordinate if that actor believes the other group members have the same intention. This circular dependence makes rational coordination difficult in uncertain environments if communication between actors is unreliable and no prior agreements have been made. An important normative question with regard to coordination in these ad hoc settings is therefore how one can come to believe that other actors will coordinate, and with regard to systems involving humans, an important empirical question is how humans arrive at these expectations. We introduce an exact algorithm for computing the infinitely recursive hierarchy of graded beliefs required for rational coordination in uncertain environments, and we introduce a novel mechanism for multiagent coordination that uses it. Our algorithm is valid in any environment with a finite state space, and extensions to certain countably infinite state spaces are likely possible. We test our mechanism for multiagent coordination as a model for human decisions in a simple coordination game using existing experimental data. We then explore via simulations whether modeling humans in this way may improve human-agent collaboration.

Submitted 11 February, 2016; originally announced February 2016.

Comments: AAAI 2016

ACM Class: I.2.0; I.2.11; J.4

arXiv:1512.00964 [pdf, other]

Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model

Authors: Ryo Nakahashi, Chris L. Baker, Joshua B. Tenenbaum

Abstract: Most human behaviors consist of multiple parts, steps, or subtasks. These structures guide our action planning and execution, but when we observe others, the latent structure of their actions is typically unobservable, and must be inferred in order to learn new skills by demonstration, or to assist others in completing their tasks. For example, an assistant who has learned the subgoal structure of a colleague's task can more rapidly recognize and support their actions as they unfold. Here we model how humans infer subgoals from observations of complex action sequences using a nonparametric Bayesian model, which assumes that observed actions are generated by approximately rational planning over unknown subgoal sequences. We test this model with a behavioral experiment in which humans observed different series of goal-directed actions, and inferred both the number and composition of the subgoal sequences associated with each goal. The Bayesian model predicts human subgoal inferences with high accuracy, and significantly better than several alternative models and straightforward heuristics. Motivated by this result, we simulate how learning and inference of subgoals can improve performance in an artificial user assistance task. The Bayesian model learns the correct subgoals from fewer observations, and better assists users by more rapidly and accurately inferring the goal of their actions than alternative approaches.

Submitted 3 December, 2015; originally announced December 2015.

Comments: Accepted at AAAI 16

Journal ref: Proceedings of 30th conference on artificial intelligence (AAAI 2016) pp. 3754--3760

Showing 1–7 of 7 results for author: Baker, C L