-
Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models
Authors:
Jacob Levy,
Tyler Westenbroek,
David Fridovich-Keil
Abstract:
Traditionally, model-based reinforcement learning (MBRL) methods exploit neural networks as flexible function approximators to represent $\textit{a priori}$ unknown environment dynamics. However, training data are typically scarce in practice, and these black-box models often fail to generalize. Modeling architectures that leverage known physics can substantially reduce the complexity of system identification, but break down in the face of complex phenomena such as contact. We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems which seamlessly integrates structured first-principles modeling techniques with black-box auto-regressive models. Specifically, we develop an ensemble of probabilistic models to estimate external forces, conditioned on historical observations and actions, and integrate these predictions using known Lagrangian dynamics. With this semi-structured approach, we can make accurate long-horizon predictions with substantially less data than prior methods. We leverage this capability and propose Semi-Structured Reinforcement Learning ($\texttt{SSRL}$), a simple model-based learning framework which pushes the sample complexity boundary for real-world learning. We validate our approach on a real-world Unitree Go1 quadruped robot, learning dynamic gaits -- from scratch -- on both hard and soft surfaces with just a few minutes of real-world data. Video and code are available at: https://sites.google.com/utexas.edu/ssrl
Submitted 28 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
Authors:
Daniel Pfrommer,
Max Simchowitz,
Tyler Westenbroek,
Nikolai Matni,
Stephen Tu
Abstract:
A common pipeline in learning-based control is to iteratively estimate a model of system dynamics, and apply a trajectory optimization algorithm (e.g., $\mathtt{iLQR}$) on the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains sample complexity polynomial in relevant problem parameters, and, by synthesizing locally stabilizing gains, overcomes exponential dependence in problem horizon. Experimental results validate the performance of our algorithm, and compare to natural deep-learning baselines.
Submitted 16 May, 2023;
originally announced May 2023.
-
On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control
Authors:
Tyler Westenbroek,
Anand Siththaranjan,
Mohsin Sarwari,
Claire J. Tomlin,
Shankar S. Sastry
Abstract:
Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost function interacts with the underlying structure of the control system and impacts the amount of computation required to obtain a stabilizing controller.
We treat the cost design problem as a two-step process where the designer specifies outputs for the system that are to be penalized and then modulates the relative weighting of the inputs and the outputs in the cost. To characterize the computational burden associated to obtaining a stabilizing controller with a particular cost, we bound the prediction horizon required by receding horizon methods and the number of iterations required by dynamic programming methods to meet this requirement. Our theoretical results highlight a qualitative separation between what is possible, from a design perspective, when the chosen outputs induce either minimum-phase or non-minimum-phase behavior. Simulation studies indicate that this separation also holds for modern reinforcement learning methods.
Submitted 17 November, 2022; v1 submitted 5 April, 2022;
originally announced April 2022.
-
On the Stability of Nonlinear Receding Horizon Control: A Geometric Perspective
Authors:
Tyler Westenbroek,
Max Simchowitz,
Michael I. Jordan,
S. Shankar Sastry
Abstract:
The widespread adoption of nonlinear Receding Horizon Control (RHC) strategies by industry has led to more than 30 years of intense research efforts to provide stability guarantees for these methods. However, current theoretical guarantees require that each (generally nonconvex) planning problem can be solved to (approximate) global optimality, which is an unrealistic requirement for the derivative-based local optimization methods generally used in practical implementations of RHC. This paper takes the first step towards understanding stability guarantees for nonlinear RHC when the inner planning problem is solved to first-order stationary points, but not necessarily global optima. Special attention is given to feedback linearizable systems, and a mixture of positive and negative results is provided. We establish that, under certain strong conditions, first-order solutions to RHC exponentially stabilize linearizable systems. Surprisingly, these conditions can hold even in situations where there may be \textit{spurious local minima}. Crucially, this guarantee requires that state costs applied to the planning problems are in a certain sense `compatible' with the global geometry of the system, and a simple counter-example demonstrates the necessity of this condition. These results highlight the need to rethink the role of global geometry in the context of optimization-based control.
Submitted 25 January, 2024; v1 submitted 27 March, 2021;
originally announced March 2021.
-
Learning Min-norm Stabilizing Control Laws for Systems with Unknown Dynamics
Authors:
Tyler Westenbroek,
Fernando Castaneda,
Ayush Agrawal,
S. Shankar Sastry,
Koushil Sreenath
Abstract:
This paper introduces a framework for learning a minimum-norm stabilizing controller for a system with unknown dynamics using model-free policy optimization methods. The approach begins by designing a Control Lyapunov Function (CLF) for a (possibly inaccurate) dynamics model for the system, along with a function which specifies a minimum acceptable rate of energy dissipation for the CLF at different points in the state-space. Treating the energy dissipation condition as a constraint on the desired closed-loop behavior of the real-world system, we use penalty methods to formulate an unconstrained optimization problem over the parameters of a learned controller, which can be solved with model-free policy optimization algorithms using data collected from the plant. We discuss when the optimization learns a stabilizing controller for the real-world system and derive conditions on the structure of the learned controller which ensure that the optimization is strongly convex, meaning the globally optimal solution can be found reliably. We validate the approach in simulation, first for a double pendulum, and then generalize the framework to learn stable walking controllers for underactuated bipedal robots using the Hybrid Zero Dynamics framework. By encoding a large amount of structure into the learning problem, we are able to learn stabilizing controllers for both systems with only minutes or even seconds of training data.
Submitted 1 October, 2020; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning
Authors:
Tyler Westenbroek,
Eric Mazumdar,
David Fridovich-Keil,
Valmik Prabhu,
Claire J. Tomlin,
S. Shankar Sastry
Abstract:
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system using discrete-time model-free policy-gradient parameter update rules. The primary advantage of the scheme over standard model-reference adaptive control techniques is that it does not require the learned inverse model to be invertible at all instants of time. This enables the use of general function approximators to approximate the linearizing controller for the system without having to worry about singularities. However, the discrete-time and stochastic nature of these algorithms precludes the direct application of standard machinery from the adaptive control literature to provide deterministic stability proofs for the system. Nevertheless, we leverage these techniques alongside tools from the stochastic approximation literature to demonstrate that with high probability the tracking and parameter errors concentrate near zero when a certain persistence of excitation condition is satisfied. A simulated example of a double pendulum demonstrates the utility of the proposed theory.
Submitted 6 April, 2020;
originally announced April 2020.
-
Feedback Linearization for Unknown Systems via Reinforcement Learning
Authors:
Tyler Westenbroek,
David Fridovich-Keil,
Eric Mazumdar,
Shreyas Arora,
Valmik Prabhu,
S. Shankar Sastry,
Claire J. Tomlin
Abstract:
We present a novel approach to control design for nonlinear systems which leverages model-free policy optimization techniques to learn a linearizing controller for a physical plant with unknown dynamics. Feedback linearization is a technique from nonlinear control which renders the input-output dynamics of a nonlinear plant \emph{linear} under application of an appropriate feedback controller. Once a linearizing controller has been constructed, desired output trajectories for the nonlinear plant can be tracked using a variety of linear control techniques. However, the calculation of a linearizing controller requires a precise dynamics model for the system. As a result, model-based approaches for learning exact linearizing controllers generally require a simple, highly structured model of the system with easily identifiable parameters. In contrast, the model-free approach presented in this paper is able to approximate the linearizing controller for the plant using general function approximation architectures. Specifically, we formulate a continuous-time optimization problem over the parameters of a learned linearizing controller whose optima are the set of parameters which best linearize the plant. We derive conditions under which the learning problem is (strongly) convex and provide guarantees which ensure the true linearizing controller for the plant is recovered. We then discuss how model-free policy optimization algorithms can be used to solve a discrete-time approximation to the problem using data collected from the real-world plant. The utility of the framework is demonstrated in simulation and on a real-world robotic platform.
Submitted 21 April, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Technical Report: Optimal Control of Piecewise-smooth Control Systems via Singular Perturbations
Authors:
Tyler Westenbroek,
Xiaobin Xiong,
Aaron D Ames,
S Shankar Sastry
Abstract:
This paper investigates optimal control problems formulated over a class of piecewise-smooth vector fields. Instead of optimizing over the discontinuous system directly, we formulate optimal control problems over a family of regularizations which are obtained by "smoothing out" the discontinuity in the original system. It is shown that the smooth problems can be used to obtain accurate derivative information about the non-smooth problem, under standard regularity conditions. We then indicate how the regularizations can be used to consistently approximate the non-smooth optimal control problem in the sense of Polak. The utility of these smoothing techniques is demonstrated in an in-depth example, where we utilize recently developed reduced-order modeling techniques from the dynamic walking community to generate motion plans across contact sequences for an 18-DOF model of a lower-body exoskeleton.
Submitted 31 March, 2019; v1 submitted 28 March, 2019;
originally announced March 2019.
-
A New Solution Concept and Family of Relaxations for Hybrid Dynamical Systems
Authors:
Tyler Westenbroek,
Humberto Gonzalez,
S. Shankar Sastry
Abstract:
We introduce a holistic framework for the analysis, approximation and control of the trajectories of hybrid dynamical systems which display event-triggered discrete jumps in the continuous state. We begin by demonstrating how to explicitly represent the dynamics of this class of systems using a single piecewise-smooth vector field defined on a manifold, and then employ Filippov's solution concept to describe the trajectories of the system. The resulting \emph{hybrid Filippov solutions} greatly simplify the mathematical description of hybrid executions, providing a unifying solution concept with which to work. Extending previous efforts to regularize piecewise-smooth vector fields, we then introduce a parameterized family of smooth control systems whose trajectories are used to approximate the hybrid Filippov solution numerically. The two solution concepts are shown to agree in the limit, under mild regularity conditions.
Submitted 14 December, 2018; v1 submitted 21 March, 2018;
originally announced March 2018.
-
On the Relaxation of Hybrid Dynamical Systems
Authors:
Tyler Westenbroek,
S. Shankar Sastry,
Humberto Gonzalez
Abstract:
Hybrid dynamical systems have proven to be a powerful modeling abstraction, yet fundamental questions regarding the dynamical properties of these systems remain. In this paper, we develop a novel class of relaxations which we use to recover a number of classic systems-theoretic properties for hybrid systems, such as existence and uniqueness of trajectories, even past the point of Zeno. Our relaxations also naturally give rise to a class of provably convergent numerical approximations, capable of simulating through Zeno. Using our methods, we are also able to perform sensitivity analysis about nominal trajectories undergoing a discrete transition -- a technique with many practical applications, such as assessing the stability of periodic orbits.
Submitted 23 October, 2017;
originally announced October 2017.
-
Optimal Control of Hybrid Systems Using a Feedback Relaxed Control Formulation
Authors:
Tyler Westenbroek,
Humberto Gonzalez
Abstract:
We present a numerically tractable formulation for computing the optimal control of the class of hybrid dynamical systems whose trajectories are continuous. Our formulation, an extension of existing relaxed-control techniques for switched dynamical systems, incorporates the domain information of each discrete mode as part of the constraints in the optimization problem. Moreover, our numerical results are consistent with phenomena that are particular to hybrid systems, such as the creation of sliding trajectories between discrete modes.
Submitted 25 May, 2016; v1 submitted 30 October, 2015;
originally announced October 2015.