-
Is Bellman Equation Enough for Learning Control?
Authors:
Haoxiang You,
Lekan Molu,
Ian Abraham
Abstract:
The Bellman equation and its continuous-time counterpart, the Hamilton-Jacobi-Bellman (HJB) equation, serve as necessary conditions for optimality in reinforcement learning and optimal control. While the value function is known to be the unique solution to the Bellman equation in tabular settings, we demonstrate that this uniqueness fails to hold in continuous state spaces. Specifically, for linear dynamical systems, we prove the Bellman equation admits at least $\binom{2n}{n}$ solutions, where $n$ is the state dimension. Crucially, only one of these solutions yields both an optimal policy and a stable closed-loop system. We then demonstrate a common failure mode in value-based methods: convergence to unstable solutions due to the exponential imbalance between admissible and inadmissible solutions. Finally, to address this issue, we introduce a positive-definite neural architecture that guarantees convergence to the stable solution by construction.
Submitted 5 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
The Python LevelSet Toolbox (LevelSetPy)
Authors:
Lekan Molu
Abstract:
This paper describes open-source scientific contributions in Python surrounding the numerical solution of hyperbolic Hamilton-Jacobi (HJ) partial differential equations, viz., their implicit representation on co-dimension one surfaces; dynamics evolution with levelsets; spatial derivatives; total variation diminishing Runge-Kutta integration schemes; and their applications to the theory of reachable sets. They are increasingly finding applications in multiple research domains such as reinforcement learning, robotics, control engineering and automation. We describe the library components, illustrate usage with an example, and provide comparisons with existing implementations. This GPU-accelerated package allows for easy portability to many modern libraries for the numerical analysis of the HJ equations. We also provide a CPU implementation in Python that is significantly faster than existing alternatives.
Submitted 5 November, 2024;
originally announced November 2024.
-
Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees
Authors:
Shaoru Chen,
Lekan Molu,
Mahyar Fazlyab
Abstract:
Barrier functions are a general framework for establishing a safety guarantee for a system. However, there is no general method for finding these functions. To address this shortcoming, recent approaches use self-supervised learning techniques to learn these functions using training data that are periodically generated by a verification procedure, leading to a verification-aided learning framework. Despite its immense potential in automating barrier function synthesis, the verification-aided learning framework does not have termination guarantees and may suffer from a low success rate of finding a valid barrier function in practice. In this paper, we propose a holistic approach to address these drawbacks. With a convex formulation of the barrier function synthesis, we propose to first learn an empirically well-behaved NN basis function and then apply a fine-tuning algorithm that exploits the convexity and counterexamples from the verification failure to find a valid barrier function with finite-step termination guarantees: if there exist valid barrier functions, the fine-tuning algorithm is guaranteed to find one in a finite number of iterations. We demonstrate that our fine-tuning method can significantly boost the performance of the verification-aided learning framework on examples of different scales and using various neural network verifiers.
Submitted 12 March, 2024;
originally announced March 2024.
-
Fast Whole-Body Strain Regulation in Continuum Robots
Authors:
Lekan Molu
Abstract:
We propose steps towards the real-time strain control of multiphysics, multiscale continuum soft robots. To study this problem fundamentally, we ground ourselves in a model-based control setting enabled by mathematically precise dynamics of a soft robot prototype. Poised to integrate, rather than reject, inherent mechanical nonlinearities for embodied compliance, we first decompose the original robot dynamics into subdynamics, aided by a perturbing time-scale separation parameter. Second, we prescribe a set of stabilizing nonlinear backstepping controllers for regulating the resulting subsystems' strain dynamics. Third, we study the interconnected singularly perturbed system by analyzing and establishing its stability. Fourth, our theories are backed up by fast numerical results on a single arm of the Octopus robot. We demonstrate strain regulation to equilibrium, in a significantly reduced time, of the whole-body reduced-order dynamics of an infinite degrees-of-freedom soft robot. This paper communicates our thinking within the backdrop of embodied intelligence: it informs our conceptualization, formulation, and computational setup, and yields improved control performance for infinite degrees-of-freedom soft robots.
Submitted 8 May, 2025; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Lagrangian Properties and Control of Soft Robots Modeled with Discrete Cosserat Rods
Authors:
Lekan Molu,
Shaoru Chen,
Audrey Sedal
Abstract:
The characteristic "in-plane" bending associated with soft robots' deformation makes them preferred over rigid robots in sophisticated manipulation and movement tasks. Executing such motion strategies to precision in soft deformable robots and structures is, however, fraught with modeling and control challenges given their infinite degrees-of-freedom. When piecewise constant strains (PCS) are imposed across (discretized) Cosserat microsolids on the continuum material, however, their dynamics become amenable to tractable mathematical analysis. While this PCS model handles the characteristic difficult-to-model "in-plane" bending well, its Lagrangian properties are not exploited for control in the literature, nor is there a rigorous study on the dynamic performance of multisection deformable materials for "in-plane" bending that guarantees steady-state convergence. In this spirit, we first establish the PCS model's structural Lagrangian properties. Second, we exploit these for control on various strain goal states. Third, we benchmark our hypotheses against an Octopus-inspired robot arm under different constant tip loads. These induce non-constant "in-plane" deformation and we regulate strain states throughout the continuum in these configurations. Our numerical results establish convergence to desired equilibrium throughout the continuum in all of our tests. Within the bounds here set, we conjecture that our methods can find wide adoption in the control of cable- and fluid-driven multisection soft robotic arms, and may be extensible to the (learning-based) control of deformable agents employed in simulated, mixed, or augmented reality.
Submitted 10 December, 2023;
originally announced December 2023.
-
PcLast: Discovering Plannable Continuous Latent States
Authors:
Anurag Koul,
Shivakanth Sujit,
Shaoru Chen,
Ben Evans,
Lili Wu,
Byron Xu,
Rajan Chari,
Riashat Islam,
Raihan Seraj,
Yonathan Efroni,
Lekan Molu,
Miro Dudik,
John Langford,
Alex Lamb
Abstract:
Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effective planning and goal-conditioned policy learning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based settings show significant improvements in sampling efficiency. Further, in reward-free settings this approach yields layered state abstractions that enable computationally efficient hierarchical planning for reaching ad hoc goals with zero additional samples.
Submitted 10 June, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models
Authors:
Alex Lamb,
Riashat Islam,
Yonathan Efroni,
Aniket Didolkar,
Dipendra Misra,
Dylan Foster,
Lekan Molu,
Rajan Chari,
Akshay Krishnamurthy,
John Langford
Abstract:
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
Submitted 27 December, 2022; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Interaction-Grounded Learning with Action-inclusive Feedback
Authors:
Tengyang Xie,
Akanksha Saran,
Dylan J. Foster,
Lekan Molu,
Ida Momennejad,
Nan Jiang,
Paul Mineiro,
John Langford
Abstract:
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios such as brain-computer interface (BCI) or human-computer interface (HCI) applications. We address this by creating an algorithm and analysis that allow IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.
Submitted 12 October, 2022; v1 submitted 16 June, 2022;
originally announced June 2022.