Search | arXiv e-print repository

Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations

Authors: Sarmad Mehrdad, Avadesh Meduri, Ludovic Righetti

Abstract: We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criterion, our approach iteratively finds a weight improvement step and proposes a method to find an appropriate step size that ensures learned cost function features remain similar to the demonstrated trajectory features. In contrast to similar approaches, our algorithm can individually tune the effectiveness of each observation for the partition function and does not need a large sample set, enabling faster learning. We generate sample trajectories by solving an optimal control problem instead of random sampling, leading to more informative trajectories. The performance of our method is compared to two state-of-the-art algorithms to demonstrate its benefits in several simulated environments.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2403.03639 [pdf, ps, other]

Diffusion-based learning of contact plans for agile locomotion

Authors: Victor Dhédin, Adithya Kumar Chinnakkonda Ravi, Armand Jordana, Huaijiang Zhu, Avadesh Meduri, Ludovic Righetti, Bernhard Schölkopf, Majid Khadiv

Abstract: Legged robots have become capable of performing highly dynamic maneuvers in the past few years. However, agile locomotion in highly constrained environments such as stepping stones is still a challenge. In this paper, we propose a combination of model-based control, search, and learning to design efficient control policies for agile locomotion on stepping stones. In our framework, we use nonlinear model predictive control (NMPC) to generate whole-body motions for a given contact plan. To efficiently search for an optimal contact plan, we propose to use Monte Carlo tree search (MCTS). While the combination of MCTS and NMPC can quickly find a feasible plan for a given environment (a few seconds), it is not yet suitable to be used as a reactive policy. Hence, we generate a dataset for optimal goal-conditioned policy for a given scene and learn it through supervised learning. In particular, we leverage the power of diffusion models in handling multi-modality in the dataset. We test our proposed framework on a scenario where our quadruped robot Solo12 successfully jumps to different goals in a highly constrained environment.

Submitted 23 June, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2305.11573 [pdf, other]

Risk-Sensitive Extended Kalman Filter

Authors: Armand Jordana, Avadesh Meduri, Etienne Arlaud, Justin Carpentier, Ludovic Righetti

Abstract: In robotics, designing robust algorithms in the face of estimation uncertainty is a challenging task. Indeed, controllers often do not consider the estimation uncertainty and only rely on the most likely estimated state. Consequently, sudden changes in the environment or the robot's dynamics can lead to catastrophic behaviors. In this work, we present a risk-sensitive Extended Kalman Filter that allows doing output-feedback Model Predictive Control (MPC) safely. This filter adapts its estimation to the control objective. By taking a pessimistic estimate concerning the value function resulting from the MPC controller, the filter provides increased robustness to the controller in phases of uncertainty as compared to a standard Extended Kalman Filter (EKF). Moreover, the filter has the same complexity as an EKF, so that it can be used for real-time model-predictive control. The paper evaluates the risk-sensitive behavior of the proposed filter when used in a nonlinear model-predictive control loop on a planar drone and industrial manipulator in simulation, as well as on an external force estimation task on a real quadruped robot. These experiments demonstrate the abilities of the approach to improve performance in the face of uncertainties significantly.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2210.02127 [pdf, other]

Visual-Inertial and Leg Odometry Fusion for Dynamic Locomotion

Authors: Victor Dhédin, Haolong Li, Shahram Khorshidi, Lukas Mack, Adithya Kumar Chinnakkonda Ravi, Avadesh Meduri, Paarth Shah, Felix Grimminger, Ludovic Righetti, Majid Khadiv, Joerg Stueckler

Abstract: Implementing dynamic locomotion behaviors on legged robots requires a high-quality state estimation module. Especially when the motion includes flight phases, state-of-the-art approaches fail to produce reliable estimation of the robot posture, in particular base height. In this paper, we propose a novel approach for combining visual-inertial odometry (VIO) with leg odometry in an extended Kalman filter (EKF) based state estimator. The VIO module uses a stereo camera and IMU to yield low-drift 3D position and yaw orientation and drift-free pitch and roll orientation of the robot base link in the inertial frame. However, these values have a considerable amount of latency due to image processing and optimization, while the rate of update is quite low which is not suitable for low-level control. To reduce the latency, we predict the VIO state estimate at the rate of the IMU measurements of the VIO sensor. The EKF module uses the base pose and linear velocity predicted by VIO, fuses them further with a second high-rate IMU and leg odometry measurements, and produces robot state estimates with a high frequency and small latency suitable for control. We integrate this lightweight estimation framework with a nonlinear model predictive controller and show successful implementation of a set of agile locomotion behaviors, including trotting and jumping at varying horizontal speeds, on a torque-controlled quadruped robot.

Submitted 10 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA), 2023

arXiv:2209.15566 [pdf, other]

doi 10.1109/UR61395.2024.10597477

ContactNet: Online Multi-Contact Planning for Acyclic Legged Robot Locomotion

Authors: Angelo Bratta, Avadesh Meduri, Michele Focchi, Ludovic Righetti, Claudio Semini

Abstract: In legged locomotion, online trajectory optimization techniques generally depend on heuristic-based contact planners in order to have low computation times and achieve high replanning frequencies. In this work, we propose ContactNet, a fast acyclic contact planner based on a multi-output regression neural network. ContactNet ranks discretized stepping regions, allowing the best feasible solution to be chosen quickly, even in complex environments. The low computation time, on the order of 1 ms, makes it possible to execute the contact planner concurrently with a trajectory optimizer in a Model Predictive Control (MPC) fashion. We demonstrate the effectiveness of the approach in simulation in different complex scenarios with the quadruped robot Solo12.

Submitted 2 May, 2024; v1 submitted 30 September, 2022; originally announced September 2022.

Journal ref: 21st International Conference on Ubiquitous Robots (UR) 2024

arXiv:2209.09451 [pdf, other]

MPC with Sensor-Based Online Cost Adaptation

Authors: Avadesh Meduri, Huaijiang Zhu, Armand Jordana, Ludovic Righetti

Abstract: Model predictive control is a powerful tool to generate complex motions for robots. However, it often requires solving non-convex problems online to produce rich behaviors, which is computationally expensive and not always practical in real time. Additionally, direct integration of high dimensional sensor data (e.g. RGB-D images) in the feedback loop is challenging with current state-space methods. This paper aims to address both issues. It introduces a model predictive control scheme, where a neural network constantly updates the cost function of a quadratic program based on sensory inputs, aiming to minimize a general non-convex task loss without solving a non-convex problem online. By updating the cost, the robot is able to adapt to changes in the environment directly from sensor measurement without requiring a new cost design. Furthermore, since the quadratic program can be solved efficiently with hard constraints, a safe deployment on the robot is ensured. Experiments with a wide variety of reaching tasks on an industrial robot manipulator demonstrate that our method can efficiently solve complex non-convex problems with high-dimensional visual sensory inputs, while still being robust to external disturbances.

Submitted 20 September, 2022; originally announced September 2022.

Comments: 6 Pages, 5 Figures

arXiv:2206.09023 [pdf, other]

Efficient Object Manipulation Planning with Monte Carlo Tree Search

Authors: Huaijiang Zhu, Avadesh Meduri, Ludovic Righetti

Abstract: This paper presents an efficient approach to object manipulation planning using Monte Carlo Tree Search (MCTS) to find contact sequences and an efficient ADMM-based trajectory optimization algorithm to evaluate the dynamic feasibility of candidate contact sequences. To accelerate MCTS, we propose a methodology to learn a goal-conditioned policy-value network to direct the search towards promising nodes. Further, manipulation-specific heuristics drastically reduce the search space. Systematic object manipulation experiments in a physics simulator and on real hardware demonstrate the efficiency of our approach. In particular, our approach scales favorably for long manipulation sequences thanks to the learned policy-value network, significantly improving planning success rate.

Submitted 19 March, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2201.07601 [pdf, other]

BiConMP: A Nonlinear Model Predictive Control Framework for Whole Body Motion Planning

Authors: Avadesh Meduri, Paarth Shah, Julian Viereck, Majid Khadiv, Ioannis Havoutis, Ludovic Righetti

Abstract: Online planning of whole-body motions for legged robots is challenging due to the inherent nonlinearity in the robot dynamics. In this work, we propose a nonlinear MPC framework, BiConMP, which can generate whole-body trajectories online by efficiently exploiting the structure of the robot dynamics. BiConMP is used to generate various cyclic gaits on a real quadruped robot and its performance is evaluated on different terrain, countering unforeseen pushes and transitioning online between different gaits. Further, the ability of BiConMP to generate non-trivial acyclic whole-body dynamic motions on the robot is presented. The same approach is also used to generate various dynamic motions in MPC on a humanoid robot (Talos) and another quadruped robot (AnYmal) in simulation. Finally, an extensive empirical analysis on the effects of planning horizon and frequency on the nonlinear MPC framework is reported and discussed.

Submitted 15 September, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.04090 [pdf, other]

ValueNetQP: Learned one-step optimal control for legged locomotion

Authors: Julian Viereck, Avadesh Meduri, Ludovic Righetti

Abstract: Optimal control is a successful approach to generate motions for complex robots, in particular for legged locomotion. However, these techniques are often too slow to run in real time for model predictive control or one needs to drastically simplify the dynamics model. In this work, we present a method to learn to predict the gradient and Hessian of the problem value function, enabling fast resolution of the predictive control problem with a one-step quadratic program. In addition, our method is able to satisfy constraints like friction cones and unilateral constraints, which are important for highly dynamic locomotion tasks. We demonstrate the capability of our method in simulation and on a real quadruped robot performing trotting and bounding motions.

Submitted 11 January, 2022; originally announced January 2022.

arXiv:2108.01797 [pdf, other]

Rapid Convex Optimization of Centroidal Dynamics using Block Coordinate Descent

Authors: Paarth Shah, Avadesh Meduri, Wolfgang Merkt, Majid Khadiv, Ioannis Havoutis, Ludovic Righetti

Abstract: In this paper we explore the use of block coordinate descent (BCD) to optimize the centroidal momentum dynamics for dynamically consistent multi-contact behaviors. The centroidal dynamics have recently received a large amount of attention in order to create physically realizable motions for robots with hands and feet while being computationally more tractable than full rigid body dynamics models. Our contribution lies in exploiting the structure of the dynamics in order to simplify the original non-convex problem into two convex subproblems. We iterate between these two subproblems for a set number of iterations or until a consensus is reached. We explore the properties of the proposed optimization method for the centroidal dynamics and verify in simulation that motions generated by our approach can be tracked by the quadruped Solo12. In addition, we compare our method to a recently proposed convexification using a sequence of convex relaxations as well as a more standard interior point method used in the off-the-shelf solver IPOPT to show that our approach finds similar, if not better, trajectories (in terms of cost), and is more than four times faster than both approaches. Finally, compared to previous approaches, we note its practicality due to the convex nature of each subproblem which allows our method to be used with any off-the-shelf quadratic programming solver.

Submitted 3 August, 2021; originally announced August 2021.
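
Illustrative note on the entry above: the alternating scheme that the abstract describes (split a non-convex problem into two convex subproblems and iterate between them) can be sketched on a toy problem. This is a minimal sketch, not the authors' method. The scalar cost `(x*y - 1)^2 + reg*(x^2 + y^2)`, the function names, the starting point, and the stopping rule (stop when the iterates barely change) are all assumptions chosen for illustration. The cost is non-convex jointly in `(x, y)` but convex in each variable when the other is held fixed, so each block update has a closed form.

```python
def cost(x, y, reg=0.1):
    """Toy biconvex cost (hypothetical): convex in x for fixed y, and vice versa."""
    return (x * y - 1.0) ** 2 + reg * (x * x + y * y)


def argmin_block(other, reg=0.1):
    """Exact minimizer of the cost over one variable with the other held fixed.

    Setting the derivative 2*other*(v*other - 1) + 2*reg*v to zero gives
    v = other / (other**2 + reg).
    """
    return other / (other * other + reg)


def block_coordinate_descent(x0, y0, reg=0.1, max_iters=200, tol=1e-10):
    """Alternate exact minimization over x and y until the iterates settle."""
    x, y = x0, y0
    costs = [cost(x, y, reg)]
    for _ in range(max_iters):
        x_new = argmin_block(y, reg)      # convex subproblem 1: update x
        y_new = argmin_block(x_new, reg)  # convex subproblem 2: update y
        step = abs(x_new - x) + abs(y_new - y)
        x, y = x_new, y_new
        costs.append(cost(x, y, reg))
        if step < tol:                    # assumed stopping rule
            break
    return x, y, costs


if __name__ == "__main__":
    x, y, costs = block_coordinate_descent(1.0, 2.0)
    print(f"x={x:.6f}, y={y:.6f}, final cost={costs[-1]:.6f}, sweeps={len(costs) - 1}")
```

Because each block update is an exact minimization, the recorded cost never increases from one sweep to the next, which is the property that makes this kind of alternation attractive when each subproblem can be handed to a standard convex solver.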

arXiv:2010.14834 [pdf, other]

DeepQ Stepper: A framework for reactive dynamic walking on uneven terrain

Authors: Avadesh Meduri, Majid Khadiv, Ludovic Righetti

Abstract: Reactive stepping and push recovery for biped robots is often restricted to flat terrains because of the difficulty in computing capture regions for nonlinear dynamic models. In this paper, we address this limitation by using reinforcement learning to approximately learn the 3D capture region for such systems. We propose a novel 3D reactive stepper, the DeepQ stepper, that computes optimal step locations for walking at different velocities using the 3D capture regions approximated by the action-value function. We demonstrate the ability of the approach to learn stepping with a simplified 3D pendulum model and full robot dynamics. Further, the stepper achieves a higher performance when it learns approximate capture regions while taking into account the entire dynamics of the robot that are often ignored in existing reactive steppers based on simplified models. The DeepQ stepper can handle non-convex terrain with obstacles, walk on restricted surfaces like stepping stones and recover from external disturbances for a constant computational cost.

Submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.01215 [pdf, other]

doi 10.1109/TRO.2020.3048125

Efficient Multi-Contact Pattern Generation with Sequential Convex Approximations of the Centroidal Dynamics

Authors: Brahayam Ponton, Majid Khadiv, Avadesh Meduri, Ludovic Righetti

Abstract: This paper investigates the problem of efficient computation of physically consistent multi-contact behaviors. Recent work showed that under mild assumptions, the problem could be decomposed into simpler kinematic and centroidal dynamic optimization problems. Based on this approach, we propose a general convex relaxation of the centroidal dynamics leading to two computationally efficient algorithms based on iterative resolutions of second order cone programs. They optimize centroidal trajectories, contact forces and, importantly, the timing of the motions. We include the approach in a kino-dynamic optimization method to generate full-body movements. Finally, the approach is embedded in a mixed-integer solver to further find dynamically consistent contact sequences. Extensive numerical experiments demonstrate the computational efficiency of the approach, suggesting that it could be used in a fast receding horizon control loop. Executions of the planned motions on simulated humanoids and quadrupeds and on a real quadruped robot further show the quality of the optimized motions.

Submitted 2 October, 2020; originally announced October 2020.

arXiv:1910.00093 [pdf, other]

doi 10.1109/LRA.2020.2976639

An Open Torque-Controlled Modular Robot Architecture for Legged Locomotion Research

Authors: Felix Grimminger, Avadesh Meduri, Majid Khadiv, Julian Viereck, Manuel Wüthrich, Maximilien Naveau, Vincent Berenz, Steve Heim, Felix Widmaier, Thomas Flayols, Jonathan Fiene, Alexander Badri-Spröwitz, Ludovic Righetti

Abstract: We present a new open-source torque-controlled legged robot system, with a low-cost and low-complexity actuator module at its core. It consists of a high-torque brushless DC motor and a low-gear-ratio transmission suitable for impedance and force control. We also present a novel foot contact sensor suitable for legged locomotion with hard impacts. A 2.2 kg quadruped robot with a large range of motion is assembled from eight identical actuator modules and four lower legs with foot contact sensors. Leveraging standard plastic 3D printing and off-the-shelf parts results in a lightweight and inexpensive robot, allowing for rapid distribution and duplication within the research community. We systematically characterize the achieved impedance at the foot in both static and dynamic scenarios, and measure a maximum dimensionless leg stiffness of 10.8 without active damping, which is comparable to the leg stiffness of a running human. Finally, to demonstrate the capabilities of the quadruped, we present a novel controller which combines feedforward contact forces computed from a kino-dynamic optimizer with impedance control of the center of mass and base orientation. The controller can regulate complex motions while being robust to environmental uncertainty.

Submitted 23 February, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

Showing 1–13 of 13 results for author: Meduri, A