-
SATA: Safe and Adaptive Torque-Based Locomotion Policies Inspired by Animal Learning
Authors:
Peizhuo Li,
Hongyi Li,
Ge Sun,
Jin Cheng,
Xinrong Yang,
Guillaume Bellegarda,
Milad Shafiee,
Yuhong Cao,
Auke Ijspeert,
Guillaume Sartoretti
Abstract:
Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achiev…
▽ More
Despite recent advances in learning-based controllers for legged robots, deployments in human-centric environments remain limited by safety concerns. Most of these approaches use position-based control, where policies output target joint angles that must be processed by a low-level controller (e.g., PD or impedance controllers) to compute joint torques. Although impressive results have been achieved in controlled real-world scenarios, these methods often struggle with compliance and adaptability when encountering environments or disturbances unseen during training, potentially resulting in extreme or unsafe behaviors. Inspired by how animals achieve smooth and adaptive movements by controlling muscle extension and contraction, torque-based policies offer a promising alternative by enabling precise and direct control of the actuators in torque space. In principle, this approach facilitates more effective interactions with the environment, resulting in safer and more adaptable behaviors. However, challenges such as a highly nonlinear state space and inefficient exploration during training have hindered their broader adoption. To address these limitations, we propose SATA, a bio-inspired framework that mimics key biomechanical principles and adaptive learning mechanisms observed in animal locomotion. Our approach effectively addresses the inherent challenges of learning torque-based policies by significantly improving early-stage exploration, leading to high-performance final policies. Remarkably, our method achieves zero-shot sim-to-real transfer. Our experimental results indicate that SATA demonstrates remarkable compliance and safety, even in challenging environments such as soft/slippery terrain or narrow passages, and under significant external disturbances, highlighting its potential for practical deployments in human-centric and safety-critical scenarios.
△ Less
Submitted 8 May, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
AllGaits: Learning All Quadruped Gaits and Transitions
Authors:
Guillaume Bellegarda,
Milad Shafiee,
Auke Ijspeert
Abstract:
We present a framework for learning a single policy capable of producing all quadruped gaits and transitions. The framework consists of a policy trained with deep reinforcement learning (DRL) to modulate the parameters of a system of abstract oscillators (i.e. Central Pattern Generator), whose output is mapped to joint commands through a pattern formation layer that sets the gait style, i.e. body…
▽ More
We present a framework for learning a single policy capable of producing all quadruped gaits and transitions. The framework consists of a policy trained with deep reinforcement learning (DRL) to modulate the parameters of a system of abstract oscillators (i.e. Central Pattern Generator), whose output is mapped to joint commands through a pattern formation layer that sets the gait style, i.e. body height, swing foot ground clearance height, and foot offset. Different gaits are formed by changing the coupling between different oscillators, which can be instantaneously selected at any velocity by a user. With this framework, we systematically investigate which gait should be used at which velocity, and when gait transitions should occur from a Cost of Transport (COT), i.e. energy-efficiency, point of view. Additionally, we note how gait style changes as a function of locomotion speed for each gait to keep the most energy-efficient locomotion. While the currently most popular gait (trot) does not result in the lowest COT, we find that considering different co-dependent metrics such as mean base velocity and joint acceleration result in different `optimal' gaits than those that minimize COT. We deploy our controller in various hardware experiments, showing all 9 typical quadruped animal gaits, and demonstrate generalizability to unseen gaits during training, and robustness to leg failures. Video results can be found at https://youtu.be/OLoWSX_R868.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
Online Optimization of Central Pattern Generators for Quadruped Locomotion
Authors:
Zewei Zhang,
Guillaume Bellegarda,
Milad Shafiee,
Auke Ijspeert
Abstract:
Typical legged locomotion controllers are designed or trained offline. This is in contrast to many animals, which are able to locomote at birth, and rapidly improve their locomotion skills with few real-world interactions. Such motor control is possible through oscillatory neural networks located in the spinal cord of vertebrates, known as Central Pattern Generators (CPGs). Models of the CPG have…
▽ More
Typical legged locomotion controllers are designed or trained offline. This is in contrast to many animals, which are able to locomote at birth, and rapidly improve their locomotion skills with few real-world interactions. Such motor control is possible through oscillatory neural networks located in the spinal cord of vertebrates, known as Central Pattern Generators (CPGs). Models of the CPG have been widely used to generate locomotion skills in robotics, but can require extensive hand-tuning or offline optimization of inter-connected parameters with genetic algorithms. In this paper, we present a framework for the \textit{online} optimization of the CPG parameters through Bayesian Optimization. We show that our framework can rapidly optimize and adapt to varying velocity commands and changes in the terrain, for example to varying coefficients of friction, terrain slope angles, and added mass payloads placed on the robot. We study the effects of sensory feedback on the CPG, and find that both force feedback in the phase equations, as well as posture control (Virtual Model Control) are both beneficial for robot stability and energy efficiency. In hardware experiments on the Unitree Go1, we show rapid optimization (in under 3 minutes) and adaptation of energy-efficient gaits to varying target velocities in a variety of scenarios: varying coefficients of friction, added payloads up to 15 kg, and variable slopes up to 10 degrees. See demo at: https://youtu.be/4qq5leCI2AI
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Dynamic Object Catching with Quadruped Robot Front Legs
Authors:
André Schakkal,
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
This paper presents a framework for dynamic object catching using a quadruped robot's front legs while it stands on its rear legs. The system integrates computer vision, trajectory prediction, and leg control to enable the quadruped to visually detect, track, and successfully catch a thrown object using an onboard camera. Leveraging a fine-tuned YOLOv8 model for object detection and a regression-b…
▽ More
This paper presents a framework for dynamic object catching using a quadruped robot's front legs while it stands on its rear legs. The system integrates computer vision, trajectory prediction, and leg control to enable the quadruped to visually detect, track, and successfully catch a thrown object using an onboard camera. Leveraging a fine-tuned YOLOv8 model for object detection and a regression-based trajectory prediction module, the quadruped adapts its front leg positions iteratively to anticipate and intercept the object. The catching maneuver involves identifying the optimal catching position, controlling the front legs with Cartesian PD control, and closing the legs together at the right moment. We propose and validate three different methods for selecting the optimal catching position: 1) intersecting the predicted trajectory with a vertical plane, 2) selecting the point on the predicted trajectory with the minimal distance to the center of the robot's legs in their nominal position, and 3) selecting the point on the predicted trajectory with the highest likelihood on a Gaussian Mixture Model (GMM) modelling the robot's reachable space. Experimental results demonstrate robust catching capabilities across various scenarios, with the GMM method achieving the best performance, leading to an 80% catching success rate. A video demonstration of the system in action can be found at https://youtu.be/sm7RdxRfIYg .
△ Less
Submitted 10 October, 2024;
originally announced October 2024.
-
Learning Human-Robot Handshaking Preferences for Quadruped Robots
Authors:
Alessandra Chappuis,
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
Quadruped robots are showing impressive abilities to navigate the real world. If they are to become more integrated into society, social trust in interactions with humans will become increasingly important. Additionally, robots will need to be adaptable to different humans based on individual preferences. In this work, we study the social interaction task of learning optimal handshakes for quadrup…
▽ More
Quadruped robots are showing impressive abilities to navigate the real world. If they are to become more integrated into society, social trust in interactions with humans will become increasingly important. Additionally, robots will need to be adaptable to different humans based on individual preferences. In this work, we study the social interaction task of learning optimal handshakes for quadruped robots based on user preferences. While maintaining balance on three legs, we parameterize handshakes with a Central Pattern Generator consisting of an amplitude, frequency, stiffness, and duration. Through 10 binary choices between handshakes, we learn a belief model to fit individual preferences for 25 different subjects. Our results show that this is an effective strategy, with 76% of users feeling happy with their identified optimal handshake parameters, and 20% feeling neutral. Moreover, compared with random and test handshakes, the optimized handshakes have significantly decreased errors in amplitude and frequency, lower Dynamic Time Warping scores, and improved energy efficiency, all of which indicate robot synchronization to the user's preferences. Video results can be found at https://youtu.be/elvPv8mq1KM .
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Learning-based Hierarchical Control: Emulating the Central Nervous System for Bio-Inspired Legged Robot Locomotion
Authors:
Ge Sun,
Milad Shafiee,
Peizhuo Li,
Guillaume Bellegarda,
Auke Ijspeert,
Guillaume Sartoretti
Abstract:
Animals possess a remarkable ability to navigate challenging terrains, achieved through the interplay of various pathways between the brain, central pattern generators (CPGs) in the spinal cord, and musculoskeletal system. Traditional bioinspired control frameworks often rely on a singular control policy that models both higher (supraspinal) and spinal cord functions. In this work, we build upon o…
▽ More
Animals possess a remarkable ability to navigate challenging terrains, achieved through the interplay of various pathways between the brain, central pattern generators (CPGs) in the spinal cord, and musculoskeletal system. Traditional bioinspired control frameworks often rely on a singular control policy that models both higher (supraspinal) and spinal cord functions. In this work, we build upon our previous research by introducing two distinct neural networks: one tasked with modulating the frequency and amplitude of CPGs to generate the basic locomotor rhythm (referred to as the spinal policy, SCP), and the other responsible for receiving environmental perception data and directly modulating the rhythmic output from the SCP to execute precise movements on challenging terrains (referred to as the descending modulation policy). This division of labor more closely mimics the hierarchical locomotor control systems observed in legged animals, thereby enhancing the robot's ability to navigate various uneven surfaces, including steps, high obstacles, and terrains with gaps. Additionally, we investigate the impact of sensorimotor delays within our framework, validating several biological assumptions about animal locomotion systems. Specifically, we demonstrate that spinal circuits play a crucial role in generating the basic locomotor rhythm, while descending pathways are essential for enabling appropriate gait modifications to accommodate uneven terrain. Notably, our findings also reveal that the multi-layered control inherent in animals exhibits remarkable robustness against time delays. Through these investigations, this paper contributes to a deeper understanding of the fundamental principles of interplay between spinal and supraspinal mechanisms in biological locomotion. It also supports the development of locomotion controllers in parallel to biological structures which are ...
△ Less
Submitted 27 April, 2024;
originally announced April 2024.
-
Quadruped-Frog: Rapid Online Optimization of Continuous Quadruped Jumping
Authors:
Guillaume Bellegarda,
Milad Shafiee,
Merih Ekin Özberk,
Auke Ijspeert
Abstract:
Legged robots are becoming increasingly agile in exhibiting dynamic behaviors such as running and jumping. Usually, such behaviors are either optimized and engineered offline (i.e. the behavior is designed for before it is needed), either through model-based trajectory optimization, or through deep learning-based methods involving millions of timesteps of simulation interactions. Notably, such off…
▽ More
Legged robots are becoming increasingly agile in exhibiting dynamic behaviors such as running and jumping. Usually, such behaviors are either optimized and engineered offline (i.e. the behavior is designed for before it is needed), either through model-based trajectory optimization, or through deep learning-based methods involving millions of timesteps of simulation interactions. Notably, such offline-designed locomotion controllers cannot perfectly model the true dynamics of the system, such as the motor dynamics. In contrast, in this paper, we consider a quadruped jumping task that we rapidly optimize online. We design foot force profiles parameterized by only a few parameters which we optimize for directly on hardware with Bayesian Optimization. The force profiles are tracked at the joint level, and added to Cartesian PD impedance control and Virtual Model Control to stabilize the jumping motions. After optimization, which takes only a handful of jumps, we show that this control architecture is capable of diverse and omnidirectional jumps including forward, lateral, and twist (turning) jumps, even on uneven terrain, enabling the Unitree Go1 quadruped to jump 0.5 m high, 0.5 m forward, and jump-turn over 2 rad. Video results can be found at https://youtu.be/SvfVNQ90k_w.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots
Authors:
Milad Shafiee,
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes,…
▽ More
Learning a locomotion policy for quadruped robots has traditionally been constrained to a specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. The robot differences encompass: a variable number of DoFs, (i.e. 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 18 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
△ Less
Submitted 8 March, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Identifying Important Sensory Feedback for Learning Locomotion Skills
Authors:
Wanming Yu,
Chuanyu Yang,
Christopher McGreavy,
Eleftherios Triantafyllidis,
Guillaume Bellegarda,
Milad Shafiee,
Auke Jan Ijspeert,
Zhibin Li
Abstract:
Robot motor skills can be learned through deep reinforcement learning (DRL) by neural networks as state-action mappings. While the selection of state observations is crucial, there has been a lack of quantitative analysis to date. Here, we present a systematic saliency analysis that quantitatively evaluates the relative importance of different feedback states for motor skills learned through DRL.…
▽ More
Robot motor skills can be learned through deep reinforcement learning (DRL) by neural networks as state-action mappings. While the selection of state observations is crucial, there has been a lack of quantitative analysis to date. Here, we present a systematic saliency analysis that quantitatively evaluates the relative importance of different feedback states for motor skills learned through DRL. Our approach can identify the most essential feedback states for locomotion skills, including balance recovery, trotting, bounding, pacing and galloping. By using only key states including joint positions, gravity vector, base linear and angular velocities, we demonstrate that a simulated quadruped robot can achieve robust performance in various test scenarios across these distinct skills. The benchmarks using task performance metrics show that locomotion skills learned with key states can achieve comparable performance to those with all states, and the task performance or learning success rate will drop significantly if key states are missing. This work provides quantitative insights into the relationship between state observations and specific types of motor skills, serving as a guideline for robot motor learning. The proposed method is applicable to differentiable state-action mapping, such as neural network based control policies, enabling the learning of a wide range of motor skills with minimal sensing dependencies.
△ Less
Submitted 29 June, 2023;
originally announced June 2023.
-
DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills
Authors:
Milad Shafiee,
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e. the avoidance of falls, represents an important criterion for gait t…
▽ More
Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e. the avoidance of falls, represents an important criterion for gait transitions. We investigate the emergence of gait transitions through the interaction between supraspinal drive (brain), the central pattern generator in the spinal cord, the body, and exteroceptive sensing by leveraging deep reinforcement learning and robotics tools. Consistent with quadruped animal data, we show that the walk-trot gait transition for quadruped robots on flat terrain improves both viability and energy efficiency. Furthermore, we investigate the effects of discrete terrain (i.e. crossing successive gaps) on imposing gait transitions, and find the emergence of trot-pronk transitions to avoid non-viable states. Compared with other potential criteria such as peak forces and energy efficiency, viability is the only improved factor after gait transitions on both flat and discrete gap terrains, suggesting that viability could be a primary and universal objective of gait transitions, while other criteria are secondary objectives and/or a consequence of viability. Moreover, we deploy our learned controller in sim-to-real hardware experiments and demonstrate state-of-the-art quadruped agility in challenging scenarios, where the Unitree A1 quadruped autonomously transitions gaits between trot and pronk to cross consecutive gaps of up to 30 cm (83.3 % of the body-length) at over 1.3 m/s.
△ Less
Submitted 14 June, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Puppeteer and Marionette: Learning Anticipatory Quadrupedal Locomotion Based on Interactions of a Central Pattern Generator and Supraspinal Drive
Authors:
Milad Shafiee,
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
Quadruped animal locomotion emerges from the interactions between the spinal central pattern generator (CPG), sensory feedback, and supraspinal drive signals from the brain. Computational models of CPGs have been widely used for investigating the spinal cord contribution to animal locomotion control in computational neuroscience and in bio-inspired robotics. However, the contribution of supraspina…
▽ More
Quadruped animal locomotion emerges from the interactions between the spinal central pattern generator (CPG), sensory feedback, and supraspinal drive signals from the brain. Computational models of CPGs have been widely used for investigating the spinal cord contribution to animal locomotion control in computational neuroscience and in bio-inspired robotics. However, the contribution of supraspinal drive to anticipatory behavior, i.e. motor behavior that involves planning ahead of time (e.g. of footstep placements), is not yet properly understood. In particular, it is not clear whether the brain modulates CPG activity and/or directly modulates muscle activity (hence bypassing the CPG) for accurate foot placements. In this paper, we investigate the interaction of supraspinal drive and a CPG in an anticipatory locomotion scenario that involves stepping over gaps. By employing deep reinforcement learning (DRL), we train a neural network policy that replicates the supraspinal drive behavior. This policy can either modulate the CPG dynamics, or directly change actuation signals to bypass the CPG dynamics. Our results indicate that the direct supraspinal contribution to the actuation signal is a key component for a high gap crossing success rate. However, the CPG dynamics in the spinal cord are beneficial for gait smoothness and energy efficiency. Moreover, our investigation shows that sensing the front feet distances to the gap is the most important and sufficient sensory information for learning gap crossing. Our results support the biological hypothesis that cats and horses mainly control the front legs for obstacle avoidance, and that hind limbs follow an internal memory based on the front limbs' information. Our method enables the quadruped robot to cross gaps of up to 20 cm (50% of body-length) without any explicit dynamics modeling or Model Predictive Control (MPC).
△ Less
Submitted 26 February, 2023;
originally announced February 2023.
-
Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion
Authors:
Guillaume Bellegarda,
Milad Shafiee,
Auke Ijspeert
Abstract:
We present a framework for learning visually-guided quadruped locomotion by integrating exteroceptive sensing and central pattern generators (CPGs), i.e. systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Through both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands,…
▽ More
We present a framework for learning visually-guided quadruped locomotion by integrating exteroceptive sensing and central pattern generators (CPGs), i.e. systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Through both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands, while at the same time override these commands to avoid collisions with the environment. We investigate several open robotics and neuroscience questions: 1) What is the role of explicit interoscillator couplings between oscillators, and can such coupling improve sim-to-real transfer for navigation robustness? 2) What are the effects of using a memory-enabled vs. a memory-free policy network with respect to robustness, energy-efficiency, and tracking performance in sim-to-real navigation tasks? 3) How do animals manage to tolerate high sensorimotor delays, yet still produce smooth and robust gaits? To answer these questions, we train our perceptive locomotion policies in simulation and perform sim-to-real transfers to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations are all beneficial for energy efficiency, robustness to noise and sensory delays of 90 ms, and tracking performance for successful sim-to-real transfer for navigation tasks. Video results can be found at https://youtu.be/wpsbSMzIwgM.
△ Less
Submitted 11 March, 2024; v1 submitted 29 December, 2022;
originally announced December 2022.
-
CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion
Authors:
Guillaume Bellegarda,
Auke Ijspeert
Abstract:
In this letter, we present a method for integrating central pattern generators (CPGs), i.e. systems of coupled oscillators, into the deep reinforcement learning (DRL) framework to produce robust and omnidirectional quadruped locomotion. The agent learns to directly modulate the intrinsic oscillator setpoints (amplitude and frequency) and coordinate rhythmic behavior among different oscillators. Th…
▽ More
In this letter, we present a method for integrating central pattern generators (CPGs), i.e. systems of coupled oscillators, into the deep reinforcement learning (DRL) framework to produce robust and omnidirectional quadruped locomotion. The agent learns to directly modulate the intrinsic oscillator setpoints (amplitude and frequency) and coordinate rhythmic behavior among different oscillators. This approach also allows the use of DRL to explore questions related to neuroscience, namely the role of descending pathways, interoscillator couplings, and sensory feedback in gait generation. We train our policies in simulation and perform a sim-to-real transfer to the Unitree A1 quadruped, where we observe robust behavior to disturbances unseen during training, most notably to a dynamically added 13.75 kg load representing 115% of the nominal quadruped mass. We test several different observation spaces based on proprioceptive sensing and show that our framework is deployable with no domain randomization and very little feedback, where along with the oscillator states, it is possible to provide only contact booleans in the observation space. Video results can be found at https://youtu.be/xqXHLzLsEV4.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning
Authors:
Guillaume Bellegarda,
Yiyu Chen,
Zhuochen Liu,
Quan Nguyen
Abstract:
Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and with…
▽ More
Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and with trajectory generators introduce human bias into the resulting control policies. In this paper, we present a learning framework that leads to the natural emergence of fast and robust bounding policies for quadruped robots. The agent both selects and controls actions directly in task space to track desired velocity commands subject to environmental noise including model uncertainty and rough terrain. We observe that this framework improves sample efficiency, necessitates little reward shaping, leads to the emergence of natural gaits such as galloping and bounding, and eases the sim-to-real transfer at running speeds. Policies can be learned in only a few million time steps, even for challenging tasks of running over rough terrain with loads of over 100% of the nominal quadruped mass. Training occurs in PyBullet, and we perform a sim-to-sim transfer to Gazebo and sim-to-real transfer to the Unitree A1 hardware. For sim-to-sim, our results show the quadruped is able to run at over 4 m/s without a load, and 3.5 m/s with a 10 kg load, which is over 83% of the nominal quadruped mass. For sim-to-real, the Unitree A1 is able to bound at 2 m/s with a 5 kg load, representing 42% of the nominal quadruped mass.
△ Less
Submitted 15 March, 2023; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Robust Quadruped Jumping via Deep Reinforcement Learning
Authors:
Guillaume Bellegarda,
Chuong Nguyen,
Quan Nguyen
Abstract:
In this paper, we consider a general task of jumping varying distances and heights for a quadrupedal robot in noisy environments, such as off of uneven terrain and with variable robot dynamics parameters. To accurately jump in such conditions, we propose a framework using deep reinforcement learning that leverages and augments the complex solution of nonlinear trajectory optimization for quadruped…
▽ More
In this paper, we consider a general task of jumping varying distances and heights for a quadrupedal robot in noisy environments, such as off of uneven terrain and with variable robot dynamics parameters. To accurately jump in such conditions, we propose a framework using deep reinforcement learning that leverages and augments the complex solution of nonlinear trajectory optimization for quadrupedal jumping. While the standalone optimization limits jumping to take-off from flat ground and requires accurate assumptions of robot dynamics, our proposed approach improves the robustness to allow jumping off of significantly uneven terrain with variable robot dynamical parameters and environmental conditions. Compared with walking and running, the realization of aggressive jumping on hardware necessitates accounting for the motors' torque-speed relationship as well as the robot's total power limits. By incorporating these constraints into our learning framework, we successfully deploy our policy sim-to-real without further tuning, fully exploiting the available onboard power supply and motors. We demonstrate robustness to environment noise of foot disturbances of up to 6 cm in height, or 33% of the robot's nominal standing height, while jumping 2x the body length in distance.
△ Less
Submitted 11 August, 2023; v1 submitted 13 November, 2020;
originally announced November 2020.
-
Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning
Authors:
Guillaume Bellegarda,
Katie Byl
Abstract:
Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any modeling or intuition about the system, at the cost of high sample complexity and the inability to prove any metrics about the learned policies. Trajectory optimiz…
▽ More
Recent breakthroughs both in reinforcement learning and trajectory optimization have made significant advances towards real world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any modeling or intuition about the system, at the cost of high sample complexity and the inability to prove any metrics about the learned policies. Trajectory optimization (TO) on the other hand allows for stability and robustness analyses on generated motions and trajectories, but is only as good as the often over-simplified derived model, and may have prohibitively expensive computation times for real-time control. This paper seeks to combine the benefits from these two areas while mitigating their drawbacks by (1) decreasing RL sample complexity by using existing knowledge of the problem with optimal control, and (2) providing an upper bound estimate on the time-to-arrival of the combined learned-optimized policy, allowing online policy deployment at any point in the training process by using the TO as a worst-case scenario action. This method is evaluated for a car model, with applicability to any mobile robotic system. A video showing policy execution comparisons can be found at https://youtu.be/mv2xw83NyWU .
△ Less
Submitted 21 October, 2019;
originally announced October 2019.
-
Training in Task Space to Speed Up and Guide Reinforcement Learning
Authors:
Guillaume Bellegarda,
Katie Byl
Abstract:
Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) system…
▽ More
Recent breakthroughs in the reinforcement learning (RL) community have made significant advances towards learning and deploying policies on real world robotic systems. However, even with the current state-of-the-art algorithms and computational resources, these algorithms are still plagued with high sample complexity, and thus long training times, especially for high degree of freedom (DOF) systems. There are also concerns arising from lack of perceived stability or robustness guarantees from emerging policies. This paper aims at mitigating these drawbacks by: (1) modeling a complex, high DOF system with a representative simple one, (2) making explicit use of forward and inverse kinematics without forcing the RL algorithm to "learn" them on its own, and (3) learning locomotion policies in Cartesian space instead of joint space. In this paper these methods are applied to JPL's Robosimian, but can be readily used on any system with a base and end effector(s). These locomotion policies can be produced in just a few minutes, trained on a single laptop. We compare the robustness of the resulting learned policies to those of other control methods. An accompanying video for this paper can be found at https://youtu.be/xDxxSw5ahnc .
△ Less
Submitted 6 March, 2019;
originally announced March 2019.
-
Experimental Design for Human-in-the-Loop Driving Simulations
Authors:
Katherine Driggs-Campbell,
Guillaume Bellegarda,
Victor Shia,
S. Shankar Sastry,
Ruzena Bajcsy
Abstract:
This report describes a new experimental setup for human-in-the-loop simulations. A force feedback simulator with four axis motion has been setup for real-time driving experiments. The simulator will move to simulate the forces a driver feels while driving, which allows for a realistic experience for the driver. This setup allows for flexibility and control for the researcher in a realistic simula…
▽ More
This report describes a new experimental setup for human-in-the-loop simulations. A force feedback simulator with four axis motion has been setup for real-time driving experiments. The simulator will move to simulate the forces a driver feels while driving, which allows for a realistic experience for the driver. This setup allows for flexibility and control for the researcher in a realistic simulation environment. Experiments concerning driver distraction can also be carried out safely in this test bed, in addition to multi-agent experiments. All necessary code to run the simulator, the additional sensors, and the basic processing is available for use.
△ Less
Submitted 20 January, 2014;
originally announced January 2014.