-
Adaptive MPC-based quadrupedal robot control under periodic disturbances
Authors:
Elizaveta Pestova,
Ilya Osokin,
Danil Belov,
Pavel Osinenko
Abstract:
Recent advancements in adaptive control for reference trajectory tracking enable quadrupedal robots to perform locomotion tasks under challenging conditions. There are methods enabling the estimation of the external disturbances in terms of forces and torques. However, a specific case of disturbances that are periodic was not explicitly tackled in application to quadrupeds. This work is devoted to the estimation of the periodic disturbances with a lightweight regressor using simplified robot dynamics and extracting the disturbance properties in terms of the magnitude and frequency. Experimental evidence suggests performance improvement over the baseline static disturbance compensation. All source files, including simulation setups, code, and calculation scripts, are available on GitHub at https://github.com/aidagroup/quad-periodic-mpc.
Submitted 18 May, 2025;
originally announced May 2025.
-
Quadrupedal Robot Skateboard Mounting via Reverse Curriculum Learning
Authors:
Danil Belov,
Artem Erkhov,
Elizaveta Pestova,
Ilya Osokin,
Dzmitry Tsetserukou,
Pavel Osinenko
Abstract:
The aim of this work is to enable quadrupedal robots to mount skateboards using Reverse Curriculum Reinforcement Learning. Although prior work has demonstrated skateboarding for quadrupeds that are already positioned on the board, the initial mounting phase still poses a significant challenge. A goal-oriented methodology was adopted, beginning with the terminal phases of the task and progressively increasing the complexity of the problem definition to approximate the desired objective. The learning process was initiated with the skateboard rigidly fixed within the global coordinate frame and the robot positioned directly above it. Through gradual relaxation of these initial conditions, the learned policy demonstrated robustness to variations in skateboard position and orientation, ultimately exhibiting a successful transfer to scenarios involving a mobile skateboard. The code, trained models, and reproducible examples are available at the following link: https://github.com/dancher00/quadruped-skateboard-mounting
Submitted 10 May, 2025;
originally announced May 2025.
-
ViewVR: Visual Feedback Modes to Achieve Quality of VR-based Telemanipulation
Authors:
A. Erkhov,
A. Bazhenov,
S. Satsevich,
D. Belov,
F. Khabibullin,
S. Egorov,
M. Gromakov,
M. Altamirano Cabrera,
D. Tsetserukou
Abstract:
The paper focuses on an immersive teleoperation system that enhances the operator's ability to actively perceive the robot's surroundings. A consumer-grade HTC Vive VR system was used to synchronize the operator's hand and head movements with a UR3 robot and a custom-built robotic head with two degrees of freedom (2-DoF). The system's usability, manipulation efficiency, and intuitiveness of control were evaluated in comparison with static head camera positioning across three distinct tasks. Code and other supplementary materials are available at https://github.com/ErkhovArtem/ViewVR
Submitted 13 January, 2025;
originally announced January 2025.
-
Optimizing energy consumption for legged robot by adapting equilibrium position and stiffness of a parallel torsion spring
Authors:
Danil Belov,
Artem Erkhov,
Farit Khabibullin,
Elisaveta Pestova,
Sergei Satsevich,
Ilya Osokin,
Pavel Osinenko,
Dzmitry Tsetserukou
Abstract:
This paper is dedicated to the development of a novel adaptive torsion spring mechanism for optimizing energy consumption in legged robots. By adjusting the equilibrium position and stiffness of the spring, the system improves energy efficiency during cyclic movements, such as walking and jumping. The adaptive compliance mechanism, consisting of a torsion spring combined with a worm gear driven by a servo actuator, compensates for motion-induced torque and reduces motor load. Simulation results demonstrate a significant reduction in power consumption, highlighting the effectiveness of this approach in enhancing robotic locomotion.
Submitted 27 November, 2024;
originally announced November 2024.
-
HyperSurf: Quadruped Robot Leg Capable of Surface Recognition with GRU and Real-to-Sim Transferring
Authors:
Sergei Satsevich,
Yaroslav Savotin,
Danil Belov,
Elizaveta Pestova,
Artem Erhov,
Batyr Khabibullin,
Artem Bazhenov,
Vyacheslav Kovalev,
Aleksey Fedoseev,
Dzmitry Tsetserukou
Abstract:
This paper introduces a system of data collection acceleration and real-to-sim transferring for surface recognition on a quadruped robot. The system features a mechanical single-leg setup capable of stepping on various easily interchangeable surfaces. Additionally, it incorporates a GRU-based Surface Recognition System, inspired by the system detailed in the Dog-Surf paper. This setup facilitates the expansion of dataset collection for model training, enabling data acquisition from hard-to-reach surfaces in laboratory conditions. Furthermore, it opens avenues for transferring surface properties from reality to simulation, thereby allowing the training of optimal gaits for legged robots in simulation environments using a pre-prepared library of digital twins of surfaces. Moreover, enhancements have been made to the GRU-based Surface Recognition System, allowing for the integration of data from both the quadruped robot and the single-leg setup. The dataset and code have been made publicly available.
Submitted 19 August, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
PartIR: Composing SPMD Partitioning Strategies for Machine Learning
Authors:
Sami Alabed,
Daniel Belov,
Bart Chrzaszcz,
Juliana Franco,
Dominik Grewe,
Dougal Maclaurin,
James Molloy,
Tom Natan,
Tamara Norman,
Xiaoyue Pan,
Adam Paszke,
Norman A. Rink,
Michael Schaarschmidt,
Timur Sitdikov,
Agnieszka Swietlik,
Dimitrios Vytiniotis,
Joel Wee
Abstract:
Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN partitioning system. PartIR is focused on an incremental approach to rewriting and is hardware-and-runtime agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level programmer-issued partitioning tactics, which can be both manual and automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance.
Submitted 24 November, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Automap: Towards Ergonomic Automated Parallelism for ML Models
Authors:
Michael Schaarschmidt,
Dominik Grewe,
Dimitrios Vytiniotis,
Adam Paszke,
Georg Stefan Schmid,
Tamara Norman,
James Molloy,
Jonathan Godwin,
Norman Alexander Rink,
Vinod Nair,
Dan Belov
Abstract:
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
Submitted 6 December, 2021;
originally announced December 2021.
-
Local Search for Policy Iteration in Continuous Control
Authors:
Jost Tobias Springenberg,
Nicolas Heess,
Daniel Mankowitz,
Josh Merel,
Arunkumar Byravan,
Abbas Abdolmaleki,
Jackie Kay,
Jonas Degrave,
Julian Schrittwieser,
Yuval Tassa,
Jonas Buchli,
Dan Belov,
Martin Riedmiller
Abstract:
We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.
Submitted 12 October, 2020;
originally announced October 2020.
-
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Authors:
H. Francis Song,
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Aidan Clark,
Hubert Soyer,
Jack W. Rae,
Seb Noury,
Arun Ahuja,
Siqi Liu,
Dhruva Tirumala,
Nicolas Heess,
Dan Belov,
Martin Riedmiller,
Matthew M. Botvinick
Abstract:
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting. However, policy gradients can suffer from large variance that may limit performance, and in practice require carefully tuned entropy regularization to prevent policy collapse. As an alternative to policy gradient algorithms, we introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function. We show that V-MPO surpasses previously reported scores for both the Atari-57 and DMLab-30 benchmark suites in the multi-task setting, and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters. On individual DMLab and Atari levels, the proposed algorithm can achieve scores that are substantially higher than has previously been reported. V-MPO is also applicable to problems with high-dimensional, continuous action spaces, which we demonstrate in the context of learning to control simulated humanoids with 22 degrees of freedom from full state observations and 56 degrees of freedom from pixel observations, as well as example OpenAI Gym tasks where V-MPO achieves substantially higher asymptotic scores than previously reported.
Submitted 26 September, 2019;
originally announced September 2019.
-
TF-Replicator: Distributed Machine Learning for Researchers
Authors:
Peter Buchlovsky,
David Budden,
Dominik Grewe,
Chris Jones,
John Aslanides,
Frederic Besse,
Andy Brock,
Aidan Clark,
Sergio Gómez Colmenarejo,
Aedan Pope,
Fabio Viola,
Dan Belov
Abstract:
We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
Submitted 1 February, 2019;
originally announced February 2019.
-
Relative Entropy Regularized Policy Iteration
Authors:
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Jonas Degrave,
Steven Bohez,
Yuval Tassa,
Dan Belov,
Nicolas Heess,
Martin Riedmiller
Abstract:
We present an off-policy actor-critic algorithm for Reinforcement Learning (RL) that combines ideas from gradient-free optimization via stochastic search with a learned action-value function. The result is a simple procedure consisting of three steps: i) policy evaluation by estimating a parametric action-value function; ii) policy improvement via the estimation of a local non-parametric policy; and iii) generalization by fitting a parametric policy. Each step can be implemented in different ways, giving rise to several algorithm variants. Our algorithm draws on connections to existing literature on black-box optimization and 'RL as an inference' and it can be seen either as an extension of the Maximum a Posteriori Policy Optimisation algorithm (MPO) [Abdolmaleki et al., 2018a], or as an extension of Trust Region Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) [Abdolmaleki et al., 2017b; Hansen et al., 1997] to a policy iteration scheme. Our comparison on 31 continuous control tasks from the parkour suite [Heess et al., 2017], the DeepMind control suite [Tassa et al., 2018] and OpenAI Gym [Brockman et al., 2016], with diverse properties, a limited amount of compute and a single set of hyperparameters, demonstrates the effectiveness of our method and state-of-the-art results. Videos summarizing the results can be found at goo.gl/HtvJKR .
Submitted 5 December, 2018;
originally announced December 2018.
-
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Authors:
Aaron van den Oord,
Yazhe Li,
Igor Babuschkin,
Karen Simonyan,
Oriol Vinyals,
Koray Kavukcuoglu,
George van den Driessche,
Edward Lockhart,
Luis C. Cobo,
Florian Stimberg,
Norman Casagrande,
Dominik Grewe,
Seb Noury,
Sander Dieleman,
Erich Elsen,
Nal Kalchbrenner,
Heiga Zen,
Alex Graves,
Helen King,
Tom Walters,
Dan Belov,
Demis Hassabis
Abstract:
The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real-time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.
Submitted 28 November, 2017;
originally announced November 2017.
-
Parallel Multiscale Autoregressive Density Estimation
Authors:
Scott Reed,
Aäron van den Oord,
Nal Kalchbrenner,
Sergio Gómez Colmenarejo,
Ziyu Wang,
Dan Belov,
Nando de Freitas
Abstract:
PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and orders of magnitude speedup - O(log N) sampling instead of O(N) - enabling the practical generation of 512x512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.
Submitted 10 March, 2017;
originally announced March 2017.