-
Learning global control of underactuated systems with Model-Based Reinforcement Learning
Authors:
Niccolò Turcato,
Marco Calì,
Alberto Dalla Libera,
Giulio Giacomuzzo,
Ruggero Carli,
Diego Romeres
Abstract:
This short paper describes our proposed solution for the third edition of the "AI Olympics with RealAIGym" competition, held at ICRA 2025. We employed Monte-Carlo Probabilistic Inference for Learning Control (MC-PILCO), an MBRL algorithm recognized for its exceptional data efficiency across various low-dimensional robotic tasks, including cart-pole, ball \& plate, and Furuta pendulum systems. MC-P…
▽ More
This short paper describes our proposed solution for the third edition of the "AI Olympics with RealAIGym" competition, held at ICRA 2025. We employed Monte-Carlo Probabilistic Inference for Learning Control (MC-PILCO), an MBRL algorithm recognized for its exceptional data efficiency across various low-dimensional robotic tasks, including cart-pole, ball \& plate, and Furuta pendulum systems. MC-PILCO optimizes a system dynamics model using interaction data, enabling policy refinement through simulation rather than direct system data optimization. This approach has proven highly effective in physical systems, offering greater data efficiency than Model-Free (MF) alternatives. Notably, MC-PILCO has previously won the first two editions of this competition, demonstrating its robustness in both simulated and real-world environments. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand: learning a global policy for the pendubot and acrobot systems.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd 'AI Olympics with RealAIGym' Competition
Authors:
Felix Wiebe,
Niccolò Turcato,
Alberto Dalla Libera,
Jean Seong Bjorn Choe,
Bumkyu Choi,
Tim Lukas Faust,
Habib Maraqten,
Erfan Aghadavoodi,
Marco Cali,
Alberto Sinigaglia,
Giulio Giacomuzzo,
Diego Romeres,
Jong-kook Kim,
Gian Antonio Susto,
Shubham Vyas,
Dennis Mronga,
Boris Belousov,
Jan Peters,
Frank Kirchner,
Shivesh Kumar
Abstract:
In the field of robotics many different approaches ranging from classical planning over optimal control to reinforcement learning (RL) are developed and borrowed from other fields to achieve reliable control in diverse tasks. In order to get a clear understanding of their individual strengths and weaknesses and their applicability in real world robotic scenarios is it important to benchmark and co…
▽ More
In the field of robotics many different approaches ranging from classical planning over optimal control to reinforcement learning (RL) are developed and borrowed from other fields to achieve reliable control in diverse tasks. In order to get a clear understanding of their individual strengths and weaknesses and their applicability in real world robotic scenarios is it important to benchmark and compare their performances not only in a simulation but also on real hardware. The '2nd AI Olympics with RealAIGym' competition was held at the IROS 2024 conference to contribute to this cause and evaluate different controllers according to their ability to solve a dynamic control problem on an underactuated double pendulum system with chaotic dynamics. This paper describes the four different RL methods submitted by the participating teams, presents their performance in the swing-up task on a real double pendulum, measured against various criteria, and discusses their transferability from simulation to real hardware and their robustness to external disturbances.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models
Authors:
Niccolò Turcato,
Matteo Iovino,
Aris Synodinos,
Alberto Dalla Libera,
Ruggero Carli,
Pietro Falco
Abstract:
Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains cha…
▽ More
Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex Human-Informed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. Tasks are demonstrated on the real robot setup.
△ Less
Submitted 10 June, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
Data efficient Robotic Object Throwing with Model-Based Reinforcement Learning
Authors:
Niccolò Turcato,
Giulio Giacomuzzo,
Matteo Terreran,
Davide Allegro,
Ruggero Carli,
Alberto Dalla Libera
Abstract:
Pick-and-place (PnP) operations, featuring object grasping and trajectory planning, are fundamental in industrial robotics applications. Despite many advancements in the field, PnP is limited by workspace constraints, reducing flexibility. Pick-and-throw (PnT) is a promising alternative where the robot throws objects to target locations, leveraging extrinsic resources like gravity to improve effic…
▽ More
Pick-and-place (PnP) operations, featuring object grasping and trajectory planning, are fundamental in industrial robotics applications. Despite many advancements in the field, PnP is limited by workspace constraints, reducing flexibility. Pick-and-throw (PnT) is a promising alternative where the robot throws objects to target locations, leveraging extrinsic resources like gravity to improve efficiency and expand the workspace. However, PnT execution is complex, requiring precise coordination of high-speed movements and object dynamics. Solutions to the PnT problem are categorized into analytical and learning-based approaches. Analytical methods focus on system modeling and trajectory generation but are time-consuming and offer limited generalization. Learning-based solutions, in particular Model-Free Reinforcement Learning (MFRL), offer automation and adaptability but require extensive interaction time. This paper introduces a Model-Based Reinforcement Learning (MBRL) framework, MC-PILOT, which combines data-driven modeling with policy optimization for efficient and accurate PnT tasks. MC-PILOT accounts for model uncertainties and release errors, demonstrating superior performance in simulations and real-world tests with a Franka Emika Panda manipulator. The proposed approach generalizes rapidly to new targets, offering advantages over analytical and Model-Free methods.
△ Less
Submitted 8 February, 2025;
originally announced February 2025.
-
Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios
Authors:
Alberto Sinigaglia,
Niccolò Turcato,
Ruggero Carli,
Gian Antonio Susto
Abstract:
Deep Reinforcement Learning is gaining increasing attention thanks to its capability to learn complex policies in high-dimensional settings. Recent advancements utilize a dual-network architecture to learn optimal policies through the Q-learning algorithm. However, this approach has notable drawbacks, such as an overestimation bias that can disrupt the learning process and degrade the performance…
▽ More
Deep Reinforcement Learning is gaining increasing attention thanks to its capability to learn complex policies in high-dimensional settings. Recent advancements utilize a dual-network architecture to learn optimal policies through the Q-learning algorithm. However, this approach has notable drawbacks, such as an overestimation bias that can disrupt the learning process and degrade the performance of the resulting policy. To address this, novel algorithms have been developed that mitigate overestimation bias by employing multiple Q-functions. Edge scenarios, which prioritize privacy, have recently gained prominence. In these settings, limited computational resources pose a significant challenge for complex Machine Learning approaches, making the efficiency of algorithms crucial for their performance. In this work, we introduce a novel Reinforcement Learning algorithm tailored for edge scenarios, called Edge Delayed Deep Deterministic Policy Gradient (EdgeD3). EdgeD3 enhances the Deep Deterministic Policy Gradient (DDPG) algorithm, achieving significantly improved performance with $25\%$ less Graphics Process Unit (GPU) time while maintaining the same memory usage. Additionally, EdgeD3 consistently matches or surpasses the performance of state-of-the-art methods across various benchmarks, all while using $30\%$ fewer computational resources and requiring $30\%$ less memory.
△ Less
Submitted 9 December, 2024;
originally announced December 2024.
-
Learning control of underactuated double pendulum with Model-Based Reinforcement Learning
Authors:
Niccolò Turcato,
Alberto Dalla Libera,
Giulio Giacomuzzo,
Ruggero Carli,
Diego Romeres
Abstract:
This report describes our proposed solution for the second AI Olympics competition held at IROS 2024. Our solution is based on a recent Model-Based Reinforcement Learning algorithm named MC-PILCO. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand.
This report describes our proposed solution for the second AI Olympics competition held at IROS 2024. Our solution is based on a recent Model-Based Reinforcement Learning algorithm named MC-PILCO. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand.
△ Less
Submitted 9 September, 2024;
originally announced September 2024.
-
AI Olympics challenge with Evolutionary Soft Actor Critic
Authors:
Marco Calì,
Alberto Sinigaglia,
Niccolò Turcato,
Ruggero Carli,
Gian Antonio Susto
Abstract:
In the following report, we describe the solution we propose for the AI Olympics competition held at IROS 2024. Our solution is based on a Model-free Deep Reinforcement Learning approach combined with an evolutionary strategy. We will briefly describe the algorithms that have been used and then provide details of the approach
In the following report, we describe the solution we propose for the AI Olympics competition held at IROS 2024. Our solution is based on a Model-free Deep Reinforcement Learning approach combined with an evolutionary strategy. We will briefly describe the algorithms that have been used and then provide details of the approach
△ Less
Submitted 28 October, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks
Authors:
Niccolò Turcato,
Alberto Sinigaglia,
Alberto Dalla Libera,
Ruggero Carli,
Gian Antonio Susto
Abstract:
Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces innovative methods in RL, focusing on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We design a Bias Exploiting (BE) mechanism to dynamically select the most…
▽ More
Continuous control Deep Reinforcement Learning (RL) approaches are known to suffer from estimation biases, leading to suboptimal policies. This paper introduces innovative methods in RL, focusing on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most State-of-the-art Deep RL algorithms can be equipped with the BE mechanism, without hindering performance or computational complexity. Our extensive experiments across various continuous control tasks demonstrate the effectiveness of our approaches. We show that RL algorithms equipped with this method can match or surpass their counterparts, particularly in environments where estimation biases significantly impact learning. The results underline the importance of bias exploitation in improving policy learning in RL.
△ Less
Submitted 11 October, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.