-
MILES: Making Imitation Learning Easy with Self-Supervision
Authors:
Georgios Papagiannis,
Edward Johns
Abstract:
Data collection in imitation learning often requires significant, laborious human supervision, such as numerous demonstrations, and/or frequent environment resets for methods that incorporate reinforcement learning. In this work, we propose an alternative approach, MILES: a fully autonomous, self-supervised data collection paradigm, and we show that this enables efficient policy learning from just a single demonstration and a single environment reset. MILES autonomously learns a policy for returning to and then following the single demonstration, whilst being self-guided during data collection, eliminating the need for additional human interventions. We evaluated MILES across several real-world tasks, including tasks that require precise contact-rich manipulation such as locking a lock with a key. We found that, under the constraints of a single demonstration and no repeated environment resetting, MILES significantly outperforms state-of-the-art alternatives like imitation learning methods that leverage reinforcement learning. Videos of our experiments and code can be found on our webpage: www.robot-learning.uk/miles.
Submitted 25 October, 2024;
originally announced October 2024.
-
Adapting Skills to Novel Grasps: A Self-Supervised Approach
Authors:
Georgios Papagiannis,
Kamil Dreczkowski,
Vitalis Vosylius,
Edward Johns
Abstract:
In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g. tools) defined for a single grasp pose to novel grasp poses. A common approach to address this is to define a new trajectory for each possible grasp explicitly, but this is highly inefficient. Instead, we propose a method to adapt such trajectories directly while only requiring a period of self-supervised data collection, during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), it can work with RGB images, depth images, or both, and it requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images, including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method results in an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps on several everyday tasks. Videos of the experiments are available on our webpage at https://www.robot-learning.uk/adapting-skills
Submitted 31 July, 2024;
originally announced August 2024.
-
R+X: Retrieval and Execution from Everyday Human Videos
Authors:
Georgios Papagiannis,
Norman Di Palo,
Pietro Vitiello,
Edward Johns
Abstract:
We present R+X, a framework which enables robots to learn skills from long, unlabelled, first-person videos of humans performing everyday tasks. Given a language command from a human, R+X first retrieves short video clips containing relevant behaviour, and then executes the skill by conditioning an in-context imitation learning method (KAT) on this behaviour. By leveraging a Vision Language Model (VLM) for retrieval, R+X does not require any manual annotation of the videos, and by leveraging in-context learning for execution, robots can perform commanded skills immediately, without requiring a period of training on the retrieved videos. Experiments studying a range of everyday household tasks show that R+X succeeds at translating unlabelled human videos into robust robot skills, and that R+X outperforms several recent alternative methods. Videos and code are available at https://www.robot-learning.uk/r-plus-x.
Submitted 3 April, 2025; v1 submitted 17 July, 2024;
originally announced July 2024.
-
Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning
Authors:
Eugene Valassakis,
Georgios Papagiannis,
Norman Di Palo,
Edward Johns
Abstract:
We present DOME, a novel method for one-shot imitation learning, where a task can be learned from just a single demonstration and then be deployed immediately, without any further data collection or training. DOME does not require prior task or object knowledge, and can perform the task in novel object configurations and with distractors. At its core, DOME uses an image-conditioned object segmentation network followed by a learned visual servoing network to move the robot's end-effector to the same relative pose to the object as during the demonstration, after which the task can be completed by replaying the demonstration's end-effector velocities. We show that DOME achieves a near 100% success rate on 7 real-world everyday tasks, and we perform several studies to thoroughly understand each individual component of DOME. Videos and supplementary material are available at: https://www.robot-learning.uk/dome.
Submitted 27 July, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
End-To-End Semi-supervised Learning for Differentiable Particle Filters
Authors:
Hao Wen,
Xiongjie Chen,
Georgios Papagiannis,
Conghui Hu,
Yunpeng Li
Abstract:
Recent advances in incorporating neural networks into particle filters provide the desired flexibility to apply particle filters in large-scale real-world applications. The dynamic and measurement models in this framework are learnable through the differentiable implementation of particle filters. Past efforts in optimising such models often require knowledge of the true states, which can be expensive to obtain or even unavailable in practice. In this paper, in order to reduce the demand for annotated data, we present an end-to-end learning objective based upon the maximisation of a pseudo-likelihood function, which can improve the estimation of states when a large portion of the true states is unknown. We assess the performance of the proposed method in state estimation tasks in robotics with simulated and real-world datasets.
Submitted 28 March, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Imitation Learning with Sinkhorn Distances
Authors:
Georgios Papagiannis,
Yunpeng Li
Abstract:
Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and to reproduce the results, please refer to the following repository: https://github.com/gpapagiannis/sinkhorn-imitation.
Submitted 2 July, 2022; v1 submitted 20 August, 2020;
originally announced August 2020.
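The abstract above compares two sets of samples using a Sinkhorn distance with a cosine cost. The following is a minimal sketch of that computation, not the authors' implementation: it operates on raw, non-zero vectors rather than an adversarially learned feature space, assumes uniform weights on both sample sets, and the names `cosine_cost`, `sinkhorn_distance`, `epsilon`, and `n_iters` are illustrative choices.

```python
import math


def cosine_cost(x, y):
    """Cosine distance 1 - cos(x, y); assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def sinkhorn_distance(xs, ys, epsilon=0.1, n_iters=200):
    """Entropy-regularised optimal transport cost between two sample sets.

    xs, ys: lists of equal-dimension vectors, each set weighted uniformly.
    Returns (distance, transport_plan).
    """
    n, m = len(xs), len(ys)
    cost = [[cosine_cost(x, y) for y in ys] for x in xs]
    # Gibbs kernel K = exp(-C / epsilon).
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform marginal over xs
    b = [1.0 / m] * m  # uniform marginal over ys
    u = [1.0] * n
    v = [1.0] * m
    # Alternately rescale rows and columns to match the two marginals.
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return distance, plan


if __name__ == "__main__":
    expert = [[1.0, 0.0], [0.0, 1.0]]
    learner = [[1.0, 0.1], [0.1, 1.0]]
    d, _ = sinkhorn_distance(expert, learner)
    print(f"Sinkhorn distance: {d:.4f}")
```

A smaller `epsilon` brings the result closer to the unregularised transport cost but makes the kernel entries smaller, so a log-domain variant would be the safer choice for very small values.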
-
Assessing the viability of Battery Energy Storage Systems coupled with Photovoltaics under a pure self-consumption scheme
Authors:
Georgios A. Barzegkar-Ntovom,
Nikolas G. Chatzigeorgiou,
Angelos I. Nousdilis,
Styliani A. Vomva,
Georgios C. Kryonidis,
Eleftherios O. Kontis,
George E. Georghiou,
Georgios C. Christoforidis,
Grigoris K. Papagiannis
Abstract:
Over the last few decades, there has been a constantly increasing deployment of solar photovoltaic (PV) systems in both the commercial and residential building sectors. However, the steadily growing PV penetration poses several technical problems to electric power systems, mainly related to power quality issues. In this context, the exploitation of energy storage systems integrated with PVs could constitute a possible solution. The scope of this paper is to thoroughly evaluate the economic viability of hybrid PV-and-Storage systems at the residential building level under a future pure self-consumption policy that provides no reimbursement for excess PV energy injected into the grid. For this purpose, an indicator referred to as the Levelized Cost of Use is utilized to assess the competitiveness of hybrid PV-and-Storage systems in the energy market, considering various sizes of the hybrid system, battery energy storage costs and prosumer types for six Mediterranean countries.
Submitted 8 February, 2020; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Deep Reinforcement Learning for Control of Probabilistic Boolean Networks
Authors:
Georgios Papagiannis,
Sotiris Moschoyiannis
Abstract:
Probabilistic Boolean Networks (PBNs) were introduced as a computational model for the study of complex dynamical systems, such as Gene Regulatory Networks (GRNs). Controllability in this context is the process of making strategic interventions to the state of a network in order to drive it towards some other state that exhibits favourable biological properties. In this paper we study the ability of a Double Deep Q-Network with Prioritized Experience Replay to learn control strategies within a finite number of time steps that drive a PBN towards a target state, typically an attractor. The control method is model-free and does not require knowledge of the network's underlying dynamics, making it suitable for applications where inference of such dynamics is intractable. We present extensive experimental results on two synthetic PBNs and the PBN model constructed directly from gene-expression data of a study on metastatic melanoma.
Submitted 7 September, 2020; v1 submitted 7 September, 2019;
originally announced September 2019.
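The abstract above names a Double Deep Q-Network with Prioritized Experience Replay. As a rough illustration of those two standard ingredients, and not the paper's code, the sketch below computes a Double DQN bootstrap target and a proportional replay priority from plain lists of Q-values. The function names and the defaults for `gamma`, `eps`, and `alpha` are assumptions made for this example.

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for one transition.

    The online network's Q-values select the next action; the target
    network's Q-values evaluate it, which reduces overestimation bias.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def replay_priority(q_value, target, eps=1e-3, alpha=0.6):
    """Proportional prioritisation: priority grows with the absolute TD error."""
    td_error = target - q_value
    return (abs(td_error) + eps) ** alpha


if __name__ == "__main__":
    # Hypothetical Q-values for a two-action intervention choice.
    y = double_dqn_target(1.0, [0.2, 0.9], [0.5, 0.3], done=False, gamma=0.5)
    print("target:", y)
    print("priority:", replay_priority(q_value=0.4, target=y))
```

In a full agent the two Q-value lists would come from the online and target networks evaluated at the next state, and the priorities would be normalised into sampling probabilities over the replay buffer.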
-
A Bayesian Ensemble Regression Framework on the Angry Birds Game
Authors:
Nikolaos Tziortziotis,
Georgios Papagiannis,
Konstantinos Blekas
Abstract:
An ensemble inference mechanism is proposed for the Angry Birds domain. It is based on an efficient tree structure for encoding and representing game screenshots, whose enhanced modeling capability it exploits. This has the advantage of establishing an informative feature space and recasting the task of game playing as a regression analysis problem. To this end, we assume that each pair of object material type and bird type has its own Bayesian linear regression model. In this way, a multi-model regression framework is designed that simultaneously calculates the conditional expectations of several objects and makes a target decision through an ensemble of regression models. The learning procedure is performed according to an online estimation strategy for the model parameters. We provide comparative experimental results on several game levels that empirically illustrate the efficiency of the proposed methodology.
Submitted 25 August, 2014; v1 submitted 22 August, 2014;
originally announced August 2014.