-
Learning Dexterous Object Handover
Authors:
Daniel Frau-Alfaro,
Julio Castaño-Amoros,
Santiago Puente,
Pablo Gil,
Roberto Calandra
Abstract:
Object handover is an important skill that we use daily when interacting with other humans. To deploy robots in collaborative settings, such as homes, being able to receive and hand over objects safely and efficiently becomes a crucial skill. In this work, we demonstrate the use of Reinforcement Learning (RL) for dexterous object handover between two multi-finger hands. Key to this task is the use of a novel reward function based on dual quaternions to minimize the rotation distance, which outperforms other rotation representations such as Euler angles and rotation matrices. The robustness of the trained policy is experimentally evaluated by testing it on objects that are not included in the training distribution, and under perturbations during the handover process. The results demonstrate that the trained policy successfully performs this task, achieving a total success rate of 94% in the best-case scenario after 100 experiments, thereby showing the robustness of our policy with novel objects. In addition, the best-case performance of the policy decreases by only 13.8% when the other robot moves during the handover, proving that our policy is also robust to this type of perturbation, which is common in real-world object handovers.
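The abstract does not give the reward's formula. As a hedged illustration of the idea of rewarding a small rotation distance, the sketch below computes the geodesic angle between two unit quaternions, θ = 2·arccos(|⟨q1, q2⟩|), and maps it to a dense reward. It uses plain rotation quaternions rather than the dual quaternions used in the paper (which also encode translation), and the exponential shaping and the `scale` parameter are assumptions, not the authors' reward.

```python
import math

def quat_normalize(q):
    """Return q scaled to unit length; q is (w, x, y, z)."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def rotation_distance(q1, q2):
    """Geodesic angle (radians) between the rotations encoded by q1 and q2.

    The absolute value of the dot product handles the double cover:
    q and -q represent the same rotation.
    """
    a, b = quat_normalize(q1), quat_normalize(q2)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)

def rotation_reward(q_obj, q_goal, scale=1.0):
    """Hypothetical dense reward: 1 at zero rotation error, decaying with angle."""
    return math.exp(-scale * rotation_distance(q_obj, q_goal))

identity = (1.0, 0.0, 0.0, 0.0)
# 90 degrees about z: (cos 45°, 0, 0, sin 45°)
rot_z90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(round(math.degrees(rotation_distance(identity, rot_z90)), 6))  # 90.0
print(rotation_reward(identity, identity))  # 1.0
```

Unlike a difference of Euler angles, this distance does not depend on the axis convention and is zero for q and -q alike.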
Submitted 20 June, 2025;
originally announced June 2025.
-
LiCAR: pseudo-RGB LiDAR image for CAR segmentation
Authors:
Ignacio de Loyola Páez-Ubieta,
Edison P. Velasco-Sánchez,
Santiago T. Puente
Abstract:
With the advancement of computing resources, an increasing number of Neural Networks (NNs) for image detection and segmentation are appearing. However, these methods usually accept a 2D RGB image as input. On the other hand, Light Detection And Ranging (LiDAR) sensors with many layers provide images that are similar to those obtained from a traditional low-resolution RGB camera. Following this principle, a new dataset for segmenting cars in pseudo-RGB images has been generated. This dataset combines the information given by the LiDAR sensor into a Spherical Range Image (SRI), specifically the reflectivity, near infrared and signal intensity 2D images. These images are then fed into instance segmentation NNs, which segment the cars that appear in them, achieving a Bounding Box (BB) and mask precision of 88% and 81.5%, respectively, with You Only Look Once (YOLO)-v8 large. Using this segmentation NN, several trackers have been applied to follow each segmented car instance along a video feed, performing well in real-world experiments.
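The pseudo-RGB idea can be sketched as stacking the three 2D channels of the SRI into the three colour channels of an 8-bit image that an off-the-shelf segmentation network accepts. The per-channel min-max normalization, the channel order (reflectivity, near infrared, signal) and the function names are assumptions for illustration; the abstract does not say how the channels are scaled.

```python
import numpy as np

def to_uint8(channel):
    """Min-max normalize a 2D array to the 0..255 range as uint8."""
    channel = channel.astype(np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(channel.shape, dtype=np.uint8)
    scaled = (channel - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

def pseudo_rgb(reflectivity, near_ir, signal):
    """Stack three H x W LiDAR images into one H x W x 3 pseudo-RGB image."""
    if not (reflectivity.shape == near_ir.shape == signal.shape):
        raise ValueError("all channels must share the same H x W shape")
    return np.dstack([to_uint8(reflectivity), to_uint8(near_ir), to_uint8(signal)])

# Toy 2 x 4 channels standing in for a real sensor's SRI.
rng = np.random.default_rng(0)
refl = rng.uniform(0, 100, size=(2, 4))
nir = rng.uniform(0, 5000, size=(2, 4))
sig = rng.uniform(0, 2000, size=(2, 4))
img = pseudo_rgb(refl, nir, sig)
print(img.shape, img.dtype)  # (2, 4, 3) uint8
```

Normalizing each channel separately keeps channels with very different raw ranges from dominating the image.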
Submitted 21 January, 2025;
originally announced January 2025.
-
Transferability of labels between multilens cameras
Authors:
Ignacio de Loyola Páez-Ubieta,
Daniel Frau-Alfaro,
Santiago T. Puente
Abstract:
In this work, a new method for automatically extending Bounding Box (BB) and mask labels across different channels on multilens cameras is presented. For that purpose, the proposed method combines the well-known phase correlation method with a refinement process. During the first step, images are aligned by localizing the intensity peak obtained in the spatial domain after performing the cross-correlation in the frequency domain. The second step consists of obtaining the best possible transformation through an iterative process that maximises the IoU (Intersection over Union) metric. Results show that, with this method, labels can be transferred across the different lenses of a camera with an accuracy above 90% in most cases, taking just 65 ms for the whole process. Once the transformations are obtained, artificial RGB images are generated and labeled so that this information can be transferred to each of the other lenses. This work will allow this type of camera to be used in fields beyond satellite or medical imagery, making it possible to label even objects that are invisible in the visible spectrum.
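The first step corresponds to textbook phase correlation: the normalized cross-power spectrum of two images is transformed back to the spatial domain, where the location of the intensity peak gives the translation between them. The sketch below recovers an integer shift with NumPy and adds an IoU helper of the kind the refinement step maximises. It handles only integer translations, omits the authors' iterative refinement, and the function names are hypothetical.

```python
import numpy as np

def phase_correlation(ref, moved):
    """Estimate the integer (dy, dx) with moved ≈ np.roll(ref, (dy, dx), axis=(0, 1))."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moved)
    cross_power = F_mov * np.conj(F_ref)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase information only
    corr = np.real(np.fft.ifft2(cross_power))   # back to the spatial domain
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts.
    return tuple(int(p - n) if p > n // 2 else int(p) for p, n in zip(peak, corr.shape))

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
moved = np.roll(ref, shift=(5, -3), axis=(0, 1))
print(phase_correlation(ref, moved))               # (5, -3)
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))     # about 0.333
```

A refinement step could then search small offsets around the phase-correlation estimate and keep the one whose transferred boxes give the highest IoU.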
Submitted 20 January, 2025;
originally announced January 2025.
-
QDGset: A Large Scale Grasping Dataset Generated with Quality-Diversity
Authors:
Johann Huber,
François Hélénon,
Mathilde Kappel,
Ignacio de Loyola Páez-Ubieta,
Santiago T. Puente,
Pablo Gil,
Faïz Ben Amar,
Stéphane Doncieux
Abstract:
Recent advances in AI have led to significant results in robotic learning, but skills like grasping remain only partially solved. Many recent works exploit synthetic grasping datasets to learn to grasp unknown objects. However, those datasets were generated using simple grasp sampling methods that rely on priors. Recently, Quality-Diversity (QD) algorithms have been shown to make grasp sampling significantly more efficient. In this work, we extend QDG-6DoF, a QD framework for generating object-centric grasps, to scale up the production of synthetic grasping datasets. We propose a data augmentation method that combines the transformation of object meshes with transfer learning from previous grasping repertoires. The conducted experiments show that this approach reduces the number of required evaluations per discovered robust grasp by up to 20%. We used this approach to generate QDGset, a dataset of 6DoF grasp poses that contains about 3.5 and 4.5 times more grasps and objects, respectively, than the previous state of the art. Our method allows anyone to easily generate data, eventually contributing to a large-scale collaborative dataset of synthetic grasps.
Submitted 3 October, 2024;
originally announced October 2024.
-
Visual-tactile manipulation to collect household waste in outdoor
Authors:
Julio Castaño-Amorós,
Ignacio de Loyola Páez-Ubieta,
Pablo Gil,
Santiago Timoteo Puente
Abstract:
This work presents a perception system applied to robotic manipulation that is able to assist in navigation, household waste classification and collection in outdoor environments. The system is made up of optical tactile sensors, RGBD cameras and a LiDAR. These sensors are integrated on a mobile platform with a robot manipulator and a robotic gripper. Our system is divided into three software modules: two are vision-based and the third is tactile-based. The vision-based modules use CNNs to localize and recognize solid household waste and to estimate grasping points. The tactile-based module, which also uses CNNs and image processing, adjusts the gripper opening to control the grasp from touch data. Our proposal achieves localization errors of around 6%, a recognition accuracy of 98%, and ensures grasp stability in 91% of the attempts. The sum of the runtimes of the three modules is less than 750 ms.
Submitted 15 July, 2024;
originally announced July 2024.
-
Vision and Tactile Robotic System to Grasp Litter in Outdoor Environments
Authors:
Ignacio de Loyola Páez-Ubieta,
Julio Castaño-Amorós,
Santiago T. Puente,
Pablo Gil
Abstract:
The accumulation of litter is increasing in many places and is consequently becoming a problem that must be dealt with. In this paper, we present a robotic manipulator system to collect litter in outdoor environments. This system has three functionalities. Firstly, it uses colour images to detect and recognise litter made of different materials. Secondly, depth data are combined with the pixels of waste objects to compute a 3D location and segment three-dimensional point clouds of the litter items in the scene. A grasp with 3 Degrees of Freedom (DoFs) is then estimated for a robot arm with a gripper from the segmented cloud of each waste instance. Finally, two tactile-based algorithms are implemented and employed to provide the gripper with a sense of touch. This work uses two low-cost visual-based tactile sensors at the fingertips. One of them addresses the detection of contact (obtained from tactile images) between the gripper and solid waste, while the other has been designed to detect slippage in order to prevent grasped objects from falling. Our proposal was successfully tested through extensive experimentation with objects of different sizes, textures, geometries and materials in different outdoor environments (a tiled pavement, a surface of stone/soil, and grass). Our system achieved an average score of 94% for detection and Collection Success Rate (CSR) in terms of overall performance, and of 80% for collecting litter items at the first attempt.
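Combining depth data with the pixels of a detected object to obtain a 3D location is commonly done by back-projecting each pixel through the pinhole camera model. The sketch below assumes a depth image in metres aligned with the colour image and a boolean object mask; the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder values rather than those of the authors' camera, and using the centroid as the 3D location is one simple choice, not necessarily theirs.

```python
import numpy as np

def mask_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked pixels of a depth image (metres) into camera-frame 3D points."""
    v, u = np.nonzero(mask & (depth > 0))  # rows (v) and columns (u) of valid pixels
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])      # N x 3 point cloud of the object

# Hypothetical intrinsics and a flat 1.5 m surface with a 2 x 2 object mask.
fx = fy = 600.0
cx, cy = 320.0, 240.0
depth = np.full((480, 640), 1.5)
mask = np.zeros((480, 640), dtype=bool)
mask[240:242, 320:322] = True
points = mask_to_points(depth, mask, fx, fy, cx, cy)
print(points.shape)          # (4, 3)
print(points.mean(axis=0))   # centroid used as a simple 3D location
```

The resulting per-object cloud is the kind of input a grasp estimator can then work on.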
Submitted 16 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Visual Servoing NMPC Applied to UAVs for Photovoltaic Array Inspection
Authors:
Edison P. Velasco-Sánchez,
Luis F. Recalde,
Bryan S. Guevara,
José Varela-Aldás,
Francisco A. Candelas,
Santiago T. Puente,
Daniel C. Gandolfo
Abstract:
The photovoltaic (PV) industry is seeing a significant shift toward large-scale solar plants, where traditional inspection methods have proven to be time-consuming and costly. Currently, the predominant approach to PV inspection using unmanned aerial vehicles (UAVs) is based on photogrammetry. However, the photogrammetry approach has limitations, such as a larger amount of useless data collected during flights and potential issues related to image resolution and to the detection process during high-altitude flights. In this work, we develop a visual servoing control system applied to a UAV with dynamic compensation using nonlinear model predictive control (NMPC), capable of accurately tracking the middle of the underlying PV array at different frontal velocities and height constraints, ensuring the acquisition of detailed images during low-altitude flights. The visual servoing controller is based on feature extraction from RGB-D images and a Kalman filter that estimates the edges of the PV arrays. Furthermore, this work demonstrates the proposal in both simulated and real-world environments using a commercial aerial vehicle (DJI Matrice 100), to showcase the results of the architecture. Our approach is available to the scientific community at: https://github.com/EPVelasco/VisualServoing_NMPC
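The abstract states that a Kalman filter estimates the edges of the PV arrays but gives no state or noise model. As an assumption-laden illustration of the filtering idea only, the sketch below smooths noisy per-frame detections of one edge's horizontal image position with a scalar constant-position Kalman filter; the class name and the noise variances `q` and `r` are hypothetical tuning choices.

```python
from dataclasses import dataclass

@dataclass
class ScalarKalman:
    """1D Kalman filter with a constant-position (random walk) motion model."""
    x: float         # state estimate, e.g. edge position in pixels
    p: float = 1.0   # estimate variance
    q: float = 1e-2  # process noise variance (hypothetical)
    r: float = 4.0   # measurement noise variance (hypothetical)

    def step(self, z: float) -> float:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        self.p += self.q
        # Update: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x

kf = ScalarKalman(x=300.0)
measurements = [310.0, 295.0, 305.0, 298.0, 302.0]  # noisy edge detections (pixels)
estimates = [kf.step(z) for z in measurements]
print([round(e, 1) for e in estimates])
```

A larger `r` relative to `q` trusts the detector less and yields a smoother but slower-reacting edge estimate.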
Submitted 10 February, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
LiLO: Lightweight and low-bias LiDAR Odometry method based on spherical range image filtering
Authors:
Edison P. Velasco-Sánchez,
Miguel Ángel Muñoz-Bañón,
Francisco A. Candelas,
Santiago T. Puente,
Fernando Torres
Abstract:
In unstructured outdoor environments, robotics requires accurate and efficient odometry with low computational time. Existing low-bias LiDAR odometry methods are often computationally expensive. To address this problem, we present a lightweight LiDAR odometry method that converts unorganized point cloud data into a spherical range image (SRI) and filters out surface, edge, and ground features in the image plane. This substantially reduces computation time and the number of features required for odometry estimation in LOAM-based algorithms. Our odometry estimation method does not rely on global maps or loop closure algorithms, which further reduces computational costs. Experimental results show translation and rotation errors of 0.86% and 0.0036°/m on the KITTI dataset, with an average runtime of 78 ms. In addition, we tested the method on our own data, obtaining an average closed-loop error of 0.8 m and a runtime of 27 ms over eight loops covering 3.5 km.
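Converting an unorganized point cloud into a spherical range image (SRI) means mapping each point's azimuth to a column and its elevation to a row, and storing its range as the pixel value. The sketch below shows only this projection, not LiLO's feature filtering; the image size and vertical field of view are hypothetical parameters that depend on the sensor, and keeping the closest return per pixel is one common convention, not necessarily the one used in the paper.

```python
import numpy as np

def spherical_range_image(points, height=64, width=1024,
                          fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project an N x 3 point cloud (x, y, z in metres) into an H x W range image."""
    r = np.linalg.norm(points, axis=1)
    valid = r > 0
    pts, r = points[valid], r[valid]
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]

    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))    # elevation
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    # Column from azimuth, row from elevation (row 0 = top of the field of view).
    u = np.floor(0.5 * (1.0 - yaw / np.pi) * width).astype(int) % width
    v = np.floor((fov_up - pitch) / (fov_up - fov_down) * height).astype(int)
    inside = (v >= 0) & (v < height)
    u, v, r = u[inside], v[inside], r[inside]

    image = np.full((height, width), np.inf)
    # Keep the closest return per pixel; ufunc.at handles repeated indices.
    np.minimum.at(image, (v, u), r)
    image[np.isinf(image)] = 0.0                    # 0 marks empty pixels
    return image

pts = np.array([[10.0, 0.0, 0.0],    # straight ahead
                [0.0, 5.0, 0.0],     # 90 degrees to the left
                [10.0, 0.0, 0.0]])   # same direction again, same pixel
sri = spherical_range_image(pts)
print(sri.shape)               # (64, 1024)
print(np.count_nonzero(sri))   # 2
```

Once the cloud is organized as an image, neighbourhood operations such as detecting edges or ground pixels become cheap 2D operations.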
Submitted 13 November, 2023;
originally announced November 2023.
-
ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers
Authors:
Edison P. Velasco Sánchez,
Miguel Ángel Muñoz-Bañón,
Francisco A. Candelas,
Santiago T. Puente,
Fernando Torres
Abstract:
While Visual Servoing has been studied extensively for simple maneuvers, the literature does not commonly address complex cases where the target is far out of the camera's field of view (FOV) during the maneuver. For this reason, in this paper, we present ViKi-HyCo (Visual Servoing and Kinematic Hybrid-Controller). This approach generates the necessary maneuvers for the complex positioning of a non-holonomic mobile robot in outdoor environments. In this method, we use LiDAR-camera fusion to estimate object bounding boxes using image and metric modalities. With the multi-modal nature of our representation, we can automatically obtain a target for a visual servoing controller. At the same time, we also have a metric target, which allows us to hybridize with a kinematic controller. Given this hybridization, we can perform complex maneuvers even when the target is far away from the camera's FOV. The proposed approach does not require an object-tracking algorithm and can be applied to any robotic positioning task for which the robot's kinematic model is known. At the end of a complete positioning task, ViKi-HyCo has an error of 0.0428 ± 0.0467 m in the X-axis and 0.0515 ± 0.0323 m in the Y-axis.
Submitted 16 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Detection and depth estimation for domestic waste in outdoor environments by sensors fusion
Authors:
Ignacio de L. Páez-Ubieta,
Edison Velasco-Sánchez,
Santiago T. Puente,
Francisco A. Candelas
Abstract:
In this work, we estimate the depth at which domestic waste is located in space from a mobile robot in outdoor scenarios. As this estimation covers a broad range (0.3 - 6.0 m), we use RGB-D camera and LiDAR fusion. With this aim and range, we compare several methods, namely average, nearest, median and center point, applied to the depth points inside a reduced or non-reduced Bounding Box (BB). These BBs are obtained from representative segmentation and detection methods: Yolact, SOLO, You Only Look Once (YOLO)v5, YOLOv6 and YOLOv7. Results show that applying a detection method with the average technique and a 40% BB reduction returns the same output as segmenting the object and applying the average method. Moreover, the detection method is faster and lighter than the segmentation one. The median error in the conducted experiments was 0.0298 ± 0.0544 m.
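The comparison can be illustrated by shrinking a detection box about its centre and summarizing the depth values inside it with each of the four methods. The sketch assumes a depth image in metres registered to the colour image, boxes given as (x_min, y_min, x_max, y_max) in pixels, and zero marking missing depth. Reading the 40% reduction as shrinking width and height by 40% (20% from each side) is an interpretation, since the abstract does not define it, and the function names are hypothetical.

```python
import numpy as np

def shrink_box(box, reduction):
    """Shrink (x_min, y_min, x_max, y_max) about its centre by a fraction of width and height."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * reduction / 2.0
    dy = (y1 - y0) * reduction / 2.0
    return (int(round(x0 + dx)), int(round(y0 + dy)),
            int(round(x1 - dx)), int(round(y1 - dy)))

def box_depth(depth, box, method="average", reduction=0.0):
    """Estimate object depth from the valid (non-zero) pixels inside a box."""
    x0, y0, x1, y1 = shrink_box(box, reduction)
    patch = depth[y0:y1, x0:x1]
    values = patch[patch > 0]
    if values.size == 0:
        return float("nan")
    if method == "average":
        return float(values.mean())
    if method == "median":
        return float(np.median(values))
    if method == "nearest":
        return float(values.min())
    if method == "center":
        c = float(depth[(y0 + y1) // 2, (x0 + x1) // 2])
        return c if c > 0 else float("nan")
    raise ValueError(f"unknown method: {method}")

depth = np.full((100, 100), 5.0)   # background at 5 m
depth[20:80, 20:80] = 2.0          # object at 2 m filling the box centre
box = (0, 0, 100, 100)
print(box_depth(depth, box))                  # about 3.92, biased by background
print(box_depth(depth, box, reduction=0.4))   # 2.0
```

In this toy case, shrinking the box removes the background pixels that bias the plain average, which is the intuition behind why a reduced detection box can approach the result of averaging over a segmentation mask.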
Submitted 7 February, 2024; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Domestic waste detection and grasping points for robotic picking up
Authors:
Victor De Gea,
Santiago T. Puente,
Pablo Gil
Abstract:
This paper presents an AI system applied to waste location and robotic grasping. The experimental setup is based on a parameter study to train a deep-learning network based on Mask-RCNN to locate waste in indoor and outdoor environments, using five different classes and generating a new waste dataset. First, the AI system obtains RGBD data of the environment, followed by the detection of objects using the neural network. Next, the 3D object shape is computed using the network result and the depth channel. Finally, the shape is used to compute grasps for a robot arm with a two-finger gripper. The objective is to classify the waste into groups to improve a recycling strategy.
Submitted 14 May, 2021;
originally announced May 2021.
-
Open-Ended Visual Question-Answering
Authors:
Issey Masuda,
Santiago Pascual de la Puente,
Xavier Giro-i-Nieto
Abstract:
This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle text-based Question-Answering. We then modify the previous model to accept an image as input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved 53.62% accuracy on the test dataset. The developed software follows best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
Submitted 9 October, 2016;
originally announced October 2016.