-
Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation
Authors:
Elena Govi,
Davide Sapienza,
Carmelo Scribano,
Tobia Poppi,
Giorgia Franchini,
Paola Ardòn,
Micaela Verucchi,
Marko Bertogna
Abstract:
In recent years, there has been a growing trend of using data-driven methods in industrial settings. These kinds of methods often process video images or parts, therefore the integrity of such images is crucial. Sometimes datasets, e.g. consisting of images, can be sophisticated for various reasons. It becomes critical to understand how the manipulation of video and images can impact the effective…
▽ More
In recent years, there has been a growing trend of using data-driven methods in industrial settings. These kinds of methods often process video images or parts, therefore the integrity of such images is crucial. Sometimes datasets, e.g. consisting of images, can be sophisticated for various reasons. It becomes critical to understand how the manipulation of video and images can impact the effectiveness of a machine learning method. Our case study aims precisely to analyze the Linemod dataset, considered the state of the art in 6D pose estimation context. That dataset presents images accompanied by ArUco markers; it is evident that such markers will not be available in real-world contexts. We analyze how the presence of the markers affects the pose estimation accuracy, and how this bias may be mitigated through data augmentation and other methods. Our work aims to show how the presence of these markers goes to modify, in the testing phase, the effectiveness of the deep learning method used. In particular, we will demonstrate, through the tool of saliency maps, how the focus of the neural network is captured in part by these ArUco markers. Finally, a new dataset, obtained by applying geometric tools to Linemod, will be proposed in order to demonstrate our hypothesis and uncovering the bias. Our results demonstrate the potential for bias in 6DOF pose estimation networks, and suggest methods for reducing this bias when training with markers.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
Model-Based Underwater 6D Pose Estimation from RGB
Authors:
Davide Sapienza,
Elena Govi,
Sara Aldhaheri,
Marko Bertogna,
Eloy Roura,
Èric Pairet,
Micaela Verucchi,
Paola Ardón
Abstract:
Object pose estimation underwater allows an autonomous system to perform tracking and intervention tasks. Nonetheless, underwater target pose estimation is remarkably challenging due to, among many factors, limited visibility, light scattering, cluttered environments, and constantly varying water conditions. An approach is to employ sonar or laser sensing to acquire 3D data, however, the data is n…
▽ More
Object pose estimation underwater allows an autonomous system to perform tracking and intervention tasks. Nonetheless, underwater target pose estimation is remarkably challenging due to, among many factors, limited visibility, light scattering, cluttered environments, and constantly varying water conditions. An approach is to employ sonar or laser sensing to acquire 3D data, however, the data is not clear and the sensors expensive. For this reason, the community has focused on extracting pose estimates from RGB input. In this work, we propose an approach that leverages 2D object detection to reliably compute 6D pose estimates in different underwater scenarios. We test our proposal with 4 objects with symmetrical shapes and poor texture spanning across 33,920 synthetic and 10 real scenes. All objects and scenes are made available in an open-source dataset that includes annotations for object detection and pose estimation. When benchmarking against similar end-to-end methodologies for 6D object pose estimation, our pipeline provides estimates that are 8% more accurate. We also demonstrate the real world usability of our pose estimation pipeline on an underwater robotic manipulator in a reaching task.
△ Less
Submitted 15 September, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers
Authors:
Carmelo Scribano,
Davide Sapienza,
Giorgia Franchini,
Micaela Verucchi,
Marko Bertogna
Abstract:
Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehic…
▽ More
Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
△ Less
Submitted 18 June, 2021;
originally announced June 2021.