Search | arXiv e-print repository

arXiv:2412.20612 [pdf, other]

Towards Explaining Uncertainty Estimates in Point Cloud Registration

Authors: Ziyuan Qin, Jongseok Lee, Rudolph Triebel

Abstract: Iterative Closest Point (ICP) is a commonly used algorithm to estimate transformation between two point clouds. The key idea of this work is to leverage recent advances in explainable AI for probabilistic ICP methods that provide uncertainty estimates. Concretely, we propose a method that can explain why a probabilistic ICP method produced a particular output. Our method is based on kernel SHAP (S… ▽ More Iterative Closest Point (ICP) is a commonly used algorithm to estimate transformation between two point clouds. The key idea of this work is to leverage recent advances in explainable AI for probabilistic ICP methods that provide uncertainty estimates. Concretely, we propose a method that can explain why a probabilistic ICP method produced a particular output. Our method is based on kernel SHAP (SHapley Additive exPlanations). With this, we assign an importance value to common sources of uncertainty in ICP such as sensor noise, occlusion, and ambiguous environments. The results of the experiment show that this explanation method can reasonably explain the uncertainty sources, providing a step towards robots that know when and why they failed in a human interpretable manner △ Less

Submitted 29 December, 2024; originally announced December 2024.

arXiv:2412.07565 [pdf, other]

Making the Flow Glow -- Robot Perception under Severe Lighting Conditions using Normalizing Flow Gradients

Authors: Simon Kristoffersson Lind, Rudolph Triebel, Volker Krüger

Abstract: Modern robotic perception is highly dependent on neural networks. It is well known that neural network-based perception can be unreliable in real-world deployment, especially in difficult imaging conditions. Out-of-distribution detection is commonly proposed as a solution for ensuring reliability in real-world deployment. Previous work has shown that normalizing flow models can be used for out-of-… ▽ More Modern robotic perception is highly dependent on neural networks. It is well known that neural network-based perception can be unreliable in real-world deployment, especially in difficult imaging conditions. Out-of-distribution detection is commonly proposed as a solution for ensuring reliability in real-world deployment. Previous work has shown that normalizing flow models can be used for out-of-distribution detection to improve reliability of robotic perception tasks. Specifically, camera parameters can be optimized with respect to the likelihood output from a normalizing flow, which allows a perception system to adapt to difficult vision scenarios. With this work we propose to use the absolute gradient values from a normalizing flow, which allows the perception system to optimize local regions rather than the whole image. By setting up a table top picking experiment with exceptionally difficult lighting conditions, we show that our method achieves a 60% higher success rate for an object detection task compared to previous methods. △ Less

Submitted 10 December, 2024; originally announced December 2024.

arXiv:2412.02386 [pdf, other]

Single-Shot Metric Depth from Focused Plenoptic Cameras

Authors: Blanca Lasheras-Hernandez, Klaus H. Strobl, Sergio Izquierdo, Tim Bodenmüller, Rudolph Triebel, Javier Civera

Abstract: Metric depth estimation from visual sensors is crucial for robots to perceive, navigate, and interact with their environment. Traditional range imaging setups, such as stereo or structured light cameras, face hassles including calibration, occlusions, and hardware demands, with accuracy limited by the baseline between cameras. Single- and multi-view monocular depth offers a more compact alternativ… ▽ More Metric depth estimation from visual sensors is crucial for robots to perceive, navigate, and interact with their environment. Traditional range imaging setups, such as stereo or structured light cameras, face hassles including calibration, occlusions, and hardware demands, with accuracy limited by the baseline between cameras. Single- and multi-view monocular depth offers a more compact alternative, but is constrained by the unobservability of the metric scale. Light field imaging provides a promising solution for estimating metric depth by using a unique lens configuration through a single device. However, its application to single-view dense metric depth is under-addressed mainly due to the technology's high cost, the lack of public benchmarks, and proprietary geometrical models and software. Our work explores the potential of focused plenoptic cameras for dense metric depth. We propose a novel pipeline that predicts metric depth from a single plenoptic camera shot by first generating a sparse metric point cloud using machine learning, which is then used to scale and align a dense relative depth map regressed by a foundation depth model, resulting in dense metric depth. To validate it, we curated the Light Field & Stereo Image Dataset (LFS) of real-world light field images with stereo depth labels, filling a current gap in existing resources. Experimental results show that our pipeline produces accurate metric depth predictions, laying a solid groundwork for future research in this field. △ Less

Submitted 12 March, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

Comments: 8 pages (6 for text + 2 for references), 6 figures, 2 tables. Accepted at IEEE ICRA 2025

ACM Class: I.4.8; I.2.9; I.2.10

arXiv:2410.15766 [pdf, other]

How Important are Data Augmentations to Close the Domain Gap for Object Detection in Orbit?

Authors: Maximilian Ulmer, Leonard Klüpfel, Maximilian Durner, Rudolph Triebel

Abstract: We investigate the efficacy of data augmentations to close the domain gap in spaceborne computer vision, crucial for autonomous operations like on-orbit servicing. As the use of computer vision in space increases, challenges such as hostile illumination and low signal-to-noise ratios significantly hinder performance. While learning-based algorithms show promising results, their adoption is limited… ▽ More We investigate the efficacy of data augmentations to close the domain gap in spaceborne computer vision, crucial for autonomous operations like on-orbit servicing. As the use of computer vision in space increases, challenges such as hostile illumination and low signal-to-noise ratios significantly hinder performance. While learning-based algorithms show promising results, their adoption is limited by the need for extensive annotated training data and the domain gap that arises from differences between synthesized and real-world imagery. This study explores domain generalization in terms of data augmentations -- classical color and geometric transformations, corruptions, and noise -- to enhance model performance across the domain gap. To this end, we conduct an large scale experiment using a hyperparameter optimization pipeline that samples hundreds of different configurations and searches for the best set to bridge the domain gap. As a reference task, we use 2D object detection and evaluate on the SPEED+ dataset that contains real hardware-in-the-loop satellite images in its test set. Moreover, we evaluate four popular object detectors, including Mask R-CNN, Faster R-CNN, YOLO-v7, and the open set detector GroundingDINO, and highlight their trade-offs between performance, inference speed, and training time. Our results underscore the vital role of data augmentations in bridging the domain gap, improving model performance, robustness, and reliability for critical space applications. As a result, we propose two novel data augmentations specifically developed to emulate the visual effects observed in orbital imagery. We conclude by recommending the most effective augmentations for advancing computer vision in challenging orbital environments. Code for training detectors and hyperparameter search will be made publicly available. △ Less

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2409.12339 [pdf, other]

A Learning-based Controller for Multi-Contact Grasps on Unknown Objects with a Dexterous Hand

Authors: Dominik Winkelbauer, Rudolph Triebel, Berthold Bäuml

Abstract: Existing grasp controllers usually either only support finger-tip grasps or need explicit configuration of the inner forces. We propose a novel grasp controller that supports arbitrary grasp types, including power grasps with multi-contacts, while operating self-contained on before unseen objects. No detailed contact information is needed, but only a rough 3D model, e.g., reconstructed from a sing… ▽ More Existing grasp controllers usually either only support finger-tip grasps or need explicit configuration of the inner forces. We propose a novel grasp controller that supports arbitrary grasp types, including power grasps with multi-contacts, while operating self-contained on before unseen objects. No detailed contact information is needed, but only a rough 3D model, e.g., reconstructed from a single depth image. First, the external wrench being applied to the object is estimated by using the measured torques at the joints. Then, the torques necessary to counteract the estimated wrench while keeping the object at its initial pose are predicted. The torques are commanded via desired joint angles to an underlying joint-level impedance controller. To reach real-time performance, we propose a learning-based approach that is based on a wrench estimator- and a torque predictor neural network. Both networks are trained in a supervised fashion using data generated via the analytical formulation of the controller. In an extensive simulation-based evaluation, we show that our controller is able to keep 83.1% of the tested grasps stable when applying external wrenches with up to 10N. At the same time, we outperform the two tested baselines by being more efficient and inducing less involuntary object movement. Finally, we show that the controller also works on the real DLR-Hand II, reaching a cycle time of 6ms. △ Less

Submitted 18 September, 2024; originally announced September 2024.

Comments: 7 pages, 10 figures, Accepted at International Conference on Intelligent Robots and Systems (IROS) 2024

arXiv:2407.15161 [pdf, other]

FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection

Authors: Qian Feng, Jianxiang Feng, Zhaopeng Chen, Rudolph Triebel, Alois Knoll

Abstract: Synthesizing diverse dexterous grasps from uncertain partial observation is an important yet challenging task for physically intelligent embodiments. Previous works on generative grasp synthesis fell short of precisely capturing the complex grasp distribution and reasoning about shape uncertainty in the unstructured and often partially perceived reality. In this work, we introduce a novel model th… ▽ More Synthesizing diverse dexterous grasps from uncertain partial observation is an important yet challenging task for physically intelligent embodiments. Previous works on generative grasp synthesis fell short of precisely capturing the complex grasp distribution and reasoning about shape uncertainty in the unstructured and often partially perceived reality. In this work, we introduce a novel model that can generate diverse grasps for a multi-fingered hand while introspectively handling perceptual uncertainty and recognizing unknown object geometry to avoid performance degradation. Specifically, we devise a Deep Latent Variable Model (DLVM) based on Normalizing Flows (NFs), facilitating hierarchical and expressive latent representation for modeling versatile grasps. Our model design counteracts typical pitfalls of its popular alternative in generative grasping, i.e., conditional Variational Autoencoders (cVAEs) whose performance is limited by mode collapse and miss-specified prior issues. Moreover, the resultant feature hierarchy and the exact flow likelihood computation endow our model with shape-aware introspective capabilities, enabling it to quantify the shape uncertainty of partial point clouds and detect objects of novel geometry. We further achieve performance gain by fusing this information with a discriminative grasp evaluator, facilitating a novel hybrid way for grasp evaluation. Comprehensive simulated and real-world experiments show that the proposed idea gains superior performance and higher run-time efficiency against strong baselines, including diffusion models. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world. △ Less

Submitted 18 December, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

Comments: First two authors contributed equally, whose ordering decided via coin-tossing. Under Reivew

arXiv:2403.13395 [pdf, other]

doi 10.1109/ICRA57147.2024.10611563

Unifying Local and Global Multimodal Features for Place Recognition in Aliased and Low-Texture Environments

Authors: Alberto García-Hernández, Riccardo Giubilato, Klaus H. Strobl, Javier Civera, Rudolph Triebel

Abstract: Perceptual aliasing and weak textures pose significant challenges to the task of place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, called UMF (standing for Unifying Local and Global Multimodal Features) that 1) leverages multi-modality by cross-attention blocks between vision and LiDAR features, and 2) includes… ▽ More Perceptual aliasing and weak textures pose significant challenges to the task of place recognition, hindering the performance of Simultaneous Localization and Mapping (SLAM) systems. This paper presents a novel model, called UMF (standing for Unifying Local and Global Multimodal Features) that 1) leverages multi-modality by cross-attention blocks between vision and LiDAR features, and 2) includes a re-ranking stage that re-orders based on local feature matching the top-k candidates retrieved using a global representation. Our experiments, particularly on sequences captured on a planetary-analogous environment, show that UMF outperforms significantly previous baselines in those challenging aliased environments. Since our work aims to enhance the reliability of SLAM in all situations, we also explore its performance on the widely used RobotCar dataset, for broader applicability. Code and models are available at https://github.com/DLR-RM/UMF △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted submission to International Conference on Robotics and Automation (ICRA), 2024

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2402.07642 [pdf, other]

A Flow-based Credibility Metric for Safety-critical Pedestrian Detection

Authors: Maria Lyssenko, Christoph Gladisch, Christian Heinzemann, Matthias Woehrle, Rudolph Triebel

Abstract: Safety is of utmost importance for perception in automated driving (AD). However, a prime safety concern in state-of-the art object detection is that standard evaluation schemes utilize safety-agnostic metrics to argue sufficient detection performance. Hence, it is imperative to leverage supplementary domain knowledge to accentuate safety-critical misdetections during evaluation tasks. To tackle t… ▽ More Safety is of utmost importance for perception in automated driving (AD). However, a prime safety concern in state-of-the art object detection is that standard evaluation schemes utilize safety-agnostic metrics to argue sufficient detection performance. Hence, it is imperative to leverage supplementary domain knowledge to accentuate safety-critical misdetections during evaluation tasks. To tackle the underspecification, this paper introduces a novel credibility metric, called c-flow, for pedestrian bounding boxes. To this end, c-flow relies on a complementary optical flow signal from image sequences and enhances the analyses of safety-critical misdetections without requiring additional labels. We implement and evaluate c-flow with a state-of-the-art pedestrian detector on a large AD dataset. Our analysis demonstrates that c-flow allows developers to identify safety-critical misdetections. △ Less

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.02986 [pdf, other]

A Safety-Adapted Loss for Pedestrian Detection in Automated Driving

Authors: Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, Rudolph Triebel

Abstract: In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the de… ▽ More In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNN penalizes all misdetections equally irrespective of their criticality. Subsequently, to mitigate the occurrence of critical failure cases, i.e., false negatives, a safety-aware training strategy might be required to enhance the detection performance for critical pedestrians. In this paper, we propose a novel safety-aware loss variation that leverages the estimated per-pedestrian criticality scores during training. We exploit the reachability set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat quantifying the criticality. Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-aware loss function mitigates the misdetection of critical pedestrians without sacrificing performance for the general case, i.e., pedestrians outside the safety-critical zone. △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2311.06481 [pdf, other]

Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning

Authors: Jianxiang Feng, Jongseok Lee, Simon Geisler, Stephan Gunnemann, Rudolph Triebel

Abstract: To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions leading to adverse im… ▽ More To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions leading to adverse implications. In this work, we circumvent this topological mismatch using an expressive class-conditional base distribution trained with an information-theoretic objective to match the required topology. The proposed method enjoys the merits of wide compatibility with existing learned models without any performance degradation and minimum computation overhead while enhancing OOD detection capabilities. We demonstrate superior results in density estimation and 2D object detection benchmarks in comparison with extensive baselines. Moreover, we showcase the applicability of the method with a real-robot deployment. △ Less

Submitted 11 November, 2023; originally announced November 2023.

Comments: Accepted on CoRL2023

arXiv:2307.07753 [pdf, other]

Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

Authors: Dominik Schnaus, Jongseok Lee, Daniel Cremers, Rudolph Triebel

Abstract: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained mod… ▽ More In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization. △ Less

Submitted 15 July, 2023; originally announced July 2023.

Comments: Accepted to ICML 2023

arXiv:2307.01317 [pdf, other]

Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly

Authors: Jianxiang Feng, Matan Atad, Ismael Rodríguez, Maximilian Durner, Stephan Günnemann, Rudolph Triebel

Abstract: Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation… ▽ More Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF. △ Less

Submitted 6 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted to the RSS 2023 Robotic Assembly Workshop

arXiv:2305.09293 [pdf, other]

doi 10.1007/978-3-031-31438-4_21

Out-of-Distribution Detection for Adaptive Computer Vision

Authors: Simon Kristoffersson Lind, Rudolph Triebel, Luigi Nardi, Volker Krueger

Abstract: It is well known that computer vision can be unreliable when faced with previously unseen imaging conditions. This paper proposes a method to adapt camera parameters according to a normalizing flow-based out-of-distibution detector. A small-scale study is conducted which shows that adapting camera parameters according to this out-of-distibution detector leads to an average increase of 3 to 4 perce… ▽ More It is well known that computer vision can be unreliable when faced with previously unseen imaging conditions. This paper proposes a method to adapt camera parameters according to a normalizing flow-based out-of-distibution detector. A small-scale study is conducted which shows that adapting camera parameters according to this out-of-distibution detector leads to an average increase of 3 to 4 percentage points in mAP, mAR and F1 performance metrics of a YOLOv4 object detector. As a secondary result, this paper also shows that it is possible to train a normalizing flow model for out-of-distribution detection on the COCO dataset, which is larger and more diverse than most benchmarks for out-of-distibution detectors. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: Published in Springer Lecture Notes for Computer Science Vol. 13886 as part of the conference proceedings for Scandinavian Conference on Image Analysis 2023

Journal ref: In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham

arXiv:2303.13241 [pdf, other]

6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics

Authors: Maximilian Ulmer, Maximilian Durner, Martin Sundermeyer, Manuel Stoiber, Rudolph Triebel

Abstract: We present a novel technique to estimate the 6D pose of objects from single images where the 3D geometry of the object is only given approximately and not as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model also estimates the pixel-wise coordinate error to disca… ▽ More We present a novel technique to estimate the 6D pose of objects from single images where the 3D geometry of the object is only given approximately and not as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model also estimates the pixel-wise coordinate error to discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses of the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation by which we can estimate the probability for each hypothesis and select the most likely one. As we show in experiments, our approach is capable of dealing with extreme visual conditions including overexposure, high contrast, or low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition. △ Less

Submitted 31 August, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2303.10135 [pdf, other]

Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning

Authors: Matan Atad, Jianxiang Feng, Ismael Rodríguez, Maximilian Durner, Rudolph Triebel

Abstract: Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing along with the growing need for greater product customization. One of the main challenges in realizing such automation resides in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility… ▽ More Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing along with the growing need for greater product customization. One of the main challenges in realizing such automation resides in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach including a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE for assembly sequence generation. With GRACE, we are able to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impacts from false predictions, and hence facilitating real-world deployment soon. Code and training data are available at https://github.com/DLR-RM/GRACE. △ Less

Submitted 27 July, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Accepted to IROS 2023. First two authors share equal contribution

arXiv:2302.11458 [pdf, other]

Fusing Visual Appearance and Geometry for Multi-modality 6DoF Object Tracking

Authors: Manuel Stoiber, Mariam Elsayed, Anne E. Reichert, Florian Steidle, Dongheui Lee, Rudolph Triebel

Abstract: In many applications of advanced robotic manipulation, six degrees of freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses. The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance. In general, object… ▽ More In many applications of advanced robotic manipulation, six degrees of freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses. The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance. In general, object surfaces contain local characteristics from text, graphics, and patterns, as well as global differences from distinct materials and colors. To incorporate this visual information, two modalities are developed. For local characteristics, keypoint features are used to minimize distances between points from keyframes and the current image. For global differences, a novel region approach is developed that considers multiple regions on the object surface. In addition, it allows the modeling of external geometries. Experiments on the YCB-Video and OPT datasets demonstrate that our approach ICG+ performs best on both datasets, outperforming both conventional and deep learning-based methods. At the same time, the algorithm is highly efficient and runs at more than 300 Hz. The source code of our tracker is publicly available. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: Submitted to IEEE/RSJ International Conference on Intelligent Robots

arXiv:2210.09678 [pdf, other]

Virtual Reality via Object Pose Estimation and Active Learning: Realizing Telepresence Robots with Aerial Manipulation Capabilities

Authors: Jongseok Lee, Ribin Balachandran, Konstantin Kondak, Andre Coelho, Marco De Stefano, Matthias Humt, Jianxiang Feng, Tamim Asfour, Rudolph Triebel

Abstract: This article presents a novel telepresence system for advancing aerial manipulation in dynamic and unstructured environments. The proposed system not only features a haptic device, but also a virtual reality (VR) interface that provides real-time 3D displays of the robot's workspace as well as a haptic guidance to its remotely located operator. To realize this, multiple sensors namely a LiDAR, cam… ▽ More This article presents a novel telepresence system for advancing aerial manipulation in dynamic and unstructured environments. The proposed system not only features a haptic device, but also a virtual reality (VR) interface that provides real-time 3D displays of the robot's workspace as well as a haptic guidance to its remotely located operator. To realize this, multiple sensors namely a LiDAR, cameras and IMUs are utilized. For processing of the acquired sensory data, pose estimation pipelines are devised for industrial objects of both known and unknown geometries. We further propose an active learning pipeline in order to increase the sample efficiency of a pipeline component that relies on Deep Neural Networks (DNNs) based object detection. All these algorithms jointly address various challenges encountered during the execution of perception tasks in industrial scenarios. In the experiments, exhaustive ablation studies are provided to validate the proposed pipelines. Methodologically, these results commonly suggest how an awareness of the algorithms' own failures and uncertainty (`introspection') can be used tackle the encountered problems. Moreover, outdoor experiments are conducted to evaluate the effectiveness of the overall system in enhancing aerial manipulation capabilities. In particular, with flight campaigns over days and nights, from spring to winter, and with different users and locations, we demonstrate over 70 robust executions of pick-and-place, force application and peg-in-hole tasks with the DLR cable-Suspended Aerial Manipulator (SAM). As a result, we show the viability of the proposed system in future industrial applications. △ Less

Submitted 10 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Accepted to Field Robotics

arXiv:2209.14774 [pdf, other]

RECALL: Rehearsal-free Continual Learning for Object Classification

Authors: Markus Knauer, Maximilian Denninger, Rudolph Triebel

Abstract: Convolutional neural networks show remarkable results in classification but struggle with learning new things on the fly. We present a novel rehearsal-free approach, where a deep neural network is continually learning new unseen object categories without saving any data of prior sequences. Our approach is called RECALL, as the network recalls categories by calculating logits for old categories bef… ▽ More Convolutional neural networks show remarkable results in classification but struggle with learning new things on the fly. We present a novel rehearsal-free approach, where a deep neural network is continually learning new unseen object categories without saving any data of prior sequences. Our approach is called RECALL, as the network recalls categories by calculating logits for old categories before training new ones. These are then used during training to avoid changing the old categories. For each new sequence, a new head is added to accommodate the new categories. To mitigate forgetting, we present a regularization strategy where we replace the classification with a regression. Moreover, for the known categories, we propose a Mahalanobis loss that includes the variances to account for the changing densities between known and unknown categories. Finally, we present a novel dataset for continual learning, especially suited for object recognition on a mobile robot (HOWS-CL-25), including 150,795 synthetic images of 25 household object categories. Our approach RECALL outperforms the current state of the art on CORe50 and iCIFAR-100 and reaches the best performance on HOWS-CL-25. △ Less

Submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted as contributed paper at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:2208.01502 [pdf, other]

A Multi-body Tracking Framework - From Rigid Objects to Kinematic Structures

Authors: Manuel Stoiber, Martin Sundermeyer, Wout Boerdijk, Rudolph Triebel

Abstract: Kinematic structures are very common in the real world. They range from simple articulated objects to complex mechanical systems. However, despite their relevance, most model-based 3D tracking methods only consider rigid objects. To overcome this limitation, we propose a flexible framework that allows the extension of existing 6DoF algorithms to kinematic structures. Our approach focuses on method… ▽ More Kinematic structures are very common in the real world. They range from simple articulated objects to complex mechanical systems. However, despite their relevance, most model-based 3D tracking methods only consider rigid objects. To overcome this limitation, we propose a flexible framework that allows the extension of existing 6DoF algorithms to kinematic structures. Our approach focuses on methods that employ Newton-like optimization techniques, which are widely used in object tracking. The framework considers both tree-like and closed kinematic structures and allows a flexible configuration of joints and constraints. To project equations from individual rigid bodies to a multi-body system, Jacobians are used. For closed kinematic chains, a novel formulation that features Lagrange multipliers is developed. In a detailed mathematical proof, we show that our constraint formulation leads to an exact kinematic solution and converges in a single iteration. Based on the proposed framework, we extend ICG, which is a state-of-the-art rigid object tracking algorithm, to multi-body tracking. For the evaluation, we create a highly-realistic synthetic dataset that features a large number of sequences and various robots. Based on this dataset, we conduct a wide variety of experiments that demonstrate the excellent performance of the developed framework and our multi-body tracker. △ Less

Submitted 14 February, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2207.06815 [pdf, other]

doi 10.1109/LRA.2022.3188118

Challenges of SLAM in extremely unstructured environments: the DLR Planetary Stereo, Solid-State LiDAR, Inertial Dataset

Authors: Riccardo Giubilato, Wolfgang Stürzl, Armin Wedler, Rudolph Triebel

Abstract: We present the DLR Planetary Stereo, Solid-State LiDAR, Inertial (S3LI) dataset, recorded on Mt. Etna, Sicily, an environment analogous to the Moon and Mars, using a hand-held sensor suite with attributes suitable for implementation on a space-like mobile rover. The environment is characterized by challenging conditions regarding both the visual and structural appearance: severe visual aliasing po… ▽ More We present the DLR Planetary Stereo, Solid-State LiDAR, Inertial (S3LI) dataset, recorded on Mt. Etna, Sicily, an environment analogous to the Moon and Mars, using a hand-held sensor suite with attributes suitable for implementation on a space-like mobile rover. The environment is characterized by challenging conditions regarding both the visual and structural appearance: severe visual aliasing poses significant limitations to the ability of visual SLAM systems to perform place recognition, while the absence of outstanding structural details, joined with the limited Field-of-View of the utilized Solid-State LiDAR sensor, challenges traditional LiDAR SLAM for the task of pose estimation using point clouds alone. With this data, that covers more than 4 kilometers of travel on soft volcanic slopes, we aim to: 1) provide a tool to expose limitations of state-of-the-art SLAM systems with respect to environments, which are not present in widely available datasets and 2) motivate the development of novel localization and mapping approaches, that rely efficiently on the complementary capabilities of the two sensors. The dataset is accessible at the following url: https://rmc.dlr.de/s3li_dataset △ Less

Submitted 14 July, 2022; originally announced July 2022.

Comments: RA-L + IROS 2022 Submission, Accepted

arXiv:2203.05334 [pdf, other]

Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects

Authors: Manuel Stoiber, Martin Sundermeyer, Rudolph Triebel

Abstract: Tracking objects in 3D space and predicting their 6DoF pose is an essential task in computer vision. State-of-the-art approaches often rely on object texture to tackle this problem. However, while they achieve impressive results, many objects do not contain sufficient texture, violating the main underlying assumption. In the following, we thus propose ICG, a novel probabilistic tracker that fuses… ▽ More Tracking objects in 3D space and predicting their 6DoF pose is an essential task in computer vision. State-of-the-art approaches often rely on object texture to tackle this problem. However, while they achieve impressive results, many objects do not contain sufficient texture, violating the main underlying assumption. In the following, we thus propose ICG, a novel probabilistic tracker that fuses region and depth information and only requires the object geometry. Our method deploys correspondence lines and points to iteratively refine the pose. We also implement robust occlusion handling to improve performance in real-world settings. Experiments on the YCB-Video, OPT, and Choi datasets demonstrate that, even for textured objects, our approach outperforms the current state of the art with respect to accuracy and robustness. At the same time, ICG shows fast convergence and outstanding efficiency, requiring only 1.3 ms per frame on a single CPU core. Finally, we analyze the influence of individual components and discuss our performance compared to deep learning-based methods. The source code of our tracker is publicly available. △ Less

Submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022

arXiv:2202.00765 [pdf, other]

doi 10.1109/LRA.2022.3142905

A Model for Multi-View Residual Covariances based on Perspective Deformation

Authors: Alejandro Fontan, Laura Oliva, Javier Civera, Rudolph Triebel

Abstract: In this work, we derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. The core of our approach is the formulation of the residual covariances as a combination of geometric and photometric noise sources. And our key novel contribution is the derivation of a term modelling how local 2D patches suffer from perspective deformation when imaging 3D surfa… ▽ More In this work, we derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. The core of our approach is the formulation of the residual covariances as a combination of geometric and photometric noise sources. And our key novel contribution is the derivation of a term modelling how local 2D patches suffer from perspective deformation when imaging 3D surfaces around a point. Together, these add up to an efficient and general formulation which not only improves the accuracy of both feature-based and direct methods, but can also be used to estimate more accurate measures of the state entropy and hence better founded point visibility thresholds. We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment, improving their accuracy with a negligible overhead. △ Less

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2110.12715 [pdf, other]

doi 10.1007/s11263-022-01579-8

SRT3D: A Sparse Region-Based 3D Object Tracking Approach for the Real World

Authors: Manuel Stoiber, Martin Pfanne, Klaus H. Strobl, Rudolph Triebel, Alin Albu-Schäffer

Abstract: Region-based methods have become increasingly popular for model-based, monocular 3D tracking of texture-less objects in cluttered scenes. However, while they achieve state-of-the-art results, most methods are computationally expensive, requiring significant resources to run in real-time. In the following, we build on our previous work and develop SRT3D, a sparse region-based approach to 3D object… ▽ More Region-based methods have become increasingly popular for model-based, monocular 3D tracking of texture-less objects in cluttered scenes. However, while they achieve state-of-the-art results, most methods are computationally expensive, requiring significant resources to run in real-time. In the following, we build on our previous work and develop SRT3D, a sparse region-based approach to 3D object tracking that bridges this gap in efficiency. Our method considers image information sparsely along so-called correspondence lines that model the probability of the object's contour location. We thereby improve on the current state of the art and introduce smoothed step functions that consider a defined global and local uncertainty. For the resulting probabilistic formulation, a thorough analysis is provided. Finally, we use a pre-rendered sparse viewpoint model to create a joint posterior probability for the object pose. The function is maximized using second-order Newton optimization with Tikhonov regularization. During the pose estimation, we differentiate between global and local optimization, using a novel approximation for the first-order derivative employed in the Newton method. In multiple experiments, we demonstrate that the resulting algorithm improves the current state of the art both in terms of runtime and quality, performing particularly well for noisy and cluttered images encountered in the real world. △ Less

Submitted 25 October, 2021; originally announced October 2021.

Comments: Submitted to the International Journal of Computer Vision

arXiv:2110.00784 [pdf, other]

Seeking Visual Discomfort: Curiosity-driven Representations for Reinforcement Learning

Authors: Elie Aljalbout, Maximilian Ulmer, Rudolph Triebel

Abstract: Vision-based reinforcement learning (RL) is a promising approach to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates… ▽ More Vision-based reinforcement learning (RL) is a promising approach to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency among other benefits. However, to take full advantage of this paradigm, the quality of samples used for training plays a crucial role. More importantly, the diversity of these samples could affect the sample efficiency of vision-based RL, but also its generalization capability. In this work, we present an approach to improve sample diversity for state representation learning. Our method enhances the exploration capability of RL algorithms, by taking advantage of the SRL setup. Our experiments show that our proposed approach boosts the visitation of problematic states, improves the learned state representation, and outperforms the baselines for all tested environments. These results are most apparent for environments where the baseline methods struggle. Even in simple environments, our method stabilizes the training, reduces the reward variance, and promotes sample efficiency. △ Less

Submitted 2 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.13588

ACM Class: I.2.6; I.2.8; I.2.9; I.2.10; I.5.0

arXiv:2109.13588 [pdf, other]

Making Curiosity Explicit in Vision-based RL

Authors: Elie Aljalbout, Maximilian Ulmer, Rudolph Triebel

Abstract: Vision-based reinforcement learning (RL) is a promising technique to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to an increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstr… ▽ More Vision-based reinforcement learning (RL) is a promising technique to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to an increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency among other benefits. However, to take full advantage of this paradigm, the quality of samples used for training plays a crucial role. More importantly, the diversity of these samples could affect the sample efficiency of vision-based RL, but also its generalization capability. In this work, we present an approach to improve the sample diversity. Our method enhances the exploration capability of the RL algorithms by taking advantage of the SRL setup. Our experiments show that the presented approach outperforms the baseline for all tested environments. These results are most apparent for environments where the baseline method struggles. Even in simple environments, our method stabilizes the training, reduces the reward variance and boosts sample efficiency. △ Less

Submitted 28 September, 2021; originally announced September 2021.

Comments: ICRA workshop on Curious Robots 2021

ACM Class: I.2.6; I.2.8; I.2.9; I.2.10; I.5.0

arXiv:2109.12869 [pdf, other]

Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks

Authors: Jianxiang Feng, Maximilian Durner, Zoltan-Csaba Marton, Ferenc Balint-Benczedi, Rudolph Triebel

Abstract: This work focuses on improving uncertainty estimation in the field of object classification from RGB images and demonstrates its benefits in two robotic applications. We employ a (BNN), and evaluate two practical inference techniques to obtain better uncertainty estimates, namely Concrete Dropout (CDP) and Kronecker-factored Laplace Approximation (LAP). We show a performance increase using more re… ▽ More This work focuses on improving uncertainty estimation in the field of object classification from RGB images and demonstrates its benefits in two robotic applications. We employ a (BNN), and evaluate two practical inference techniques to obtain better uncertainty estimates, namely Concrete Dropout (CDP) and Kronecker-factored Laplace Approximation (LAP). We show a performance increase using more reliable uncertainty estimates as unary potentials within a Conditional Random Field (CRF), which is able to incorporate contextual information as well. Furthermore, the obtained uncertainties are exploited to achieve domain adaptation in a semi-supervised manner, which requires less manual efforts in annotating data. We evaluate our approach on two public benchmark datasets that are relevant for robot perception tasks. △ Less

Submitted 27 September, 2021; originally announced September 2021.

Comments: International Symposium on Robotics Research (ISRR), Hanoi, Vietnam, 2019

arXiv:2109.11547 [pdf, other]

Bayesian Active Learning for Sim-to-Real Robotic Perception

Authors: Jianxiang Feng, Jongseok Lee, Maximilian Durner, Rudolph Triebel

Abstract: While learning from synthetic training data has recently gained an increased attention, in real-world robotic applications, there are still performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to resolve with only synthetic data. Therefore, we focus on an efficient acquisition of real data within a Sim-to-Real learning pipeline. Concretely, we employ deep Ba… ▽ More While learning from synthetic training data has recently gained an increased attention, in real-world robotic applications, there are still performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to resolve with only synthetic data. Therefore, we focus on an efficient acquisition of real data within a Sim-to-Real learning pipeline. Concretely, we employ deep Bayesian active learning to minimize manual annotation efforts and devise an autonomous learning paradigm to select the data that is considered useful for the human expert to annotate. To achieve this, a Bayesian Neural Network (BNN) object detector providing reliable uncertainty estimates is adapted to infer the informativeness of the unlabeled data. Furthermore, to cope with mis-alignments of the label distribution in uncertainty-based sampling, we develop an effective randomized sampling strategy that performs favorably compared to other complex alternatives. In our experiments on object classification and detection, we show benefits of our approach and provide evidence that labeling efforts can be reduced significantly. Finally, we demonstrate the practical effectiveness of this idea in a grasping task on an assistive robot. △ Less

Submitted 1 August, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: to appear in IROS2022

arXiv:2109.09690 [pdf, other]

Trust Your Robots! Predictive Uncertainty Estimation of Neural Networks with Sparse Gaussian Processes

Authors: Jongseok Lee, Jianxiang Feng, Matthias Humt, Marcus G. Müller, Rudolph Triebel

Abstract: This paper presents a probabilistic framework to obtain both reliable and fast uncertainty estimates for predictions with Deep Neural Networks (DNNs). Our main contribution is a practical and principled combination of DNNs with sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we devise a learn… ▽ More This paper presents a probabilistic framework to obtain both reliable and fast uncertainty estimates for predictions with Deep Neural Networks (DNNs). Our main contribution is a practical and principled combination of DNNs with sparse Gaussian Processes (GPs). We prove theoretically that DNNs can be seen as a special case of sparse GPs, namely mixtures of GP experts (MoE-GP), and we devise a learning algorithm that brings the derived theory into practice. In experiments from two different robotic tasks -- inverse dynamics of a manipulator and object detection on a micro-aerial vehicle (MAV) -- we show the effectiveness of our approach in terms of predictive uncertainty, improved scalability, and run-time efficiency on a Jetson TX2. We thus argue that our approach can pave the way towards reliable and fast robot learning systems with uncertainty awareness. △ Less

Submitted 21 September, 2021; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: 12 pages, 6 figures and 1 table. Accepted at the 5th Conference on Robot Learning (CORL 2021), London, UK

arXiv:2109.06596 [pdf, other]

doi 10.55417/fr.2022053

GPGM-SLAM: a Robust SLAM System for Unstructured Planetary Environments with Gaussian Process Gradient Maps

Authors: Riccardo Giubilato, Cedric Le Gentil, Mallikarjuna Vayugundla, Martin J. Schuster, Teresa Vidal-Calleja, Rudolph Triebel

Abstract: Simultaneous Localization and Mapping (SLAM) techniques play a key role towards long-term autonomy of mobile robots due to the ability to correct localization errors and produce consistent maps of an environment over time. Contrarily to urban or man-made environments, where the presence of unique objects and structures offer unique cues for localization, the appearance of unstructured natural envi… ▽ More Simultaneous Localization and Mapping (SLAM) techniques play a key role towards long-term autonomy of mobile robots due to the ability to correct localization errors and produce consistent maps of an environment over time. Contrarily to urban or man-made environments, where the presence of unique objects and structures offer unique cues for localization, the appearance of unstructured natural environments is often ambiguous and self-similar, hindering the performances of loop closure detection. In this paper, we present an approach to improve the robustness of place recognition in the context of a submap-based stereo SLAM based on Gaussian Process Gradient Maps (GPGMaps). GPGMaps embed a continuous representation of the gradients of the local terrain elevation by means of Gaussian Process regression and Structured Kernel Interpolation, given solely noisy elevation measurements. We leverage the image-like structure of GPGMaps to detect loop closures using traditional visual features and Bag of Words. GPGMap matching is performed as an SE(2) alignment to establish loop closure constraints within a pose graph. We evaluate the proposed pipeline on a variety of datasets recorded on Mt. Etna, Sicily and in the Morocco desert, respectively Moon- and Mars-like environments, and we compare the localization performances with state-of-the-art approaches for visual SLAM and visual loop closure detection. △ Less

Submitted 14 September, 2021; originally announced September 2021.

Comments: Submission to Field Robotics (www.journalfieldrobotics.org), under review

Journal ref: Field Robotics, Vol. 2, 2022

arXiv:2109.05509 [pdf, other]

Towards Robust Monocular Visual Odometry for Flying Robots on Planetary Missions

Authors: Martin Wudenka, Marcus G. Müller, Nikolaus Demmel, Armin Wedler, Rudolph Triebel, Daniel Cremers, Wolfgang Stürzl

Abstract: In the future, extraterrestrial expeditions will not only be conducted by rovers but also by flying robots. The technical demonstration drone Ingenuity, that just landed on Mars, will mark the beginning of a new era of exploration unhindered by terrain traversability. Robust self-localization is crucial for that. Cameras that are lightweight, cheap and information-rich sensors are already used to… ▽ More In the future, extraterrestrial expeditions will not only be conducted by rovers but also by flying robots. The technical demonstration drone Ingenuity, that just landed on Mars, will mark the beginning of a new era of exploration unhindered by terrain traversability. Robust self-localization is crucial for that. Cameras that are lightweight, cheap and information-rich sensors are already used to estimate the ego-motion of vehicles. However, methods proven to work in man-made environments cannot simply be deployed on other planets. The highly repetitive textures present in the wastelands of Mars pose a huge challenge to descriptor matching based approaches. In this paper, we present an advanced robust monocular odometry algorithm that uses efficient optical flow tracking to obtain feature correspondences between images and a refined keyframe selection criterion. In contrast to most other approaches, our framework can also handle rotation-only motions that are particularly challenging for monocular odometry systems. Furthermore, we present a novel approach to estimate the current risk of scale drift based on a principal component analysis of the relative translation information matrix. This way we obtain an implicit measure of uncertainty. We evaluate the validity of our approach on all sequences of a challenging real-world dataset captured in a Mars-like environment and show that it outperforms state-of-the-art approaches. △ Less

Submitted 12 September, 2021; originally announced September 2021.

Comments: Accepted to IROS 2021. Updated version corresponding to IROS camera-ready. The source code is publicly available at: https://github.com/DLR-RM/granite

arXiv:2107.03342 [pdf, other]

A Survey of Uncertainty in Deep Neural Networks

Authors: Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, Muhammad Shahzad, Wen Yang, Richard Bamler, Xiao Xiang Zhu

Abstract: Due to their increasing spread, confidence in neural network predictions became more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over or under confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identifi… ▽ More Due to their increasing spread, confidence in neural network predictions became more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over or under confidence. Many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and a variety of approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. A comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and not reducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks, ensemble of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for the calibration of neural networks and give an overview of existing baselines and implementations. Different examples from the wide spectrum of challenges in different fields give an idea of the needs and challenges regarding uncertainties in practical applications. Additionally, the practical limitations of current methods for mission- and safety-critical real world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given. △ Less

Submitted 18 January, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

arXiv:2105.02020 [pdf, other]

Multi-Modal Loop Closing in Unstructured Planetary Environments with Visually Enriched Submaps

Authors: Riccardo Giubilato, Mallikarjuna Vayugundla, Wolfgang Stürzl, Martin J. Schuster, Armin Wedler, Rudolph Triebel

Abstract: Future planetary missions will rely on rovers that can autonomously explore and navigate in unstructured environments. An essential element is the ability to recognize places that were already visited or mapped. In this work, we leverage the ability of stereo cameras to provide both visual and depth information, guiding the search and validation of loop closures from a multi-modal perspective. We… ▽ More Future planetary missions will rely on rovers that can autonomously explore and navigate in unstructured environments. An essential element is the ability to recognize places that were already visited or mapped. In this work, we leverage the ability of stereo cameras to provide both visual and depth information, guiding the search and validation of loop closures from a multi-modal perspective. We propose to augment submaps that are created by aggregating stereo point clouds, with visual keyframes. Point clouds matches are found by comparing CSHOT descriptors and validated by clustering, while visual matches are established by comparing keyframes using Bag-of-Words (BoW) and ORB descriptors. The relative transformations resulting from both keyframe and point cloud matches are then fused to provide pose constraints between submaps in our graph-based SLAM framework. Using the LRU rover, we performed several tests in both an indoor laboratory environment as well as a challenging planetary analog environment on Mount Etna, Italy. These environments consist of areas where either keyframes or point clouds alone failed to provide adequate matches demonstrating the benefit of the proposed multi-modal approach. △ Less

Submitted 14 September, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:2103.14127 [pdf, other]

Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes

Authors: Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter Fox

Abstract: Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that possess several potential failure points and run-times unsuitable for closed-loop grasping. Therefore, we propose an end-to-end network that efficientl… ▽ More Grasping unseen objects in unconstrained, cluttered environments is an essential skill for autonomous robotic manipulation. Despite recent progress in full 6-DoF grasp learning, existing approaches often consist of complex sequential pipelines that possess several potential failure points and run-times unsuitable for closed-loop grasping. Therefore, we propose an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. Our novel grasp representation treats 3D points of the recorded point cloud as potential grasp contacts. By rooting the full 6-DoF grasp pose and width in the observed point cloud, we can reduce the dimensionality of our grasp representation to 4-DoF which greatly facilitates the learning process. Our class-agnostic approach is trained on 17 million simulated grasps and generalizes well to real world sensor data. In a robotic grasping study of unseen objects in structured clutter we achieve over 90% success rate, cutting the failure rate in half compared to a recent state-of-the-art method. △ Less

Submitted 25 March, 2021; originally announced March 2021.

Comments: ICRA 2021. Video of the real world experiments and code are available at https://research.nvidia.com/publication/2021-03_Contact-GraspNet%3A--Efficient

arXiv:2103.06796 [pdf, other]

Unknown Object Segmentation from Stereo Images

Authors: Maximilian Durner, Wout Boerdijk, Martin Sundermeyer, Werner Friedl, Zoltan-Csaba Marton, Rudolph Triebel

Abstract: Although instance-aware perception is a key prerequisite for many autonomous robotic applications, most of the methods only partially solve the problem by focusing solely on known object categories. However, for robots interacting in dynamic and cluttered environments, this is not realistic and severely limits the range of potential applications. Therefore, we propose a novel object instance segme… ▽ More Although instance-aware perception is a key prerequisite for many autonomous robotic applications, most of the methods only partially solve the problem by focusing solely on known object categories. However, for robots interacting in dynamic and cluttered environments, this is not realistic and severely limits the range of potential applications. Therefore, we propose a novel object instance segmentation approach that does not require any semantic or geometric information of the objects beforehand. In contrast to existing works, we do not explicitly use depth data as input, but rely on the insight that slight viewpoint changes, which for example are provided by stereo image pairs, are often sufficient to determine object boundaries and thus to segment objects. Focusing on the versatility of stereo sensors, we employ a transformer-based architecture that maps directly from the pair of input images to the object instances. This has the major advantage that instead of a noisy, and potentially incomplete depth map as an input, on which the segmentation is computed, we use the original image pair to infer the object instances and a dense depth map. In experiments in several different application domains, we show that our Instance Stereo Transformer (INSTR) algorithm outperforms current state-of-the-art methods that are based on depth maps. Training code and pretrained models will be made available. △ Less

Submitted 11 March, 2021; originally announced March 2021.

Comments: 8 pages, 5 figures, 6 tables, code will be made available

arXiv:2011.04539 [pdf, other]

Learning to Localize in New Environments from Synthetic Training Data

Authors: Dominik Winkelbauer, Maximilian Denninger, Rudolph Triebel

Abstract: Most existing approaches for visual localization either need a detailed 3D model of the environment or, in the case of learning-based methods, must be retrained for each new scene. This can either be very expensive or simply impossible for large, unknown environments, for example in search-and-rescue scenarios. Although there are learning-based approaches that operate scene-agnostically, the gener… ▽ More Most existing approaches for visual localization either need a detailed 3D model of the environment or, in the case of learning-based methods, must be retrained for each new scene. This can either be very expensive or simply impossible for large, unknown environments, for example in search-and-rescue scenarios. Although there are learning-based approaches that operate scene-agnostically, the generalization capability of these methods is still outperformed by classical approaches. In this paper, we present an approach that can generalize to new scenes by applying specific changes to the model architecture, including an extended regression part, the use of hierarchical correlation layers, and the exploitation of scale and uncertainty information. Our approach outperforms the 5-point algorithm using SIFT features on equally big images and additionally surpasses all previous learning-based approaches that were trained on different data. It is also superior to most of the approaches that were specifically trained on the respective scenes. We also evaluate our approach in a scenario where only very few reference images are available, showing that under such more realistic conditions our learning-based approach considerably exceeds both existing learning-based and classical methods. △ Less

Submitted 21 June, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 7 pages, 3 figures; in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021

arXiv:2011.03279 [pdf, other]

"What's This?" -- Learning to Segment Unknown Objects from Manipulation Sequences

Authors: Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel

Abstract: We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly inc… ▽ More We present a novel framework for self-supervised grasped object segmentation with a robotic manipulator. Our method successively learns an agnostic foreground segmentation followed by a distinction between manipulator and object solely by observing the motion between consecutive RGB frames. In contrast to previous approaches, we propose a single, end-to-end trainable architecture which jointly incorporates motion cues and semantic knowledge. Furthermore, while the motion of the manipulator and the object are substantial cues for our algorithm, we present means to robustly deal with distraction objects moving in the background, as well as with completely static scenes. Our method neither depends on any visual registration of a kinematic robot or 3D object models, nor on precise hand-eye calibration or any additional sensor data. By extensive experimental evaluation we demonstrate the superiority of our framework and provide detailed insights on its capability of dealing with the aforementioned extreme cases of motion. We also show that training a semantic segmentation network with the automatically labeled data achieves results on par with manually annotated training data. Code and pretrained model are available at https://github.com/DLR-RM/DistinctNet. △ Less

Submitted 17 June, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: 8 pages, 6 figures,in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021

arXiv:2010.16141 [pdf, other]

Bayesian Optimization Meets Laplace Approximation for Robotic Introspection

Authors: Matthias Humt, Jongseok Lee, Rudolph Triebel

Abstract: In robotics, deep learning (DL) methods are used more and more widely, but their general inability to provide reliable confidence estimates will ultimately lead to fragile and unreliable systems. This impedes the potential deployments of DL methods for long-term autonomy. Therefore, in this paper we introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more… ▽ More In robotics, deep learning (DL) methods are used more and more widely, but their general inability to provide reliable confidence estimates will ultimately lead to fragile and unreliable systems. This impedes the potential deployments of DL methods for long-term autonomy. Therefore, in this paper we introduce a scalable Laplace Approximation (LA) technique to make Deep Neural Networks (DNNs) more introspective, i.e. to enable them to provide accurate assessments of their failure probability for unseen test data. In particular, we propose a novel Bayesian Optimization (BO) algorithm to mitigate their tendency of under-fitting the true weight posterior, so that both the calibration and the accuracy of the predictions can be simultaneously optimized. We demonstrate empirically that the proposed BO approach requires fewer iterations for this when compared to random search, and we show that the proposed framework can be scaled up to large datasets and architectures. △ Less

Submitted 30 October, 2020; originally announced October 2020.

Comments: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020 Long-Term Autonomy Workshop

arXiv:2010.00052 [pdf, other]

DOT: Dynamic Object Tracking for Visual SLAM

Authors: Irene Ballester, Alejandro Fontan, Javier Civera, Klaus H. Strobl, Rudolph Triebel

Abstract: In this paper we present DOT (Dynamic Object Tracking), a front-end that added to existing SLAM systems can significantly improve their robustness and accuracy in highly dynamic environments. DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects in order to allow SLAM systems based on rigid scene models to avoid such image areas in their optimizations.… ▽ More In this paper we present DOT (Dynamic Object Tracking), a front-end that added to existing SLAM systems can significantly improve their robustness and accuracy in highly dynamic environments. DOT combines instance segmentation and multi-view geometry to generate masks for dynamic objects in order to allow SLAM systems based on rigid scene models to avoid such image areas in their optimizations. To determine which objects are actually moving, DOT segments first instances of potentially dynamic objects and then, with the estimated camera motion, tracks such objects by minimizing the photometric reprojection error. This short-term tracking improves the accuracy of the segmentation with respect to other approaches. In the end, only actually dynamic masks are generated. We have evaluated DOT with ORB-SLAM 2 in three public datasets. Our results show that our approach improves significantly the accuracy and robustness of ORB-SLAM 2, especially in highly dynamic scenes. △ Less

Submitted 30 September, 2020; originally announced October 2020.

arXiv:2009.00221 [pdf, other]

Gaussian Process Gradient Maps for Loop-Closure Detection in Unstructured Planetary Environments

Authors: Cedric Le Gentil, Mallikarjuna Vayugundla, Riccardo Giubilato, Wolfgang Stürzl, Teresa Vidal-Calleja, Rudolph Triebel

Abstract: The ability to recognize previously mapped locations is an essential feature for autonomous systems. Unstructured planetary-like environments pose a major challenge to these systems due to the similarity of the terrain. As a result, the ambiguity of the visual appearance makes state-of-the-art visual place recognition approaches less effective than in urban or man-made environments. This paper pre… ▽ More The ability to recognize previously mapped locations is an essential feature for autonomous systems. Unstructured planetary-like environments pose a major challenge to these systems due to the similarity of the terrain. As a result, the ambiguity of the visual appearance makes state-of-the-art visual place recognition approaches less effective than in urban or man-made environments. This paper presents a method to solve the loop closure problem using only spatial information. The key idea is to use a novel continuous and probabilistic representations of terrain elevation maps. Given 3D point clouds of the environment, the proposed approach exploits Gaussian Process (GP) regression with linear operators to generate continuous gradient maps of the terrain elevation information. Traditional image registration techniques are then used to search for potential matches. Loop closures are verified by leveraging both the spatial characteristic of the elevation maps (SE(2) registration) and the probabilistic nature of the GP representation. A submap-based localization and mapping framework is used to demonstrate the validity of the proposed approach. The performance of this pipeline is evaluated and benchmarked using real data from a rover that is equipped with a stereo camera and navigates in challenging, unstructured planetary-like environments in Morocco and on Mt. Etna. △ Less

Submitted 1 September, 2020; originally announced September 2020.

Comments: This work is accepted for presentation at the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Please find IEEE's copyright statement at the bottom of the first page. Cedric Le Gentil and Mallikarjuna Vayugundla share the first authorship of this paper

arXiv:2007.07630 [pdf, other]

Learning Multiplicative Interactions with Bayesian Neural Networks for Visual-Inertial Odometry

Authors: Kashmira Shinde, Jongseok Lee, Matthias Humt, Aydin Sezgin, Rudolph Triebel

Abstract: This paper presents an end-to-end multi-modal learning approach for monocular Visual-Inertial Odometry (VIO), which is specifically designed to exploit sensor complementarity in the light of sensor degradation scenarios. The proposed network makes use of a multi-head self-attention mechanism that learns multiplicative interactions between multiple streams of information. Another design feature of… ▽ More This paper presents an end-to-end multi-modal learning approach for monocular Visual-Inertial Odometry (VIO), which is specifically designed to exploit sensor complementarity in the light of sensor degradation scenarios. The proposed network makes use of a multi-head self-attention mechanism that learns multiplicative interactions between multiple streams of information. Another design feature of our approach is the incorporation of the model uncertainty using scalable Laplace Approximation. We evaluate the performance of the proposed approach by comparing it against the end-to-end state-of-the-art methods on the KITTI dataset and show that it achieves superior performance. Importantly, our work thereby provides an empirical evidence that learning multiplicative interactions can result in a powerful inductive bias for increased robustness to sensor failures. △ Less

Submitted 15 July, 2020; originally announced July 2020.

Comments: Published at Workshop on AI for Autonomous Driving (AIAD), the 37th International Conference on Machine Learning, Vienna, Austria, 2020

arXiv:2006.12456 [pdf, other]

Effective Version Space Reduction for Convolutional Neural Networks

Authors: Jiayu Liu, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers

Abstract: In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two… ▽ More In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis space agnostic and do not address this problem. We examine active learning with convolutional neural networks through the principled lens of version space reduction. We identify the connection between two approaches---prior mass reduction and diameter reduction---and propose a new diameter-based querying method---the minimum Gibbs-vote disagreement. By estimating version space diameter and bias, we illustrate how version space of neural networks evolves and examine the realizability assumption. With experiments on MNIST, Fashion-MNIST, SVHN and STL-10 datasets, we demonstrate that diameter reduction methods reduce the version space more effectively and perform better than prior mass reduction and other baselines, and that the Gibbs vote disagreement is on par with the best query method. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: 22 pages, 8 figures, to be published in the Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2020

ACM Class: I.2.6; G.3; I.5.1

arXiv:2006.11631 [pdf, other]

Estimating Model Uncertainty of Neural Networks in Sparse Information Form

Authors: Jongseok Lee, Matthias Humt, Jianxiang Feng, Rudolph Triebel

Abstract: We present a sparse representation of model uncertainty for Deep Neural Networks (DNNs) where the parameter posterior is approximated with an inverse formulation of the Multivariate Normal Distribution (MND), also known as the information form. The key insight of our work is that the information matrix, i.e. the inverse of the covariance matrix tends to be sparse in its spectrum. Therefore, dimens… ▽ More We present a sparse representation of model uncertainty for Deep Neural Networks (DNNs) where the parameter posterior is approximated with an inverse formulation of the Multivariate Normal Distribution (MND), also known as the information form. The key insight of our work is that the information matrix, i.e. the inverse of the covariance matrix tends to be sparse in its spectrum. Therefore, dimensionality reduction techniques such as low rank approximations (LRA) can be effectively exploited. To achieve this, we develop a novel sparsification algorithm and derive a cost-effective analytical sampler. As a result, we show that the information form can be scalably applied to represent model uncertainty in DNNs. Our exhaustive theoretical analysis and empirical evaluations on various benchmarks show the competitiveness of our approach over the current methods. △ Less

Submitted 20 June, 2020; originally announced June 2020.

Comments: Accepted to the Thirty-seventh International Conference on Machine Learning (ICML) 2020

arXiv:2006.08368 [pdf]

Sensor Artificial Intelligence and its Application to Space Systems -- A White Paper

Authors: Anko Börner, Heinz-Wilhelm Hübers, Odej Kao, Florian Schmidt, Sören Becker, Joachim Denzler, Daniel Matolin, David Haber, Sergio Lucia, Wojciech Samek, Rudolph Triebel, Sascha Eichstädt, Felix Biessmann, Anna Kruspe, Peter Jung, Manon Kok, Guillermo Gallego, Ralf Berger

Abstract: Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in t… ▽ More Information and communication technologies have accompanied our everyday life for years. A steadily increasing number of computers, cameras, mobile devices, etc. generate more and more data, but at the same time we realize that the data can only partially be analyzed with classical approaches. The research and development of methods based on artificial intelligence (AI) made enormous progress in the area of interpretability of data in recent years. With growing experience, both, the potential and limitations of these new technologies are increasingly better understood. Typically, AI approaches start with the data from which information and directions for action are derived. However, the circumstances under which such data are collected and how they change over time are rarely considered. A closer look at the sensors and their physical properties within AI approaches will lead to more robust and widely applicable algorithms. This holistic approach which considers entire signal chains from the origin to a data product, "Sensor AI", is a highly relevant topic with great potential. It will play a decisive role in autonomous driving as well as in areas of automated production, predictive maintenance or space research. The goal of this white paper is to establish "Sensor AI" as a dedicated research topic. We want to exchange knowledge on the current state-of-the-art on Sensor AI, to identify synergies among research groups and thus boost the collaboration in this key technology for science and industry. △ Less

Submitted 9 June, 2020; originally announced June 2020.

Comments: 4 pages. 1st Workshop on Sensor Artificial Intelligence, Apr. 2020, Berlin, Germany

arXiv:2006.03486 [pdf, other]

Segmentation of Surgical Instruments for Minimally-Invasive Robot-Assisted Procedures Using Generative Deep Neural Networks

Authors: Iñigo Azqueta-Gavaldon, Florian Fröhlich, Klaus Strobl, Rudolph Triebel

Abstract: This work proves that semantic segmentation on minimally invasive surgical instruments can be improved by using training data that has been augmented through domain adaptation. The benefit of this method is twofold. Firstly, it suppresses the need of manually labeling thousands of images by transforming synthetic data into realistic-looking data. To achieve this, a CycleGAN model is used, which tr… ▽ More This work proves that semantic segmentation on minimally invasive surgical instruments can be improved by using training data that has been augmented through domain adaptation. The benefit of this method is twofold. Firstly, it suppresses the need of manually labeling thousands of images by transforming synthetic data into realistic-looking data. To achieve this, a CycleGAN model is used, which transforms a source dataset to approximate the domain distribution of a target dataset. Secondly, this newly generated data with perfect labels is utilized to train a semantic segmentation neural network, U-Net. This method shows generalization capabilities on data with variability regarding its rotation- position- and lighting conditions. Nevertheless, one of the caveats of this approach is that the model is unable to generalize well to other surgical instruments with a different shape from the one used for training. This is driven by the lack of a high variance in the geometric distribution of the training data. Future work will focus on making the model more scale-invariant and able to adapt to other types of surgical instruments previously unseen by the training. △ Less

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2003.11509 [pdf, other]

Visual-Inertial Telepresence for Aerial Manipulation

Authors: Jongseok Lee, Ribin Balachandran, Yuri S. Sarkisov, Marco De Stefano, Andre Coelho, Kashmira Shinde, Min Jun Kim, Rudolph Triebel, Konstantin Kondak

Abstract: This paper presents a novel telepresence system for enhancing aerial manipulation capabilities. It involves not only a haptic device, but also a virtual reality that provides a 3D visual feedback to a remotely-located teleoperator in real-time. We achieve this by utilizing onboard visual and inertial sensors, an object tracking algorithm and a pre-generated object database. As the virtual reality… ▽ More This paper presents a novel telepresence system for enhancing aerial manipulation capabilities. It involves not only a haptic device, but also a virtual reality that provides a 3D visual feedback to a remotely-located teleoperator in real-time. We achieve this by utilizing onboard visual and inertial sensors, an object tracking algorithm and a pre-generated object database. As the virtual reality has to closely match the real remote scene, we propose an extension of a marker tracking algorithm with visual-inertial odometry. Both indoor and outdoor experiments show benefits of our proposed system in achieving advanced aerial manipulation tasks, namely grasping, placing, force exertion and peg-in-hole insertion. △ Less

Submitted 20 June, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2020, IEEE copyright, 8 pages, 10 figures

arXiv:2002.04487 [pdf, other]

Self-Supervised Object-in-Gripper Segmentation from Robotic Motions

Authors: Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel

Abstract: Accurate object segmentation is a crucial task in the context of robotic manipulation. However, creating sufficient annotated training data for neural networks is particularly time consuming and often requires manual labeling. To this end, we propose a simple, yet robust solution for learning to segment unknown objects grasped by a robot. Specifically, we exploit motion and temporal cues in RGB vi… ▽ More Accurate object segmentation is a crucial task in the context of robotic manipulation. However, creating sufficient annotated training data for neural networks is particularly time consuming and often requires manual labeling. To this end, we propose a simple, yet robust solution for learning to segment unknown objects grasped by a robot. Specifically, we exploit motion and temporal cues in RGB video sequences. Using optical flow estimation we first learn to predict segmentation masks of our given manipulator. Then, these annotations are used in combination with motion cues to automatically distinguish between background, manipulator and unknown, grasped object. In contrast to existing systems our approach is fully self-supervised and independent of precise camera calibration, 3D models or potentially imperfect depth data. We perform a thorough comparison with alternative baselines and approaches from literature. The object masks and views are shown to be suitable training data for segmentation networks that generalize to novel environments and also allow for watertight 3D reconstruction. △ Less

Submitted 6 November, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: 15 pages, 11 figures. Video: https://www.youtube.com/watch?v=srEwuuIIgzI

arXiv:1908.00151 [pdf, other]

Multi-path Learning for Object Pose Estimation Across Domains

Authors: Martin Sundermeyer, Maximilian Durner, En Yen Puang, Zoltan-Csaba Marton, Narunas Vaskevicius, Kai O. Arras, Rudolph Triebel

Abstract: We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning":… ▽ More We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": While the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS dataset. Despite training jointly on multiple objects, our 6D Object Detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches. △ Less

Submitted 3 April, 2020; v1 submitted 31 July, 2019; originally announced August 2019.

Comments: To appear at CVPR 2020; Code will be available here: https://github.com/DLR-RM/AugmentedAutoencoder/tree/multipath

arXiv:1906.04933 [pdf, other]

Non-Parametric Calibration for Classification

Authors: Jonathan Wenger, Hedvig Kjellström, Rudolph Triebel

Abstract: Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that t… ▽ More Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process, and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the art neural network architectures. △ Less

Submitted 27 February, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1902.01275 [pdf, other]

Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

Authors: Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

Abstract: We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalize… ▽ More We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results. △ Less

Submitted 17 July, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: Code available at: https://github.com/DLR-RM/AugmentedAutoencoder

arXiv:1612.03653 [pdf, other]

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Authors: Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that,… ▽ More We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour. △ Less

Submitted 21 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: NIPS workshop on Deep Learning for Action and Interaction, 2016

Showing 1–50 of 50 results for author: Triebel, R