-
Hierarchical Visual Localization Based on Sparse Feature Pyramid for Adaptive Reduction of Keypoint Map Size
Authors:
Andrei Potapov,
Mikhail Kurenkov,
Pavel Karpyshev,
Evgeny Yudin,
Alena Savinykh,
Evgeny Kruzhkov,
Dzmitry Tsetserukou
Abstract:
Visual localization is a fundamental task for a wide range of applications in the field of robotics. Yet, it is still a complex problem with no universal solution, and the existing approaches are difficult to scale: most state-of-the-art solutions are unable to provide accurate localization without a significant amount of storage space. We propose a hierarchical, low-memory approach to localization based on keypoints with different descriptor lengths. This is made possible by an unsupervised neural network that we developed, which predicts a feature pyramid with different descriptor lengths for images. This structure allows applying coarse-to-fine paradigms for localization based on a keypoint map, and varying the accuracy of localization by changing the type of descriptors used in the pipeline. Our approach achieves localization accuracy comparable to state-of-the-art methods while significantly reducing memory consumption (by up to 16 times).
Submitted 8 May, 2023;
originally announced May 2023.
-
MeSLAM: Memory Efficient SLAM based on Neural Fields
Authors:
Evgenii Kruzhkov,
Alena Savinykh,
Pavel Karpyshev,
Mikhail Kurenkov,
Evgeny Yudin,
Andrei Potapov,
Dzmitry Tsetserukou
Abstract:
Existing Simultaneous Localization and Mapping (SLAM) approaches are limited in their scalability due to growing map size in long-term robot operation. Moreover, processing such maps for localization and planning tasks increases the computational resources required onboard. To address the problem of memory consumption in long-term operation, we develop a novel real-time SLAM algorithm, MeSLAM, that is based on a neural field implicit map representation. It combines the proposed global mapping strategy, including neural network distribution and region tracking, with an external odometry system. As a result, the algorithm is able to efficiently train multiple networks representing different map regions and track poses accurately in large-scale environments. Experimental results show that the accuracy of the proposed approach is comparable to the state-of-the-art methods (on average, 6.6 cm on TUM RGB-D sequences) and outperforms the baseline, iMAP$^*$. Moreover, the proposed SLAM approach provides the most compact maps without distorting details (1.9 MB to store 57 m$^3$) among the state-of-the-art SLAM approaches.
Submitted 19 September, 2022;
originally announced September 2022.
-
MuCaSLAM: CNN-Based Frame Quality Assessment for Mobile Robot with Omnidirectional Visual SLAM
Authors:
Pavel Karpyshev,
Evgeny Kruzhkov,
Evgeny Yudin,
Alena Savinykh,
Andrei Potapov,
Mikhail Kurenkov,
Anton Kolomeytsev,
Ivan Kalinov,
Dzmitry Tsetserukou
Abstract:
In this study, we describe an approach to improving the computational efficiency and robustness of visual SLAM algorithms on mobile robots with multiple cameras and limited computational power by implementing an intermediate layer between the cameras and the SLAM pipeline. In this layer, the images are classified by a ResNet18-based neural network according to their applicability to robot localization. The network is trained on a six-camera dataset collected on the campus of the Skolkovo Institute of Science and Technology (Skoltech). For training, we use the images and ORB features that were successfully matched with the subsequent frame of the same camera ("good" keypoints or features). The results show that the network is able to accurately determine the optimal images for ORB-SLAM2, and that implementing the proposed approach in the SLAM pipeline can significantly increase the number of images the SLAM algorithm can localize on and improve the overall robustness of visual SLAM. Timing experiments show that the proposed approach is at least 6 times faster than using the ORB extractor and feature matcher when run on CPU, and more than 30 times faster when run on GPU. The network evaluation has shown at least 90% accuracy in recognizing images with a large number of "good" ORB keypoints. The proposed approach made it possible to maintain a high number of features throughout the dataset by robustly switching away from cameras with feature-poor streams.
Submitted 5 September, 2022;
originally announced September 2022.
-
CloudVision: DNN-based Visual Localization of Autonomous Robots using Prebuilt LiDAR Point Cloud
Authors:
Evgeny Yudin,
Pavel Karpyshev,
Mikhail Kurenkov,
Alena Savinykh,
Andrei Potapov,
Evgeny Kruzhkov,
Dzmitry Tsetserukou
Abstract:
In this study, we propose a novel visual localization approach to accurately estimate six degrees of freedom (6-DoF) poses of the robot within the 3D LiDAR map based on visual data from an RGB camera. The 3D map is obtained utilizing an advanced LiDAR-based simultaneous localization and mapping (SLAM) algorithm capable of collecting a precise sparse map. The features extracted from the camera images are compared with the points of the 3D map, and then a geometric optimization problem is solved to achieve precise visual localization. Our approach allows a scout robot equipped with an expensive LiDAR to be employed only once, for mapping the environment, while multiple operational robots with only RGB cameras onboard perform mission tasks with localization accuracy higher than that of common camera-based solutions. The proposed method was tested on a custom dataset collected at the Skolkovo Institute of Science and Technology (Skoltech). In assessing the localization accuracy, we achieved centimeter-level accuracy; the median translation error was as low as 1.3 cm. The precise positioning achieved with only cameras makes it possible to use autonomous mobile robots to solve the most complex tasks that require high localization accuracy.
Submitted 4 September, 2022;
originally announced September 2022.
-
DarkSLAM: GAN-assisted Visual SLAM for Reliable Operation in Low-light Conditions
Authors:
Alena Savinykh,
Mikhail Kurenkov,
Evgeny Kruzhkov,
Evgeny Yudin,
Andrei Potapov,
Pavel Karpyshev,
Dzmitry Tsetserukou
Abstract:
Existing visual SLAM approaches are sensitive to illumination, with their precision drastically falling in dark conditions due to feature extractor limitations. The algorithms currently used to overcome this issue are not able to provide reliable results due to poor performance and noisiness, and the localization quality in dark conditions is still insufficient for practical use. In this paper, we present a novel SLAM method capable of working in low light using a Generative Adversarial Network (GAN) preprocessing module to enhance the light conditions on input images, thus improving the localization robustness. The proposed algorithm was evaluated on a custom indoor dataset consisting of 14 sequences with varying illumination levels and ground truth data collected using a motion capture system. According to the experimental results, the reliability of the proposed approach remains high even in extremely low light conditions: it maintains tracking for 25.1% of the time on the darkest sequences, whereas existing approaches maintain tracking for only 0.6% of the sequence time.
Submitted 5 June, 2022;
originally announced June 2022.
-
Multi-sensor large-scale dataset for multi-view 3D reconstruction
Authors:
Oleg Voynov,
Gleb Bobrovskikh,
Pavel Karpyshev,
Saveliy Galochkin,
Andrei-Timotei Ardelean,
Arseniy Bozhenko,
Ekaterina Karmanova,
Pavel Kopanev,
Yaroslav Labutin-Rymsho,
Ruslan Rakhimov,
Aleksandr Safin,
Valerii Serpiva,
Alexey Artemov,
Evgeny Burnaev,
Dzmitry Tsetserukou,
Denis Zorin
Abstract:
We present a new multi-sensor dataset for multi-view 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and a structured-light scanner. The scenes are selected to emphasize a diverse set of material properties that are challenging for existing algorithms. We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms and for related tasks. The dataset is available at skoltech3d.appliedai.tech.
Submitted 28 March, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
-
CNN-based Omnidirectional Object Detection for HermesBot Autonomous Delivery Robot with Preliminary Frame Classification
Authors:
Saian Protasov,
Pavel Karpyshev,
Ivan Kalinov,
Pavel Kopanev,
Nikita Mikhailovskiy,
Alexander Sedunin,
Dzmitry Tsetserukou
Abstract:
Mobile autonomous robots include numerous sensors for environment perception. Cameras are an essential tool for robot localization, navigation, and obstacle avoidance. To process a large flow of data from the sensors, it is necessary either to optimize algorithms or to utilize substantial computational power. In our work, we propose an algorithm for optimizing a neural network for object detection using preliminary binary frame classification. An autonomous outdoor mobile robot with 6 rolling-shutter cameras on the perimeter providing a 360-degree field of view was used as the experimental setup. The experimental results showed that the proposed optimization reduces the inference time of the neural network in cases where up to 5 out of 6 cameras contain target objects.
Submitted 22 October, 2021;
originally announced October 2021.
-
DeepScanner: a Robotic System for Automated 2D Object Dataset Collection with Annotations
Authors:
Valery Ilin,
Ivan Kalinov,
Pavel Karpyshev,
Dzmitry Tsetserukou
Abstract:
In this study, we describe the possibility of automated dataset collection using an articulated robot. The proposed technology reduces the number of pixel errors on a polygonal dataset and the time spent on manual labeling of 2D objects. The paper describes a novel automatic dataset collection and annotation system, and compares the results of automated and manual dataset labeling. Compared to manual labeling, our approach increases the speed of data labeling 240-fold and improves the accuracy 13-fold. We also present a comparison of metrics for training a neural network on a manually annotated and an automatically collected dataset.
Submitted 5 August, 2021;
originally announced August 2021.
-
Recognition of Russian traffic signs in winter conditions. Solutions of the "Ice Vision" competition winners
Authors:
Artem L. Pavlov,
Azat Davletshin,
Alexey Kharlamov,
Maksim S. Koriukin,
Artem Vasenin,
Pavel Solovev,
Pavel Ostyakov,
Pavel A. Karpyshev,
George V. Ovchinnikov,
Ivan V. Oseledets,
Dzmitry Tsetserukou
Abstract:
With the advancements of various autonomous car projects aiming to achieve SAE Level 5, real-time detection of traffic signs in real-life scenarios has become a highly relevant problem for the industry. Even though great progress has been achieved in this field, there is still no clear consensus on what the state of the art is.
Moreover, it is important to develop and test systems in various regions and conditions. This is why the "Ice Vision" competition has focused on the detection of Russian traffic signs in winter conditions. The IceVisionSet dataset used for this competition features a real-world collection of lossless frame sequences with traffic sign annotations. The sequences were collected in varying conditions, including different weather, camera exposure, illumination, and moving speeds.
In this work, we describe the competition and present the solutions of the top 3 teams.
Submitted 16 September, 2019;
originally announced September 2019.