-
FORM: Fixed-Lag Odometry with Reparative Mapping utilizing Rotating LiDAR Sensors
Authors:
Easton R. Potokar,
Taylor Pool,
Daniel McGann,
Michael Kaess
Abstract:
Light Detection and Ranging (LiDAR) sensors have become a de-facto sensor for many robot state estimation tasks, spurring development of many LiDAR Odometry (LO) methods in recent years. While some smoothing-based LO methods have been proposed, most require matching against multiple scans, resulting in sub-real-time performance. Due to this, most prior works estimate a single state at a time and are "submap"-based. This architecture propagates any error in pose estimation to the fixed submap and can cause jittery trajectories and degrade future registrations. We propose Fixed-Lag Odometry with Reparative Mapping (FORM), a LO method that performs smoothing over a densely connected factor graph while utilizing a single iterative map for matching. This allows for both real-time performance and active correction of the local map as pose estimates are further refined. We evaluate on a wide variety of datasets to show that FORM is robust, accurate, real-time, and provides smooth trajectory estimates when compared to prior state-of-the-art LO methods.
Submitted 10 October, 2025;
originally announced October 2025.
-
Atmospheric CO2 Ice in the Martian Polar Regions: Physical and Spectral Properties From Mars Climate Sounder Observations
Authors:
R. W. Stevens,
P. O. Hayne,
A. Kleinböhl,
D. M. Kass
Abstract:
$\text{CO}_{\text{2}}$ ice clouds are important for polar energy balance and the carbon dioxide cycle on Mars. However, uncertainties remain regarding their physical and radiative properties, which control how polar $\text{CO}_{\text{2}}$ clouds interact with the global Martian climate. Here, we use Mars Climate Sounder (MCS) observations of atmospheric radiance to estimate these physical and radiative properties. We find that Martian $\text{CO}_{\text{2}}$ clouds are typically composed of large particles from a narrow size distribution with an effective radius of 46 $μ$m and an effective variance of $2.0 \times 10^{-3}$ in the southern hemisphere, and an effective radius of 42 $μ$m and an effective variance of $2.0 \times 10^{-3}$ in the north. The similarity in sizes of $\text{CO}_{\text{2}}$ ice particles in both hemispheres may be due to the fact that $\text{CO}_{\text{2}}$ clouds tend to form near the same pressure level in each hemisphere, despite the higher surface pressures in the north. We use a simplified convective cooling model to show that the small effective variance we derive may be a consequence of the fact that $\text{CO}_{\text{2}}$ is also the dominant atmospheric constituent on Mars, which allows $\text{CO}_{\text{2}}$ ice particles to reach sizes upwards of 10 $μ$m within seconds. At the same time, the fact that the Martian atmosphere is so thin means that large particles fall rapidly to the surface, reducing the range of particle sizes that can remain in the atmosphere for any extended period of time. This study is part of ongoing work to add $\text{CO}_{\text{2}}$ ice opacity profiles to the MCS retrieval pipeline.
Submitted 2 September, 2025;
originally announced September 2025.
-
COSMO-Bench: A Benchmark for Collaborative SLAM Optimization
Authors:
Daniel McGann,
Easton R. Potokar,
Michael Kaess
Abstract:
Recent years have seen a focus on research into distributed optimization algorithms for multi-robot Collaborative Simultaneous Localization and Mapping (C-SLAM). Research in this domain, however, is made difficult by a lack of standard benchmark datasets. Such datasets have been used to great effect in the field of single-robot SLAM, and researchers focused on multi-robot problems would benefit greatly from dedicated benchmark datasets. To address this gap, we design and release the Collaborative Open-Source Multi-robot Optimization Benchmark (COSMO-Bench) -- a suite of 24 datasets derived from a baseline C-SLAM front-end and real-world LiDAR data. Data DOI: https://doi.org/10.1184/R1/29652158
Submitted 12 September, 2025; v1 submitted 22 August, 2025;
originally announced August 2025.
-
GelSLAM: A Real-time, High-Fidelity, and Robust 3D Tactile SLAM System
Authors:
Hung-Jui Huang,
Mohammad Amin Mirzaee,
Michael Kaess,
Wenzhen Yuan
Abstract:
Accurately perceiving an object's pose and shape is essential for precise grasping and manipulation. Compared to common vision-based methods, tactile sensing offers advantages in precision and immunity to occlusion when tracking and reconstructing objects in contact. This makes it particularly valuable for in-hand and other high-precision manipulation tasks. In this work, we present GelSLAM, a real-time 3D SLAM system that relies solely on tactile sensing to estimate object pose over long periods and reconstruct object shapes with high fidelity. Unlike traditional point cloud-based approaches, GelSLAM uses tactile-derived surface normals and curvatures for robust tracking and loop closure. It can track object motion in real time with low error and minimal drift, and reconstruct shapes with submillimeter accuracy, even for low-texture objects such as wooden tools. GelSLAM extends tactile sensing beyond local contact to enable global, long-horizon spatial perception, and we believe it will serve as a foundation for many precise manipulation tasks involving interaction with objects in hand. The video demo is available on our website: https://joehjhuang.github.io/gelslam.
Submitted 21 August, 2025;
originally announced August 2025.
-
A Comprehensive Evaluation of LiDAR Odometry Techniques
Authors:
Easton Potokar,
Michael Kaess
Abstract:
Light Detection and Ranging (LiDAR) sensors have become the sensor of choice for many robotic state estimation tasks. Because of this, in recent years there has been significant work done to find the most accurate method to perform state estimation using these sensors. In each of these prior works, an explosion of possible technique combinations has occurred, with each work comparing LiDAR Odometry (LO) "pipelines" to prior "pipelines". Unfortunately, little work up to this point has performed the significant amount of ablation studies comparing the various building-blocks of a LO pipeline. In this work, we summarize the various techniques that go into defining a LO pipeline and empirically evaluate these LO components on an expansive number of datasets across environments, LiDAR types, and vehicle motions. Finally, we make empirically-backed recommendations for the design of future LO pipelines to provide the most accurate and reliable performance.
Submitted 21 July, 2025;
originally announced July 2025.
-
Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation
Authors:
Carolina Higuera,
Akash Sharma,
Taosha Fan,
Chaithanya Krishna Bodduluri,
Byron Boots,
Michael Kaess,
Mike Lambeta,
Tingfan Wu,
Zixi Liu,
Francois Robert Hogan,
Mustafa Mukadam
Abstract:
We present Sparsh-X, the first multisensory touch representations across four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. By leveraging self-supervised learning, Sparsh-X fuses these modalities into a unified representation that captures physical properties useful for robot manipulation tasks. We study how to effectively integrate real-world touch representations for both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness by 90% in recovering object states from touch. Finally, we benchmark Sparsh-X's ability to make inferences about physical properties, such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential for dexterous manipulation.
Submitted 17 June, 2025;
originally announced June 2025.
-
Self-supervised perception for tactile skin covered dexterous hands
Authors:
Akash Sharma,
Carolina Higuera,
Chaithanya Krishna Bodduluri,
Zixi Liu,
Taosha Fan,
Tess Hellebrekers,
Mike Lambeta,
Byron Boots,
Michael Kaess,
Tingfan Wu,
Francois Robert Hogan,
Mustafa Mukadam
Abstract:
We present Sparsh-skin, a pre-trained encoder for magnetic skin sensors distributed across the fingertips, phalanges, and palm of a dexterous robot hand. Magnetic tactile skins offer a flexible form factor for hand-wide coverage with fast response times, in contrast to vision-based tactile sensors that are restricted to the fingertips and limited by bandwidth. Full hand tactile perception is crucial for robot dexterity. However, a lack of general-purpose models, challenges with interpreting magnetic flux and calibration have limited the adoption of these sensors. Sparsh-skin, given a history of kinematic and tactile sensing across a hand, outputs a latent tactile embedding that can be used in any downstream task. The encoder is self-supervised via self-distillation on a variety of unlabeled hand-object interactions using an Allegro hand sensorized with Xela uSkin. In experiments across several benchmark tasks, from state estimation to policy learning, we find that pretrained Sparsh-skin representations are both sample efficient in learning downstream tasks and improve task performance by over 41% compared to prior work and over 56% compared to end-to-end learning.
Submitted 16 May, 2025;
originally announced May 2025.
-
Acoustic Neural 3D Reconstruction Under Pose Drift
Authors:
Tianxiang Lin,
Mohamad Qadri,
Kevin Zhang,
Adithya Pediredla,
Christopher A. Metzler,
Michael Kaess
Abstract:
We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses. The accuracy of current state-of-the-art 3D acoustic modeling algorithms is highly dependent on accurate pose estimation; small errors in sensor pose can lead to severe reconstruction artifacts. In this paper, we propose an algorithm that jointly optimizes the neural scene representation and sonar poses. Our algorithm does so by parameterizing the 6DoF poses as learnable parameters and backpropagating gradients through the neural renderer and implicit representation. We validated our algorithm on both real and simulated datasets. It produces high-fidelity 3D reconstructions even under significant pose drift.
Submitted 11 March, 2025;
originally announced March 2025.
-
Your Learned Constraint is Secretly a Backward Reachable Tube
Authors:
Mohamad Qadri,
Gokul Swamy,
Jonathan Francis,
Michael Kaess,
Andrea Bajcsy
Abstract:
Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints.
Submitted 2 August, 2025; v1 submitted 26 January, 2025;
originally announced January 2025.
-
NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors
Authors:
Hung-Jui Huang,
Michael Kaess,
Wenzhen Yuan
Abstract:
Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipulation, in-hand manipulation, and 3D reconstruction. In this work, we introduce NormalFlow, a fast, robust, and real-time tactile-based 6DoF tracking algorithm. Leveraging the precise surface normal estimation of vision-based tactile sensors, NormalFlow determines object movements by minimizing discrepancies between the tactile-derived surface normals. Our results show that NormalFlow consistently outperforms competitive baselines and can track low-texture objects like table surfaces. For long-horizon tracking, we demonstrate that when rolling the sensor around a bead for 360 degrees, NormalFlow maintains a rotational tracking error of 2.5 degrees. Additionally, we present state-of-the-art tactile-based 3D reconstruction results, showcasing the high accuracy of NormalFlow. We believe NormalFlow unlocks new possibilities for high-precision perception and manipulation tasks that involve interacting with objects using hands. The video demo, code, and dataset are available on our website: https://joehjhuang.github.io/normalflow.
Submitted 18 March, 2025; v1 submitted 12 December, 2024;
originally announced December 2024.
-
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Authors:
Carolina Higuera,
Akash Sharma,
Chaithanya Krishna Bodduluri,
Taosha Fan,
Patrick Lancaster,
Mrinal Kalakrishnan,
Michael Kaess,
Byron Boots,
Mike Lambeta,
Tingfan Wu,
Mustafa Mukadam
Abstract:
In this work, we introduce general purpose touch representations for the increasingly accessible class of vision-based tactile sensors. Such sensors have led to many recent advances in robot manipulation as they markedly complement vision, yet solutions today often rely on task and sensor specific handcrafted perception models. Collecting real data at scale with task centric ground truth labels, like contact forces and slip, is a challenge further compounded by sensors of various form factors differing in aspects like lighting and gel markings. To tackle this, we turn to self-supervised learning (SSL) that has demonstrated remarkable performance in computer vision. We present Sparsh, a family of SSL models that can support various vision-based tactile sensors, alleviating the need for custom labels through pre-training on 460k+ tactile images with masking and self-distillation in pixel and latent spaces. We also build TacBench, to facilitate standardized benchmarking across sensors and models, comprising six tasks ranging from comprehending tactile properties to enabling physical perception and manipulation planning. In evaluations, we find that SSL pre-training for touch representation outperforms task and sensor-specific end-to-end training by 95.1% on average over TacBench, and Sparsh (DINO) and Sparsh (IJEPA) are the most competitive, indicating the merits of learning in latent space for tactile images. Project page: https://sparsh-ssl.github.io/
Submitted 31 October, 2024;
originally announced October 2024.
-
LiPO: LiDAR Inertial Odometry for ICP Comparison
Authors:
Darwin Mick,
Taylor Pool,
Madankumar Sathenahally Nagaraju,
Michael Kaess,
Howie Choset,
Matt Travers
Abstract:
We introduce a LiDAR inertial odometry (LIO) framework, called LiPO, that enables direct comparisons of different iterative closest point (ICP) point cloud registration methods. The two common ICP methods we compare are point-to-point (P2P) and point-to-feature (P2F). In our experience, within the context of LIO, P2F-ICP results in less drift and improved mapping accuracy when robots move aggressively through challenging environments when compared to P2P-ICP. However, P2F-ICP methods require more hand-tuned hyper-parameters that make P2F-ICP less general across all environments and motions. In real-world field robotics applications where robots are used across different environments, more general P2P-ICP methods may be preferred despite increased drift. In this paper, we seek to better quantify the trade-off between P2P-ICP and P2F-ICP to help inform when each method should be used. To explore this trade-off, we use LiPO to directly compare ICP methods and test on relevant benchmark datasets as well as on our custom unpiloted ground vehicle (UGV). We find that overall, P2F-ICP has reduced drift and improved mapping accuracy, but P2P-ICP is more consistent across all environments and motions with minimal drift increase.
Submitted 10 October, 2024;
originally announced October 2024.
-
BEVLoc: Cross-View Localization and Matching via Birds-Eye-View Synthesis
Authors:
Christopher Klammer,
Michael Kaess
Abstract:
Ground to aerial matching is a crucial and challenging task in outdoor robotics, particularly when GPS is absent or unreliable. Structures like buildings or large dense forests create interference, requiring GNSS replacements for global positioning estimates. The true difficulty lies in reconciling the perspective difference between the ground and air images for acceptable localization. Taking inspiration from the autonomous driving community, we propose a novel framework for synthesizing a birds-eye-view (BEV) scene representation to match and localize against an aerial map in off-road environments. We leverage contrastive learning with domain specific hard negative mining to train a network to learn similar representations between the synthesized BEV and the aerial map. During inference, BEVLoc guides the identification of the most probable locations within the aerial map through a coarse-to-fine matching strategy. Our results demonstrate promising initial outcomes in extremely difficult forest environments with limited semantic diversity. We analyze our model's performance for coarse and fine matching, assessing both the raw matching capability of our model and its performance as a GNSS replacement. Our work delves into off-road map localization while establishing a foundational baseline for future developments in localization. Our code is available at: https://github.com/rpl-cmu/bevloc
Submitted 8 October, 2024;
originally announced October 2024.
-
Lunar Swirls Unveil the Origin of the Moon Magnetic Field
Authors:
Boxin Zuo,
Xiangyun Hu,
Lizhe Wang,
Yi Cai,
Mason Andrew Kass
Abstract:
The origins of the lunar magnetic anomalies and swirls have long puzzled scientists. The prevailing theory posits that an ancient lunar dynamo core field magnetized extralunar meteoritic materials, leading to the current remnant magnetic anomalies that shield against solar wind ions, thereby contributing to the formation of lunar swirls. Our research reveals that these lunar swirls are the result of ancient electrical currents that traversed the Moon's surface, generating powerful magnetizing fields impacting both native lunar rocks and extralunar projectile materials. We have reconstructed 3-D distribution maps of these ancient subsurface currents and developed coupling models of magnetic and electric fields that take into account the subsurface density in the prominent lunar maria and basins. Our simulations suggest these ancient currents could have reached density up to 13 A/m$^2$, with surface magnetizing field as strong as 469 μT. We propose that these intense electrical current discharges in the crust originate from ancient interior dynamo activity.
Submitted 14 August, 2024;
originally announced August 2024.
-
iMESA: Incremental Distributed Optimization for Collaborative Simultaneous Localization and Mapping
Authors:
Daniel McGann,
Michael Kaess
Abstract:
This paper introduces a novel incremental distributed back-end algorithm for Collaborative Simultaneous Localization and Mapping (C-SLAM). For real-world deployments, robotic teams require algorithms to compute a consistent state estimate accurately, within online runtime constraints, and with potentially limited communication. Existing centralized, decentralized, and distributed approaches to solving C-SLAM problems struggle to achieve all of these goals. To address this capability gap, we present Incremental Manifold Edge-based Separable ADMM (iMESA), a fully distributed C-SLAM back-end algorithm that can provide a multi-robot team with accurate state estimates in real-time with only sparse pair-wise communication between robots. Extensive evaluation on real and synthetic data demonstrates that iMESA is able to outperform comparable state-of-the-art C-SLAM back-ends.
Submitted 11 June, 2024;
originally announced June 2024.
-
A Slices Perspective for Incremental Nonparametric Inference in High Dimensional State Spaces
Authors:
Moshe Shienman,
Ohad Levy-Or,
Michael Kaess,
Vadim Indelman
Abstract:
We introduce an innovative method for incremental nonparametric probabilistic inference in high-dimensional state spaces. Our approach leverages slices from high-dimensional surfaces to efficiently approximate posterior distributions of any shape. Unlike many existing graph-based methods, our slices perspective eliminates the need for additional intermediate reconstructions, maintaining a more accurate representation of posterior distributions. Additionally, we propose a novel heuristic to balance between accuracy and efficiency, enabling real-time operation in nonparametric scenarios. In empirical evaluations on synthetic and real-world datasets, our slices approach consistently outperforms other state-of-the-art methods. It demonstrates superior accuracy and achieves a significant reduction in computational complexity, often by an order of magnitude.
Submitted 26 May, 2024;
originally announced May 2024.
-
BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment
Authors:
Lihong Jin,
Wei Dong,
Wenshan Wang,
Michael Kaess
Abstract:
We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality local bird's-eye-view (BEV) images of the local terrain. Subsequently, these images are aligned with a georeferenced aerial map through template matching to achieve accurate cross-view registration. Our approach overcomes the inherent limitations of visual inertial odometry systems and the substantial storage requirements of image-retrieval localization strategies, which are susceptible to drift and scalability issues, respectively. Extensive experimentation validates BEVRender's advancement over existing GNSS-denied visual localization methods, demonstrating notable enhancements in both localization accuracy and update frequency.
Submitted 10 December, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion
Authors:
Ziyuan Qu,
Omkar Vengurlekar,
Mohamad Qadri,
Kevin Zhang,
Michael Kaess,
Christopher Metzler,
Suren Jayasuriya,
Adithya Pediredla
Abstract:
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view ($360^{\circ}$ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
Submitted 5 July, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion
Authors:
Mohamad Qadri,
Kevin Zhang,
Akshay Hinduja,
Michael Kaess,
Adithya Pediredla,
Christopher A. Metzler
Abstract:
Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well-known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily-restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only inverse-differentiable-rendering--based surface reconstruction methods. A website visualizing the results of our paper is located at this address: https://aoneus.github.io/
Submitted 2 August, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation
Authors:
Sudharshan Suresh,
Haozhi Qi,
Tingfan Wu,
Taosha Fan,
Luis Pineda,
Mike Lambeta,
Jitendra Malik,
Mrinal Kalakrishnan,
Roberto Calandra,
Michael Kaess,
Joseph Ortiz,
Mustafa Mukadam
Abstract:
To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception primarily employs vision, and is restricted to tracking a priori known objects. Moreover, visual occlusion of objects in-hand is inevitable during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of $81$% and average pose drifts of $4.7\,\text{mm}$, further reduced to $2.3\,\text{mm}$ with known CAD models. Additionally, we observe that under heavy visual occlusion we can achieve up to $94$% improvements in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity. Videos can be found on our project website https://suddhu.github.io/neural-feels/
Submitted 20 December, 2023;
originally announced December 2023.
-
Multi-Radar Inertial Odometry for 3D State Estimation using mmWave Imaging Radar
Authors:
Jui-Te Huang,
Ruoyang Xu,
Akshay Hinduja,
Michael Kaess
Abstract:
State estimation is a crucial component for the successful implementation of robotic systems, relying on sensors such as cameras, LiDAR, and IMUs. However, in real-world scenarios, the performance of these sensors is degraded by challenging environments, e.g. adverse weather conditions and low-light scenarios. The emerging 4D imaging radar technology is capable of providing robust perception in adverse conditions. Despite its potential, challenges remain for indoor settings where noisy radar data does not present clear geometric features. Moreover, disparities in radar data resolution and field of view (FOV) can lead to inaccurate measurements. While prior research has explored radar-inertial odometry based on Doppler velocity information, challenges remain for the estimation of 3D motion because of the discrepancy in the FOV and resolution of the radar sensor. In this paper, we address Doppler velocity measurement uncertainties. We present a method to optimize body frame velocity while managing Doppler velocity uncertainty. Based on our observations, we propose a dual imaging radar configuration to mitigate the challenge of discrepancy in radar data. To attain high-precision 3D state estimation, we introduce a strategy that seamlessly integrates radar data with a consumer-grade IMU sensor using fixed-lag smoothing optimization. Finally, we evaluate our approach using real-world 3D motion data.
Submitted 14 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars
Authors:
Samiran Gode,
Akshay Hinduja,
Michael Kaess
Abstract:
In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.
Submitted 13 May, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Asynchronous Distributed Smoothing and Mapping via On-Manifold Consensus ADMM
Authors:
Daniel McGann,
Kyle Lassak,
Michael Kaess
Abstract:
In this paper we present a fully distributed, asynchronous, and general purpose optimization algorithm for Consensus Simultaneous Localization and Mapping (CSLAM). Multi-robot teams require that agents have timely and accurate solutions to their state as well as the states of the other robots in the team. To optimize this solution we develop a CSLAM back-end based on Consensus ADMM called MESA (Manifold, Edge-based, Separable ADMM). MESA is fully distributed to tolerate failures of individual robots, asynchronous to tolerate communication delays and outages, and general purpose to handle any CSLAM problem formulation. We demonstrate that MESA exhibits superior convergence rates and accuracy compared to existing state-of-the-art CSLAM back-end optimizers.
Submitted 19 March, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Learning Covariances for Estimation with Constrained Bilevel Optimization
Authors:
Mohamad Qadri,
Zachary Manchester,
Michael Kaess
Abstract:
We consider the problem of learning error covariance matrices for robotic state estimation. The convergence of a state estimator to the correct belief over the robot state is dependent on the proper tuning of noise models. During inference, these models are used to weigh different blocks of the Jacobian and error vector resulting from linearization and hence, additionally affect the stability and convergence of the non-linear system. We propose a gradient-based method to estimate well-conditioned covariance matrices by formulating the learning process as a constrained bilevel optimization problem over factor graphs. We evaluate our method against baselines across a range of simulated and real-world tasks and demonstrate that our technique converges to model estimates that lead to better solutions as evidenced by the improved tracking accuracy on unseen test trajectories.
Submitted 18 September, 2023;
originally announced September 2023.
-
Learning Observation Models with Incremental Non-Differentiable Graph Optimizers in the Loop for Robotics State Estimation
Authors:
Mohamad Qadri,
Michael Kaess
Abstract:
We consider the problem of learning observation models for robot state estimation with incremental non-differentiable optimizers in the loop. Convergence to the correct belief over the robot state is heavily dependent on a proper tuning of observation models which serve as input to the optimizer. We propose a gradient-based learning method which converges much quicker to model estimates that lead to solutions of much better quality compared to an existing state-of-the-art method as measured by the tracking accuracy over unseen robot test trajectories.
Submitted 5 September, 2023;
originally announced September 2023.
-
Group-$k$ consistent measurement set maximization via maximum clique over k-Uniform hypergraphs for robust multi-robot map merging
Authors:
Brendon Forsgren,
Ram Vasudevan,
Michael Kaess,
Timothy W. McLain,
Joshua G. Mangelson
Abstract:
This paper unifies the theory of consistent-set maximization for robust outlier detection in a simultaneous localization and mapping framework. We first describe the notion of pairwise consistency before discussing how a consistency graph can be formed by evaluating pairs of measurements for consistency. Finding the largest set of consistent measurements is transformed into an instance of the maximum clique problem and can be solved relatively quickly using existing maximum-clique solvers. We then generalize our algorithm to check consistency on a group-$k$ basis by using a generalized notion of consistency and using generalized graphs. We also present modified maximum clique algorithms that function on generalized graphs to find the set of measurements that is internally group-$k$ consistent. We address the exponential nature of group-$k$ consistency and present methods that can substantially decrease the number of necessary checks performed when evaluating consistency. We extend our prior work to multi-agent systems in both simulation and hardware and provide a comparison with other state-of-the-art methods.
Submitted 4 August, 2023;
originally announced August 2023.
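The pairwise case described in the abstract above (build a consistency graph, then take its maximum clique) can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar measurements, the `tol` threshold, and the helper names `build_consistency_graph` and `max_clique` are all hypothetical, and the clique search is a plain Bron-Kerbosch enumeration with pivoting rather than any solver used in the paper. The group-$k$ generalization over hypergraphs is not covered.

```python
def build_consistency_graph(measurements, tol):
    """Adjacency sets: i and j are linked when they pass the pairwise check.

    The check |a - b| <= tol is a stand-in (assumption); a real system would
    compare measurements through the estimated robot poses.
    """
    n = len(measurements)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(measurements[i] - measurements[j]) <= tol:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def max_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential worst case)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest seen so far.
            if len(r) > len(best):
                best = sorted(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(len(adj))), set())
    return best


# Three mutually consistent measurements, a consistent pair, and one stray.
measurements = [0.0, 0.1, 0.05, 5.0, 5.02, -3.0]
graph = build_consistency_graph(measurements, tol=0.2)
inliers = max_clique(graph)
```

Here `inliers` holds the indices of the largest mutually consistent set; the remaining measurements would be rejected as outliers.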
-
MidasTouch: Monte-Carlo inference over distributions across sliding touch
Authors:
Sudharshan Suresh,
Zilin Si,
Stuart Anderson,
Michael Kaess,
Mustafa Mukadam
Abstract:
We present MidasTouch, a tactile perception system for online global localization of a vision-based touch sensor sliding on an object surface. This framework takes in posed tactile images over time, and outputs an evolving distribution of sensor pose on the object's surface, without the need for visual priors. Our key insight is to estimate local surface geometry with tactile sensing, learn a compact representation for it, and disambiguate these signals over a long time horizon. The backbone of MidasTouch is a Monte-Carlo particle filter, with a measurement model based on a tactile code network learned from tactile simulation. This network, inspired by LIDAR place recognition, compactly summarizes local surface geometries. These generated codes are efficiently compared against a precomputed tactile codebook per-object, to update the pose distribution. We further release the YCB-Slide dataset of real-world and simulated forceful sliding interactions between a vision-based tactile sensor and standard YCB objects. While single-touch localization can be inherently ambiguous, we can quickly localize our sensor by traversing salient surface geometries. Project page: https://suddhu.github.io/midastouch-tactile/
Submitted 25 October, 2022;
originally announced October 2022.
-
TartanCalib: Iterative Wide-Angle Lens Calibration using Adaptive SubPixel Refinement of AprilTags
Authors:
Bardienus P Duisterhof,
Yaoyu Hu,
Si Heng Teng,
Michael Kaess,
Sebastian Scherer
Abstract:
Wide-angle cameras are uniquely positioned for mobile robots, by virtue of the rich information they provide in a small, light, and cost-effective form factor. An accurate calibration of the intrinsics and extrinsics is a critical pre-requisite for using the edge of a wide-angle lens for depth perception and odometry. Calibrating wide-angle lenses with current state-of-the-art techniques yields poor results due to extreme distortion at the edge, as most algorithms assume a lens with low to medium distortion closer to a pinhole projection. In this work we present our methodology for accurate wide-angle calibration. Our pipeline generates an intermediate model, and leverages it to iteratively improve feature detection and eventually the camera parameters. We test three key methods to utilize intermediate camera models: (1) undistorting the image into virtual pinhole cameras, (2) reprojecting the target into the image frame, and (3) adaptive subpixel refinement. Combining adaptive subpixel refinement and feature reprojection significantly improves reprojection errors by up to 26.59%, helps us detect up to 42.01% more features, and improves performance in the downstream task of dense depth mapping. Finally, TartanCalib is open-source and implemented into an easy-to-use calibration toolbox. We also provide a translation layer with other state-of-the-art works, which allows for regressing generic models with thousands of parameters or using a more robust solver. To this end, TartanCalib is the tool of choice for wide-angle calibration. Project website and code: http://tartancalib.com.
Submitted 5 October, 2022;
originally announced October 2022.
-
Acoustic Localization and Communication Using a MEMS Microphone for Low-cost and Low-power Bio-inspired Underwater Robots
Authors:
Akshay Hinduja,
Yunsik Ohm,
Jiahe Liao,
Carmel Majidi,
Michael Kaess
Abstract:
Having accurate localization capabilities is one of the fundamental requirements of autonomous robots. For underwater vehicles, the choices for effective localization are limited due to limitations of GPS use in water and poor environmental visibility that makes camera-based methods ineffective. Popular inertial navigation methods for underwater localization using Doppler-velocity log sensors, sonar, high-end inertial navigation systems, or acoustic positioning systems require bulky, expensive hardware that is incompatible with low-cost, bio-inspired underwater robots. In this paper, we introduce an approach for underwater robot localization inspired by GPS methods known as acoustic pseudoranging. Our method allows us to potentially localize multiple bio-inspired robots equipped with commonly available micro electro-mechanical systems microphones. This is achieved through estimating the time difference of arrival of acoustic signals sent simultaneously through four speakers with a known constellation geometry. We also leverage the same acoustic framework to perform one-way communication with the robot to execute some primitive motions. To our knowledge, this is the first application of the approach for the on-board localization of small bio-inspired robots in water. Hardware schematics and the accompanying code are released to aid further development in the field.
Submitted 3 October, 2022;
originally announced October 2022.
-
Robust Incremental Smoothing and Mapping (riSAM)
Authors:
Daniel McGann,
John G. Rogers III,
Michael Kaess
Abstract:
This paper presents a method for robust optimization for online incremental Simultaneous Localization and Mapping (SLAM). Due to the NP-Hardness of data association in the presence of perceptual aliasing, tractable (approximate) approaches to data association will produce erroneous measurements. We require SLAM back-ends that can converge to accurate solutions in the presence of outlier measurements while meeting online efficiency constraints. Existing robust SLAM methods either remain sensitive to outliers, become increasingly sensitive to initialization, or fail to provide online efficiency. We present the robust incremental Smoothing and Mapping (riSAM) algorithm, a robust back-end optimizer for incremental SLAM based on Graduated Non-Convexity. We demonstrate on benchmarking datasets that our algorithm achieves online efficiency, outperforms existing online approaches, and matches or improves the performance of existing offline methods.
Submitted 27 April, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
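The abstract above names Graduated Non-Convexity (GNC) as the basis of riSAM. The sketch below only illustrates the general GNC idea on a toy batch problem and is not riSAM itself: it estimates a single scalar from measurements with gross outliers, using the Geman-McClure weight schedule from the GNC literature. The function name `gnc_scalar_estimate`, the inlier scale `c`, and the annealing factor 1.4 are assumptions chosen for illustration.

```python
def gnc_scalar_estimate(measurements, c=0.5, anneal=1.4):
    """Robustly estimate a scalar with GNC and a Geman-McClure kernel.

    Starts from a nearly convex surrogate (large mu) and anneals mu down to 1,
    re-solving a weighted least-squares problem at every step.
    """
    # Initial guess: the plain (non-robust) mean.
    x = sum(measurements) / len(measurements)

    # Large initial mu makes the surrogate cost close to quadratic.
    r_max_sq = max((z - x) ** 2 for z in measurements)
    mu = max(1.0, 2.0 * r_max_sq / (c * c))

    while True:
        # Closed-form weights for the Geman-McClure surrogate at this mu.
        weights = [
            (mu * c * c / ((z - x) ** 2 + mu * c * c)) ** 2
            for z in measurements
        ]
        # Weighted least squares for a scalar is a weighted mean.
        x = sum(w * z for w, z in zip(weights, measurements)) / sum(weights)

        if mu <= 1.0:  # mu == 1 recovers the original robust cost
            break
        mu = max(1.0, mu / anneal)

    return x, weights


# Five inliers near 1.0 and two gross outliers.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0, -40.0]
estimate, final_weights = gnc_scalar_estimate(data)
```

In this toy case the final weights of the two outliers shrink toward zero while the inliers keep weights near one, so the estimate settles close to the inlier mean.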
-
Conditional GANs for Sonar Image Filtering with Applications to Underwater Occupancy Mapping
Authors:
Tianxiang Lin,
Akshay Hinduja,
Mohamad Qadri,
Michael Kaess
Abstract:
Underwater robots typically rely on acoustic sensors like sonar to perceive their surroundings. However, these sensors are often inundated with multiple sources and types of noise, which makes using raw data for any meaningful inference with features, objects, or boundary returns very difficult. While several conventional methods of dealing with noise exist, their success rates are unsatisfactory. This paper presents a novel application of conditional Generative Adversarial Networks (cGANs) to train a model to produce noise-free sonar images, outperforming several conventional filtering methods. Estimating free space is crucial for autonomous robots performing active exploration and mapping. Thus, we apply our approach to the task of underwater occupancy mapping and show superior free and occupied space inference when compared to conventional methods.
Submitted 9 July, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Neural Implicit Surface Reconstruction using Imaging Sonar
Authors:
Mohamad Qadri,
Michael Kaess,
Ioannis Gkioulekas
Abstract:
We present a technique for dense 3D reconstruction of objects using an imaging sonar, also known as forward-looking sonar (FLS). Compared to previous methods that model the scene geometry as point clouds or volumetric grids, we represent the geometry as a neural implicit function. Additionally, given such a representation, we use a differentiable volumetric renderer that models the propagation of acoustic waves to synthesize imaging sonar measurements. We perform experiments on real and synthetic datasets and show that our algorithm reconstructs high-fidelity surface geometry from multi-view FLS images at much higher quality than was possible with previous techniques and without suffering from their associated memory overhead.
Submitted 16 September, 2022;
originally announced September 2022.
-
Group-$k$ Consistent Measurement Set Maximization for Robust Outlier Detection
Authors:
Brendon Forsgren,
Ram Vasudevan,
Michael Kaess,
Timothy W. McLain,
Joshua G. Mangelson
Abstract:
This paper presents a method for the robust selection of measurements in a simultaneous localization and mapping (SLAM) framework. Existing methods check consistency or compatibility on a pairwise basis; however, many measurement types are not sufficiently constrained in a pairwise scenario to determine if either measurement is inconsistent with the other. This paper presents group-$k$ consistency maximization (G$k$CM) that estimates the largest set of measurements that is internally group-$k$ consistent. Solving for the largest set of group-$k$ consistent measurements can be formulated as an instance of the maximum clique problem on generalized graphs and can be solved by adapting current methods. This paper evaluates the performance of G$k$CM using simulated data and compares it to pairwise consistency maximization (PCM) presented in previous work.
Submitted 6 September, 2022;
originally announced September 2022.
-
Long-term Visual Map Sparsification with Heterogeneous GNN
Authors:
Ming-Fang Chang,
Yipu Zhao,
Rajvi Shah,
Jakob J. Engel,
Michael Kaess,
Simon Lucey
Abstract:
We address the problem of map sparsification for long-term visual localization. For map sparsification, a commonly employed assumption is that the pre-built map and the later captured localization query are consistent. However, this assumption can be easily violated in the dynamic world. Additionally, the map size grows as new data accumulate through time, causing large data overhead in the long term. In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization. Inspired by the recent progress in Graph Neural Networks (GNN), we propose the first work that models SfM maps as heterogeneous graphs and predicts 3D point importance scores with a GNN, which enables us to directly exploit the rich information in the SfM map graph. Two novel supervisions are proposed: 1) a data-fitting term for selecting points valuable to future localization based on training queries; 2) a K-Cover term for selecting sparse points with full map coverage. The experiments show that our method selected map points on stable and widely visible structures and outperformed baselines in localization performance.
Submitted 28 March, 2022;
originally announced March 2022.
-
Revisiting LiDAR Registration and Reconstruction: A Range Image Perspective
Authors:
Wei Dong,
Kwonyoung Ryu,
Michael Kaess,
Jaesik Park
Abstract:
Spinning LiDAR data are prevalent for 3D vision tasks. Since LiDAR data is presented in the form of point clouds, expensive 3D operations are usually required. This paper revisits spinning LiDAR scan formation and presents a cylindrical range image representation with a ray-wise projection/unprojection model. It is built upon raw scans and supports lossless conversion from 2D to 3D, allowing fast 2D operations, including 2D index-based neighbor search and downsampling. We then propose, to the best of our knowledge, the first multi-scale registration and dense signed distance function (SDF) reconstruction system for LiDAR range images. We further collect a dataset of indoor and outdoor LiDAR scenes in the posed range image format. A comprehensive evaluation of registration and reconstruction is conducted on the proposed dataset and the KITTI dataset. Experiments demonstrate that our approach outperforms surface reconstruction baselines and achieves similar performance to state-of-the-art LiDAR registration methods, including a modern learning-based registration approach. Thanks to its simplicity, our registration runs at 100 Hz and SDF reconstruction runs in real time. The dataset and a modularized C++/Python toolbox will be released.
Submitted 28 March, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
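To make the idea of range image projection and unprojection in the abstract above concrete, here is a simplified sketch. It does not reproduce the paper's ray-wise model: it assumes a hypothetical 16-beam sensor with a fixed elevation angle per image row and uniformly spaced azimuth columns, and the names `unproject`, `project`, `ELEVATIONS`, and `WIDTH` are illustrative only.

```python
import math


def unproject(row, col, rng, elevations, width):
    """Range-image pixel (row, col) with range rng -> 3D point (x, y, z)."""
    azimuth = 2.0 * math.pi * col / width  # column index -> azimuth angle
    elevation = elevations[row]            # row index -> beam elevation
    x = rng * math.cos(elevation) * math.cos(azimuth)
    y = rng * math.cos(elevation) * math.sin(azimuth)
    z = rng * math.sin(elevation)
    return x, y, z


def project(x, y, z, elevations, width):
    """3D point -> (row, col, range). Assumes the point is not the origin."""
    rng = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x) % (2.0 * math.pi)
    col = int(round(azimuth / (2.0 * math.pi) * width)) % width
    elevation = math.asin(z / rng)
    # Pick the beam whose elevation is closest to the measured one.
    row = min(range(len(elevations)), key=lambda i: abs(elevations[i] - elevation))
    return row, col, rng


# Hypothetical sensor: 16 beams from -15 to +15 degrees, 1024 columns per turn.
ELEVATIONS = [math.radians(-15.0 + 2.0 * i) for i in range(16)]
WIDTH = 1024

point = unproject(3, 100, 7.5, ELEVATIONS, WIDTH)
pixel = project(*point, ELEVATIONS, WIDTH)
```

Under these assumptions a pixel maps to a 3D point and back to the same pixel, which is what lets neighbor lookups run on 2D indices instead of a 3D search structure.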
-
PatchGraph: In-hand tactile tracking with learned surface normals
Authors:
Paloma Sodhi,
Michael Kaess,
Mustafa Mukadam,
Stuart Anderson
Abstract:
We address the problem of tracking 3D object poses from touch during in-hand manipulations. Specifically, we look at tracking small objects using vision-based tactile sensors that provide high-dimensional tactile image measurements at the point of contact. While prior work has relied on a-priori information about the object being localized, we remove this requirement. Our key insight is that an object is composed of several local surface patches, each informative enough to achieve reliable object tracking. Moreover, we can recover the geometry of this local patch online by extracting local surface normal information embedded in each tactile image. We propose a novel two-stage approach. First, we learn a mapping from tactile images to surface normals using an image translation network. Second, we use these surface normals within a factor graph to both reconstruct a local patch map and use it to infer 3D object poses. We demonstrate reliable object tracking for over $100$ contact sequences across unique shapes with four objects in simulation and two objects in the real-world. Supplementary video: https://youtu.be/FHks--haOGY
Submitted 11 April, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception
Authors:
Wei Dong,
Yixing Lao,
Michael Kaess,
Vladlen Koltun
Abstract:
We present ASH, a modern and high-performance framework for parallel spatial hashing on GPU. Compared to existing GPU hash map implementations, ASH achieves higher performance, supports richer functionality, and requires fewer lines of code (LoC) when used for implementing spatially varying operations from volumetric geometry reconstruction to differentiable appearance reconstruction. Unlike existing GPU hash maps, the ASH framework provides a versatile tensor interface, hiding low-level details from the users. In addition, by decoupling the internal hashing data structures and key-value data in buffers, we offer direct access to spatially varying data via indices, enabling seamless integration with modern libraries such as PyTorch. To achieve this, we 1) detach stored key-value data from the low-level hash map implementation; 2) bridge the pointer-first low-level data structures to index-first high-level tensor interfaces via an index heap; 3) adapt both generic and non-generic integer-only hash map implementations as backends to operate on multi-dimensional keys. We first profile our hash map against state-of-the-art hash maps on synthetic data to show the performance gain from this architecture. We then show that ASH can consistently achieve higher performance on various large-scale 3D perception tasks with fewer LoC by showcasing several applications, including 1) point cloud voxelization, 2) retargetable volumetric scene reconstruction, 3) non-rigid point cloud registration and volumetric deformation, and 4) spatially varying geometry and appearance refinement. ASH and its example applications are open sourced in Open3D (http://www.open3d.org).
Submitted 29 January, 2023; v1 submitted 1 October, 2021;
originally announced October 2021.
-
ShapeMap 3-D: Efficient shape mapping through dense touch and vision
Authors:
Sudharshan Suresh,
Zilin Si,
Joshua G. Mangelson,
Wenzhen Yuan,
Michael Kaess
Abstract:
Knowledge of 3-D object shape is of great importance to robot manipulation tasks, but may not be readily available in unstructured environments. While vision is often occluded during robot-object interaction, high-resolution tactile sensors can give a dense local perspective of the object. However, tactile sensors have limited sensing area and the shape representation must faithfully approximate non-contact areas. In addition, a key challenge is efficiently incorporating these dense tactile measurements into a 3-D mapping framework. In this work, we propose an incremental shape mapping method using a GelSight tactile sensor and a depth camera. Local shape is recovered from tactile images via a learned model trained in simulation. Through efficient inference on a spatial factor graph informed by a Gaussian process, we build an implicit surface representation of the object. We demonstrate visuo-tactile mapping in both simulated and real-world experiments, to incrementally build 3-D reconstructions of household objects.
Submitted 10 March, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
LEO: Learning Energy-based Models in Factor Graph Optimization
Authors:
Paloma Sodhi,
Eric Dexheimer,
Mustafa Mukadam,
Stuart Anderson,
Michael Kaess
Abstract:
We address the problem of learning observation models end-to-end for estimation. Robots operating in partially observable environments must infer latent states from multiple sensory inputs using observation models that capture the joint distribution between latent states and observations. This inference problem can be formulated as an objective over a graph that optimizes for the most likely sequence of states using all previous measurements. Prior work uses observation models that are either known a-priori or trained on surrogate losses independent of the graph optimizer. In this paper, we propose a method to directly optimize end-to-end tracking performance by learning observation models with the graph optimizer in the loop. This direct approach may appear, however, to require the inference algorithm to be fully differentiable, which many state-of-the-art graph optimizers are not. Our key insight is to instead formulate the problem as that of energy-based learning. We propose a novel approach, LEO, for learning observation models end-to-end with graph optimizers that may be non-differentiable. LEO alternates between sampling trajectories from the graph posterior and updating the model to match these samples to ground truth trajectories. We propose a way to generate such samples efficiently using incremental Gauss-Newton solvers. We compare LEO against baselines on datasets drawn from two distinct tasks: navigation and real-world planar pushing. We show that LEO is able to learn complex observation models with lower errors and fewer samples. Supplementary video: https://youtu.be/YqzlUPudfkA
Submitted 8 April, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
CMU-GPR Dataset: Ground Penetrating Radar Dataset for Robot Localization and Mapping
Authors:
Alexander Baikovitz,
Paloma Sodhi,
Michael Dille,
Michael Kaess
Abstract:
There has been exciting recent progress in using radar as a sensor for robot navigation due to its increased robustness to varying environmental conditions. However, within these different radar perception systems, ground penetrating radar (GPR) remains under-explored. By measuring structures beneath the ground, GPR can provide stable features that are less variant to ambient weather, scene, and lighting changes, making it a compelling choice for long-term spatio-temporal mapping. In this work, we present the CMU-GPR dataset--an open-source ground penetrating radar dataset for research in subsurface-aided perception for robot navigation. In total, the dataset contains 15 distinct trajectory sequences in 3 GPS-denied, indoor environments. Measurements from a GPR, wheel encoder, RGB camera, and inertial measurement unit were collected with ground truth positions from a robotic total station. In addition to the dataset, we also provide utility code to convert raw GPR data into processed images. This paper describes our recording platform, the data format, utility scripts, and proposed methods for using this data.
Submitted 15 July, 2021;
originally announced July 2021.
-
Ground Encoding: Learned Factor Graph-based Models for Localizing Ground Penetrating Radar
Authors:
Alexander Baikovitz,
Paloma Sodhi,
Michael Dille,
Michael Kaess
Abstract:
We address the problem of robot localization using ground penetrating radar (GPR) sensors. Current approaches for localization with GPR sensors require a priori maps of the system's environment as well as access to approximate global positioning (GPS) during operation. In this paper, we propose a novel, real-time GPR-based localization system for unknown and GPS-denied environments. We model the localization problem as an inference over a factor graph. Our approach combines 1D single-channel GPR measurements to form 2D image submaps. To use these GPR images in the graph, we need sensor models that can map noisy, high-dimensional image measurements into the state space. These are challenging to obtain a priori since image generation has a complex dependency on subsurface composition and radar physics, which itself varies with sensors and variations in subsurface electromagnetic properties. Our key idea is to instead learn relative sensor models directly from GPR data that map non-sequential GPR image pairs to relative robot motion. These models are incorporated as factors within the factor graph with relative motion predictions correcting for accumulated drift in the position estimates. We demonstrate our approach over datasets collected across multiple locations using a custom designed experimental rig. We show reliable, real-time localization using only GPR and odometry measurements for varying trajectories in three distinct GPS-denied environments. For our supplementary video, see https://youtu.be/HXXgdTJzqyw.
Submitted 29 March, 2021;
originally announced March 2021.
-
Learning Tactile Models for Factor Graph-based Estimation
Authors:
Paloma Sodhi,
Michael Kaess,
Mustafa Mukadam,
Stuart Anderson
Abstract:
We're interested in the problem of estimating object states from touch during manipulation under occlusions. In this work, we address the problem of estimating object poses from touch during planar pushing. Vision-based tactile sensors provide rich, local image measurements at the point of contact. A single such measurement, however, contains limited information and multiple measurements are needed to infer latent object state. We solve this inference problem using a factor graph. In order to incorporate tactile measurements in the graph, we need local observation models that can map high-dimensional tactile images onto a low-dimensional state space. Prior work has used low-dimensional force measurements or engineered functions to interpret tactile measurements. These methods, however, can be brittle and difficult to scale across objects and sensors. Our key insight is to directly learn tactile observation models that predict the relative pose of the sensor given a pair of tactile images. These relative poses can then be incorporated as factors within a factor graph. We propose a two-stage approach: first we learn local tactile observation models supervised with ground truth data, and then integrate these models along with physics and geometric factors within a factor graph optimizer. We demonstrate reliable object tracking using only tactile feedback for 150 real-world planar pushing sequences with varying trajectories across three object shapes. Supplementary video: https://youtu.be/y1kBfSmi8w0
Submitted 28 March, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
Tactile SLAM: Real-time inference of shape and pose from planar pushing
Authors:
Sudharshan Suresh,
Maria Bauza,
Kuan-Ting Yu,
Joshua G. Mangelson,
Alberto Rodriguez,
Michael Kaess
Abstract:
Tactile perception is central to robot manipulation in unstructured environments. However, it requires contact, and a mature implementation must infer object models while also accounting for the motion induced by the interaction. In this work, we present a method to estimate both object shape and pose in real-time from a stream of tactile measurements. This is applied towards tactile exploration of an unknown object by planar pushing. We consider this as an online SLAM problem with a nonparametric shape representation. Our formulation of tactile inference alternates between Gaussian process implicit surface regression and pose estimation on a factor graph. Through a combination of local Gaussian processes and fixed-lag smoothing, we infer object shape and pose in real-time. We evaluate our system across different objects in both simulated and real-world planar pushing tasks.
Submitted 26 March, 2021; v1 submitted 13 November, 2020;
originally announced November 2020.
-
Compositional Scalable Object SLAM
Authors:
Akash Sharma,
Wei Dong,
Michael Kaess
Abstract:
We present a fast, scalable, and accurate Simultaneous Localization and Mapping (SLAM) system that represents indoor scenes as a graph of objects. Leveraging the observation that artificial environments are structured and occupied by recognizable objects, we show that a compositional scalable object mapping formulation is amenable to a robust SLAM solution for drift-free large scale indoor reconstruction. To achieve this, we propose a novel semantically assisted data association strategy that obtains unambiguous persistent object landmarks, and a 2.5D compositional rendering method that enables reliable frame-to-model RGB-D tracking. Consequently, we deliver an optimized online implementation that can run at near frame rate with a single graphics card, and provide a comprehensive evaluation against state of the art baselines. An open source implementation will be provided at https://placeholder.
Submitted 4 November, 2020;
originally announced November 2020.
-
An Efficient Planar Bundle Adjustment Algorithm
Authors:
Lipu Zhou,
Daniel Koppel,
Hui Ju,
Frank Steinbruecker,
Michael Kaess
Abstract:
This paper presents an efficient algorithm for the least-squares problem using the point-to-plane cost, which aims to jointly optimize depth sensor poses and plane parameters for 3D reconstruction. We call this least-squares problem Planar Bundle Adjustment (PBA), due to the similarity between this problem and the original Bundle Adjustment (BA) in visual reconstruction. As planes ubiquitously exist in the man-made environment, they are generally used as landmarks in SLAM algorithms for various depth sensors. PBA is important to reduce drift and improve the quality of the map. However, directly adopting the well-established BA framework in visual reconstruction will result in a very inefficient solution for PBA. This is because a 3D point only has one observation at a camera pose. In contrast, a depth sensor can record hundreds of points in a plane at a time, which results in a very large nonlinear least-squares problem even for a small-scale space. Fortunately, we find that the PBA problem has a special structure. We introduce a reduced Jacobian matrix and a reduced residual vector, and prove that they can replace the original Jacobian matrix and residual vector in the generally adopted Levenberg-Marquardt (LM) algorithm. This significantly reduces the computational cost. Besides, when planes are combined with other features for 3D reconstruction, the reduced Jacobian matrix and residual vector can also replace the corresponding parts derived from planes. Our experimental results verify that our algorithm can significantly reduce the computational time compared to the solution using the traditional BA framework. Besides, our algorithm is faster, more accurate, and more robust to initialization errors compared to the state-of-the-art solution using the plane-to-plane cost.
Submitted 16 August, 2020; v1 submitted 30 May, 2020;
originally announced June 2020.
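To make the point-to-plane cost mentioned in the abstract above concrete, here is a minimal sketch that evaluates it for a handful of observations. It assumes planes are written as n·x + d = 0 with a unit normal n, and that each pose (R, t) maps sensor-frame points into the world frame. These conventions and all function names are choices made for this example; the sketch shows only the plain cost, not the paper's reduced Jacobian or reduced residual.

```python
def transform(R, t, p):
    """Apply the rigid transform x = R p + t to a 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def point_to_plane_residual(R, t, p, n, d):
    """Signed distance from the transformed point to the plane n.x + d = 0."""
    q = transform(R, t, p)
    return sum(n[i] * q[i] for i in range(3)) + d

def pba_cost(poses, planes, observations):
    """Sum of squared point-to-plane residuals.

    observations: list of (pose_index, plane_index, point) triples.
    Every observed point adds one residual, so the problem grows with
    the number of points recorded on each plane.
    """
    total = 0.0
    for i, k, p in observations:
        R, t = poses[i]
        n, d = planes[k]
        total += point_to_plane_residual(R, t, p, n, d) ** 2
    return total

# Hypothetical data: one plane z = 1, two poses, three observed points.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(I3, (0.0, 0.0, 0.0)), (I3, (0.0, 0.0, 0.5))]
planes = [((0.0, 0.0, 1.0), -1.0)]
observations = [(0, 0, (0.0, 0.0, 1.0)),
                (0, 0, (1.0, 2.0, 1.0)),
                (1, 0, (3.0, 0.0, 1.0))]
cost = pba_cost(poses, planes, observations)
```

In this toy case the first pose is consistent with the plane and contributes nothing, while the second pose is offset by 0.5 along the normal and contributes 0.25.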
-
Impact of gravity waves on the middle atmosphere of Mars: a non-orographic gravity wave parameterization based on Global Climate modeling and MCS observations
Authors:
G. Gilli,
F. Forget,
A. Spiga,
T. Navarro,
E. Millour,
L. Montabone,
A. Kleinböhl,
D. M. Kass,
D. J. McCleese,
J. T. Schofield
Abstract:
The impact of gravity waves (GW) on diurnal tides and the global circulation in the middle/upper atmosphere of Mars is investigated using a General Circulation Model (GCM). We have implemented a stochastic parameterization of non-orographic GW into the Laboratoire de Météorologie Dynamique (LMD) Mars GCM (LMD-MGCM) following an innovative approach. The source is assumed to be located above typical convective cells ($\sim$ 250 Pa) and the effect of GW on the circulation and predicted thermal structure above 1 Pa ($\sim$ 50 km) is analyzed. We focus on the comparison between model simulations and observations by the Mars Climate Sounder (MCS) on board Mars Reconnaissance Orbiter during Martian Year 29. MCS data provide the only systematic measurements of the Martian mesosphere up to 80 km to date. The primary effect of GW is to damp the thermal tides by reducing the diurnal oscillation of the meridional and zonal winds. The GW drag reaches magnitudes of the order of 1 m/s/sol above 10$^{-2}$ Pa in the northern hemisphere winter solstice and produces major changes in the zonal wind field (from tens to hundreds of m/s), while the impact on the temperature field is relatively moderate (10-20 K). This suggests that the GW-induced alteration of the meridional flow is mainly responsible for the simulated temperature variation. The results also show that with the GW scheme included, the maximum day-night temperature difference due to the diurnal tide is around 10 K, and the peak of the tide is shifted toward lower altitudes, in better agreement with MCS observations.
Submitted 3 February, 2020;
originally announced February 2020.
-
Martian Year 34 Column Dust Climatology from Mars Climate Sounder Observations: Reconstructed Maps and Model Simulations
Authors:
Luca Montabone,
Aymeric Spiga,
David M. Kass,
Armin Kleinböhl,
François Forget,
Ehouarn Millour
Abstract:
We have reconstructed longitude-latitude maps of column dust optical depth (CDOD) for Martian year (MY) 34 (May 5, 2017 to March 23, 2019) using observations by the Mars Climate Sounder (MCS) aboard NASA's Mars Reconnaissance Orbiter spacecraft. Our methodology works by gridding standard and newly available estimates of CDOD from MCS limb observations, using the "iterative weighted binning" methodology. In this work, we reconstruct four gridded CDOD maps per sol, at different Mars Universal Times. Together with the seasonal and day-to-day variability, the use of several maps per sol also allows us to explore the daily variability of CDOD in the MCS dataset, which is shown to be particularly strong during the MY 34 equinoctial Global Dust Event (GDE). Regular maps of CDOD are then produced by daily averaging and spatially interpolating the irregularly gridded maps using a standard "kriging" interpolator, and can be used as a "dust scenario" for numerical model simulations. In order to understand whether the daily variability of CDOD has a physical explanation, we have carried out numerical simulations with the "Laboratoire de Météorologie Dynamique" Mars Global Climate Model. Using a "free dust" run initiated at $L_s \sim 210^\circ$ with the corresponding kriged map, but subsequently free of further CDOD forcing, we show that the model is able to account for some of the observed daily variability in CDOD. The model also serves to confirm that the use of the MY 34 daily-averaged dust scenario in a GCM produces results consistent with those obtained for the MY 25 GDE.
Submitted 18 July, 2019;
originally announced July 2019.
-
Do not Omit Local Minimizer: a Complete Solution for Pose Estimation from 3D Correspondences
Authors:
Lipu Zhou,
Shengze Wang,
Jiamin Ye,
Michael Kaess
Abstract:
Estimating pose from given 3D correspondences, including point-to-point, point-to-line and point-to-plane correspondences, is a fundamental task in computer vision with many applications. We present a complete solution for this task, including a solution for the minimal problem and the least-squares problem of this task. Previous works mainly focused on finding the global minimizer to address the least-squares problem. However, existing works that show the ability to reach the global minimizer are still unsuitable for real-time applications. Furthermore, as one of the contributions of this paper, we prove that there exist ambiguous configurations for any number of lines and planes. These configurations have several solutions in theory, so the correct solution may come from a local minimizer. Our algorithm is efficient and able to reveal local minimizers. We employ the Cayley-Gibbs-Rodrigues (CGR) parameterization of the rotation to derive a general rational cost for the three cases of 3D correspondences. The main contribution of this paper is to solve the resulting equation system of the minimal problem and the first-order optimality conditions of the least-squares problem, both of which are of complicated rational forms. The central idea of our algorithm is to introduce intermediate unknowns to simplify the problem. Extensive experimental results show that our algorithm significantly outperforms previous algorithms when the number of correspondences is small. Besides, when the global minimizer is the solution, our algorithm achieves the same accuracy as previous algorithms that have guaranteed global optimality, but our algorithm is applicable to real-time applications.
Submitted 4 April, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
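The CGR parameterization named in the abstract above maps a 3-vector s to a rotation matrix whose entries are rational functions of s, which is why costs built on it are rational. The sketch below uses the standard Cayley form R = ((1 - sᵀs) I + 2 [s]ₓ + 2 s sᵀ) / (1 + sᵀs). The function name is an assumption for this example, and the code illustrates only the parameterization, not the paper's solver.

```python
def cgr_rotation(s):
    """Rotation matrix from Cayley-Gibbs-Rodrigues parameters s = (s1, s2, s3).

    Each entry is a polynomial in s divided by (1 + s.s). Rotations by
    exactly 180 degrees cannot be represented (they need |s| -> infinity).
    """
    s1, s2, s3 = s
    ss = s1 * s1 + s2 * s2 + s3 * s3
    # Skew-symmetric cross-product matrix [s]_x.
    skew = [[0.0, -s3, s2],
            [s3, 0.0, -s1],
            [-s2, s1, 0.0]]
    return [[((1.0 - ss) * (1.0 if i == j else 0.0)
              + 2.0 * skew[i][j]
              + 2.0 * s[i] * s[j]) / (1.0 + ss)
             for j in range(3)]
            for i in range(3)]

# s = (0, 0, 1) corresponds to tan(theta / 2) = 1, a 90-degree turn about z.
R_quarter_turn = cgr_rotation((0.0, 0.0, 1.0))
# An arbitrary parameter vector, used to check orthogonality.
R_general = cgr_rotation((0.3, -0.2, 0.5))
```

The vector s points along the rotation axis and has length tan(θ/2), so s = 0 gives the identity.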
-
Unsupervised Learning of Monocular Depth Estimation with Bundle Adjustment, Super-Resolution and Clip Loss
Authors:
Lipu Zhou,
Jiamin Ye,
Montiel Abello,
Shengze Wang,
Michael Kaess
Abstract:
We present a novel unsupervised learning framework for single view depth estimation using monocular videos. It is well known in 3D vision that enlarging the baseline can increase the depth estimation accuracy, and jointly optimizing a set of camera poses and landmarks is essential. In previous monocular unsupervised learning frameworks, only part of the photometric and geometric constraints within a sequence are used as supervisory signals. This may result in a short baseline and overfitting. Besides, previous works generally estimate a low resolution depth from a low resolution input image. The low resolution depth is then interpolated to recover the original resolution. This strategy may generate large errors on object boundaries, as the depth of background and foreground are mixed to yield the high resolution depth. In this paper, we introduce a bundle adjustment framework and a super-resolution network to solve the above two problems. In bundle adjustment, depths and poses of an image sequence are jointly optimized, which increases the baseline by establishing the relationship between farther frames. The super-resolution network learns to estimate a high resolution depth from a low resolution image. Additionally, we introduce the clip loss to deal with moving objects and occlusion. Experimental results on the KITTI dataset show that the proposed algorithm outperforms the state-of-the-art unsupervised methods using monocular sequences, and achieves comparable or even better results compared to unsupervised methods using stereo sequences.
Submitted 8 December, 2018;
originally announced December 2018.
-
Robust Keyframe-based Dense SLAM with an RGB-D Camera
Authors:
Haomin Liu,
Chen Li,
Guojun Chen,
Guofeng Zhang,
Michael Kaess,
Hujun Bao
Abstract:
In this paper, we present RKD-SLAM, a robust keyframe-based dense SLAM approach for an RGB-D camera that can robustly handle fast motion and dense loop closure, and run without time limitation in a moderate-size scene. It not only can be used to scan high-quality 3D models, but also can satisfy the demand of VR and AR applications. First, we combine color and depth information to construct a very fast keyframe-based tracking method on a CPU, which can work robustly in challenging cases (e.g., fast camera motion and complex loops). To reduce accumulated error, we also introduce a very efficient incremental bundle adjustment (BA) algorithm, which can greatly save unnecessary computation and perform local and global BA in a unified optimization framework. An efficient keyframe-based depth representation and fusion method is proposed to generate and promptly update the dense 3D surface with online correction according to the refined camera poses of keyframes through BA. The experimental results and comparisons on a variety of challenging datasets and the TUM RGB-D benchmark demonstrate the effectiveness of the proposed system.
Submitted 14 November, 2017;
originally announced November 2017.