-
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation
Authors:
Umamaheswaran Raman Kumar,
Abdur Razzaq Fayjie,
Jurgen Hannaert,
Patrick Vandewalle
Abstract:
Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined s…
▽ More
Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined spaces and the costly, often inaccurate process of per-point labeling to generate ground truths. While synthetic datasets address some of these challenges, they often fail to replicate real-world conditions, particularly the occlusions that occur in point clouds collected from real environments. Existing 3D benchmarking datasets typically evaluate deep learning models under the assumption that training and test data are independently and identically distributed (IID), which affects the models' usability for real-world point cloud segmentation. To address these challenges, we introduce the BelHouse3D dataset, a new synthetic point cloud dataset designed for 3D indoor scene semantic segmentation. This dataset is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions. Additionally, we include a test set with data occlusion to simulate out-of-distribution (OOD) scenarios, reflecting the occlusions commonly encountered in real-world point clouds. We evaluate popular point-based semantic segmentation methods using our OOD setting and present a benchmark. We believe that BelHouse3D and its OOD setting will advance research in 3D point cloud semantic segmentation for indoor scenes, providing valuable insights for the development of more generalizable models.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Predictive uncertainty estimation in deep learning for lung carcinoma classification in digital pathology under real dataset shifts
Authors:
Abdur R. Fayjie,
Jutika Borah,
Florencia Carbone,
Jan Tack,
Patrick Vandewalle
Abstract:
Deep learning has shown tremendous progress in a wide range of digital pathology and medical image classification tasks. Its integration into safe clinical decision-making support requires robust and reliable models. However, real-world data comes with diversities that often lie outside the intended source distribution. Moreover, when test samples are dramatically different, clinical decision-maki…
▽ More
Deep learning has shown tremendous progress in a wide range of digital pathology and medical image classification tasks. Its integration into safe clinical decision-making support requires robust and reliable models. However, real-world data comes with diversities that often lie outside the intended source distribution. Moreover, when test samples are dramatically different, clinical decision-making is greatly affected. Quantifying predictive uncertainty in models is crucial for well-calibrated predictions and determining when (or not) to trust a model. Unfortunately, many works have overlooked the importance of predictive uncertainty estimation. This paper evaluates whether predictive uncertainty estimation adds robustness to deep learning-based diagnostic decision-making systems. We investigate the effect of various carcinoma distribution shift scenarios on predictive performance and calibration. We first systematically investigate three popular methods for improving predictive uncertainty: Monte Carlo dropout, deep ensemble, and few-shot learning on lung adenocarcinoma classification as a primary disease in whole slide images. Secondly, we compare the effectiveness of the methods in terms of performance and calibration under clinically relevant distribution shifts such as in-distribution shifts comprising primary disease sub-types and other characterization analysis data; out-of-distribution shifts comprising well-differentiated cases, different organ origin, and imaging modality shifts. While studies on uncertainty estimation exist, to our best knowledge, no rigorous large-scale benchmark compares predictive uncertainty estimation including these dataset shifts for lung carcinoma classification.
△ Less
Submitted 15 August, 2024;
originally announced August 2024.
-
PCR-99: A Practical Method for Point Cloud Registration with 99 Percent Outliers
Authors:
Seong Hun Lee,
Javier Civera,
Patrick Vandewalle
Abstract:
We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be i…
▽ More
We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.
△ Less
Submitted 9 September, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack
Authors:
Xueyang Kang,
Fengze Han,
Abdur R. Fayjie,
Patrick Vandewalle,
Kourosh Khoshelham,
Dong Gong
Abstract:
Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, which restricts them to process only focal stacks of fixed number of images during both training and inference. This limitation hampers their ab…
▽ More
Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, which restricts them to process only focal stacks of fixed number of images during both training and inference. This limitation hampers their ability to generalize to stacks of arbitrary lengths. To overcome these limitations, we present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder. The Transformer's self-attention mechanism allows for the learning of more informative spatial features by implicitly performing non-local cross-referencing. The LSTM module is designed to integrate representations across image stacks of varying lengths. Additionally, we employ multi-scale convolutional kernels in an early-stage encoder to capture low-level features at different degrees of focus/defocus. By incorporating the LSTM, FocDepthFormer can be pre-trained on large-scale monocular RGB depth estimation datasets, improving visual pattern learning and reducing reliance on difficult-to-obtain focal stack data. Extensive experiments on diverse focal stack benchmark datasets demonstrate that our model outperforms state-of-the-art approaches across multiple evaluation metrics.
△ Less
Submitted 3 December, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
RCDPT: Radar-Camera fusion Dense Prediction Transformer
Authors:
Chen-Chou Lo,
Patrick Vandewalle
Abstract:
Recently, transformer networks have outperformed traditional deep neural networks in natural language processing and show a large potential in many computer vision tasks compared to convolutional backbones. In the original transformer, readout tokens are used as designated vectors for aggregating information from other tokens. However, the performance of using readout tokens in a vision transforme…
▽ More
Recently, transformer networks have outperformed traditional deep neural networks in natural language processing and show a large potential in many computer vision tasks compared to convolutional backbones. In the original transformer, readout tokens are used as designated vectors for aggregating information from other tokens. However, the performance of using readout tokens in a vision transformer is limited. Therefore, we propose a novel fusion strategy to integrate radar data into a dense prediction transformer network by reassembling camera representations with radar representations. Instead of using readout tokens, radar representations contribute additional depth information to a monocular depth estimation model and improve performance. We further investigate different fusion approaches that are commonly used for integrating additional modality in a dense prediction transformer network. The experiments are conducted on the nuScenes dataset, which includes camera images, lidar, and radar data. The results show that our proposed method yields better performance than the commonly used fusion strategies and outperforms existing convolutional depth estimation models that fuse camera images and radar.
△ Less
Submitted 2 March, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Adaptive Sampling-based Particle Filter for Visual-inertial Gimbal in the Wild
Authors:
Xueyang Kang,
Ariel Herrera,
Henry Lema,
Esteban Valencia,
Patrick Vandewalle
Abstract:
In this paper, we present a Computer Vision (CV) based tracking and fusion algorithm, dedicated to a 3D printed gimbal system on drones operating in nature. The whole gimbal system can stabilize the camera orientation robustly in a challenging nature scenario by using skyline and ground plane as references. Our main contributions are the following: a) a light-weight Resnet-18 backbone network mode…
▽ More
In this paper, we present a Computer Vision (CV) based tracking and fusion algorithm, dedicated to a 3D printed gimbal system on drones operating in nature. The whole gimbal system can stabilize the camera orientation robustly in a challenging nature scenario by using skyline and ground plane as references. Our main contributions are the following: a) a light-weight Resnet-18 backbone network model was trained from scratch, and deployed onto the Jetson Nano platform to segment the image into binary parts (ground and sky); b) our geometry assumption from nature cues delivers the potential for robust visual tracking by using the skyline and ground plane as a reference; c) a spherical surface-based adaptive particle sampling, can fuse orientation from multiple sensor sources flexibly. The whole algorithm pipeline is tested on our customized gimbal module including Jetson and other hardware components. The experiments were performed on top of a building in the real landscape.
△ Less
Submitted 8 January, 2024; v1 submitted 22 June, 2022;
originally announced June 2022.
-
How Much Depth Information can Radar Contribute to a Depth Estimation Model?
Authors:
Chen-Chou Lo,
Patrick Vandewalle
Abstract:
Recently, several works have proposed fusing radar data as an additional perceptual signal into monocular depth estimation models because radar data is robust against varying light and weather conditions. Although improved performances were reported in prior works, it is still hard to tell how much depth information radar can contribute to a depth estimation model. In this paper, we propose radar…
▽ More
Recently, several works have proposed fusing radar data as an additional perceptual signal into monocular depth estimation models because radar data is robust against varying light and weather conditions. Although improved performances were reported in prior works, it is still hard to tell how much depth information radar can contribute to a depth estimation model. In this paper, we propose radar inference and supervision experiments to investigate the intrinsic depth potential of radar data using state-of-the-art depth estimation models on the nuScenes dataset. In the inference experiment, the model predicts depth by taking only radar as input to demonstrate the inference capability using radar data. In the supervision experiment, a monocular depth estimation model is trained under radar supervision to show the intrinsic depth information that radar can contribute. Our experiments demonstrate that the model using only sparse radar as input can detect the shape of surroundings to a certain extent in the predicted depth. Furthermore, the monocular depth estimation model supervised by preprocessed radar achieves a good performance compared to the baseline model trained with sparse lidar supervision.
△ Less
Submitted 15 March, 2023; v1 submitted 26 February, 2022;
originally announced February 2022.
-
Depth Estimation from Monocular Images and Sparse radar using Deep Ordinal Regression Network
Authors:
Chen-Chou Lo,
Patrick Vandewalle
Abstract:
We integrate sparse radar data into a monocular depth estimation model and introduce a novel preprocessing method for reducing the sparseness and limited field of view provided by radar. We explore the intrinsic error of different radar modalities and show our proposed method results in more data points with reduced error. We further propose a novel method for estimating dense depth maps from mono…
▽ More
We integrate sparse radar data into a monocular depth estimation model and introduce a novel preprocessing method for reducing the sparseness and limited field of view provided by radar. We explore the intrinsic error of different radar modalities and show our proposed method results in more data points with reduced error. We further propose a novel method for estimating dense depth maps from monocular 2D images and sparse radar measurements using deep learning based on the deep ordinal regression network by Fu et al. Radar data are integrated by first converting the sparse 2D points to a height-extended 3D measurement and then including it into the network using a late fusion approach. Experiments are conducted on the nuScenes dataset. Our experiments demonstrate state-of-the-art performance in both day and night scenes.
△ Less
Submitted 15 July, 2021;
originally announced July 2021.