Search | arXiv e-print repository

BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation

Authors: Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, Patrick Vandewalle

Abstract: Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined s… ▽ More Large-scale 2D datasets have been instrumental in advancing machine learning; however, progress in 3D vision tasks has been relatively slow. This disparity is largely due to the limited availability of 3D benchmarking datasets. In particular, creating real-world point cloud datasets for indoor scene semantic segmentation presents considerable challenges, including data collection within confined spaces and the costly, often inaccurate process of per-point labeling to generate ground truths. While synthetic datasets address some of these challenges, they often fail to replicate real-world conditions, particularly the occlusions that occur in point clouds collected from real environments. Existing 3D benchmarking datasets typically evaluate deep learning models under the assumption that training and test data are independently and identically distributed (IID), which affects the models' usability for real-world point cloud segmentation. To address these challenges, we introduce the BelHouse3D dataset, a new synthetic point cloud dataset designed for 3D indoor scene semantic segmentation. This dataset is constructed using real-world references from 32 houses in Belgium, ensuring that the synthetic data closely aligns with real-world conditions. Additionally, we include a test set with data occlusion to simulate out-of-distribution (OOD) scenarios, reflecting the occlusions commonly encountered in real-world point clouds. We evaluate popular point-based semantic segmentation methods using our OOD setting and present a benchmark. We believe that BelHouse3D and its OOD setting will advance research in 3D point cloud semantic segmentation for indoor scenes, providing valuable insights for the development of more generalizable models. △ Less

Submitted 20 November, 2024; originally announced November 2024.

Comments: 20 pages, 6 figures, 3 tables, accepted at ECCV 2024 Workshops

arXiv:2408.08432 [pdf, other]

Predictive uncertainty estimation in deep learning for lung carcinoma classification in digital pathology under real dataset shifts

Authors: Abdur R. Fayjie, Jutika Borah, Florencia Carbone, Jan Tack, Patrick Vandewalle

Abstract: Deep learning has shown tremendous progress in a wide range of digital pathology and medical image classification tasks. Its integration into safe clinical decision-making support requires robust and reliable models. However, real-world data comes with diversities that often lie outside the intended source distribution. Moreover, when test samples are dramatically different, clinical decision-maki… ▽ More Deep learning has shown tremendous progress in a wide range of digital pathology and medical image classification tasks. Its integration into safe clinical decision-making support requires robust and reliable models. However, real-world data comes with diversities that often lie outside the intended source distribution. Moreover, when test samples are dramatically different, clinical decision-making is greatly affected. Quantifying predictive uncertainty in models is crucial for well-calibrated predictions and determining when (or not) to trust a model. Unfortunately, many works have overlooked the importance of predictive uncertainty estimation. This paper evaluates whether predictive uncertainty estimation adds robustness to deep learning-based diagnostic decision-making systems. We investigate the effect of various carcinoma distribution shift scenarios on predictive performance and calibration. We first systematically investigate three popular methods for improving predictive uncertainty: Monte Carlo dropout, deep ensemble, and few-shot learning on lung adenocarcinoma classification as a primary disease in whole slide images. Secondly, we compare the effectiveness of the methods in terms of performance and calibration under clinically relevant distribution shifts such as in-distribution shifts comprising primary disease sub-types and other characterization analysis data; out-of-distribution shifts comprising well-differentiated cases, different organ origin, and imaging modality shifts. While studies on uncertainty estimation exist, to our best knowledge, no rigorous large-scale benchmark compares predictive uncertainty estimation including these dataset shifts for lung carcinoma classification. △ Less

Submitted 15 August, 2024; originally announced August 2024.

Comments: 17 pages, 2 figures, 5 tables

arXiv:2402.16598 [pdf, other]

PCR-99: A Practical Method for Point Cloud Registration with 99 Percent Outliers

Authors: Seong Hun Lee, Javier Civera, Patrick Vandewalle

Abstract: We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be i… ▽ More We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed. △ Less

Submitted 9 September, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to ECCV 2024 Workshop on Recovering 6D Object Pose (R6D)

arXiv:2310.11178 [pdf, other]

doi 10.1007/978-981-96-0348-0_20

FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack

Authors: Xueyang Kang, Fengze Han, Abdur R. Fayjie, Patrick Vandewalle, Kourosh Khoshelham, Dong Gong

Abstract: Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, which restricts them to process only focal stacks of fixed number of images during both training and inference. This limitation hampers their ab… ▽ More Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, which restricts them to process only focal stacks of fixed number of images during both training and inference. This limitation hampers their ability to generalize to stacks of arbitrary lengths. To overcome these limitations, we present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder. The Transformer's self-attention mechanism allows for the learning of more informative spatial features by implicitly performing non-local cross-referencing. The LSTM module is designed to integrate representations across image stacks of varying lengths. Additionally, we employ multi-scale convolutional kernels in an early-stage encoder to capture low-level features at different degrees of focus/defocus. By incorporating the LSTM, FocDepthFormer can be pre-trained on large-scale monocular RGB depth estimation datasets, improving visual pattern learning and reducing reliance on difficult-to-obtain focal stack data. Extensive experiments on diverse focal stack benchmark datasets demonstrate that our model outperforms state-of-the-art approaches across multiple evaluation metrics. △ Less

Submitted 3 December, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 30 pages, 20 figures, Conference paper

arXiv:2211.02432 [pdf, other]

RCDPT: Radar-Camera fusion Dense Prediction Transformer

Authors: Chen-Chou Lo, Patrick Vandewalle

Abstract: Recently, transformer networks have outperformed traditional deep neural networks in natural language processing and show a large potential in many computer vision tasks compared to convolutional backbones. In the original transformer, readout tokens are used as designated vectors for aggregating information from other tokens. However, the performance of using readout tokens in a vision transforme… ▽ More Recently, transformer networks have outperformed traditional deep neural networks in natural language processing and show a large potential in many computer vision tasks compared to convolutional backbones. In the original transformer, readout tokens are used as designated vectors for aggregating information from other tokens. However, the performance of using readout tokens in a vision transformer is limited. Therefore, we propose a novel fusion strategy to integrate radar data into a dense prediction transformer network by reassembling camera representations with radar representations. Instead of using readout tokens, radar representations contribute additional depth information to a monocular depth estimation model and improve performance. We further investigate different fusion approaches that are commonly used for integrating additional modality in a dense prediction transformer network. The experiments are conducted on the nuScenes dataset, which includes camera images, lidar, and radar data. The results show that our proposed method yields better performance than the commonly used fusion strategies and outperforms existing convolutional depth estimation models that fuse camera images and radar. △ Less

Submitted 2 March, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: 5 pages, 2 figures and 1 table, accepted to ICASSP2023

arXiv:2206.10981 [pdf, other]

Adaptive Sampling-based Particle Filter for Visual-inertial Gimbal in the Wild

Authors: Xueyang Kang, Ariel Herrera, Henry Lema, Esteban Valencia, Patrick Vandewalle

Abstract: In this paper, we present a Computer Vision (CV) based tracking and fusion algorithm, dedicated to a 3D printed gimbal system on drones operating in nature. The whole gimbal system can stabilize the camera orientation robustly in a challenging nature scenario by using skyline and ground plane as references. Our main contributions are the following: a) a light-weight Resnet-18 backbone network mode… ▽ More In this paper, we present a Computer Vision (CV) based tracking and fusion algorithm, dedicated to a 3D printed gimbal system on drones operating in nature. The whole gimbal system can stabilize the camera orientation robustly in a challenging nature scenario by using skyline and ground plane as references. Our main contributions are the following: a) a light-weight Resnet-18 backbone network model was trained from scratch, and deployed onto the Jetson Nano platform to segment the image into binary parts (ground and sky); b) our geometry assumption from nature cues delivers the potential for robust visual tracking by using the skyline and ground plane as a reference; c) a spherical surface-based adaptive particle sampling, can fuse orientation from multiple sensor sources flexibly. The whole algorithm pipeline is tested on our customized gimbal module including Jetson and other hardware components. The experiments were performed on top of a building in the real landscape. △ Less

Submitted 8 January, 2024; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: content in 6 pages, 9 figures, 2 pseudo codes, one table, accepted by ICRA 2023

MSC Class: 68T45; 57-06; 60B05 ACM Class: I.4.6; I.2.10

arXiv:2202.13220 [pdf, other]

doi 10.2352/EI.2023.35.16.AVM-122

How Much Depth Information can Radar Contribute to a Depth Estimation Model?

Authors: Chen-Chou Lo, Patrick Vandewalle

Abstract: Recently, several works have proposed fusing radar data as an additional perceptual signal into monocular depth estimation models because radar data is robust against varying light and weather conditions. Although improved performances were reported in prior works, it is still hard to tell how much depth information radar can contribute to a depth estimation model. In this paper, we propose radar… ▽ More Recently, several works have proposed fusing radar data as an additional perceptual signal into monocular depth estimation models because radar data is robust against varying light and weather conditions. Although improved performances were reported in prior works, it is still hard to tell how much depth information radar can contribute to a depth estimation model. In this paper, we propose radar inference and supervision experiments to investigate the intrinsic depth potential of radar data using state-of-the-art depth estimation models on the nuScenes dataset. In the inference experiment, the model predicts depth by taking only radar as input to demonstrate the inference capability using radar data. In the supervision experiment, a monocular depth estimation model is trained under radar supervision to show the intrinsic depth information that radar can contribute. Our experiments demonstrate that the model using only sparse radar as input can detect the shape of surroundings to a certain extent in the predicted depth. Furthermore, the monocular depth estimation model supervised by preprocessed radar achieves a good performance compared to the baseline model trained with sparse lidar supervision. △ Less

Submitted 15 March, 2023; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: published on EI2023, 7 pages, 4 figures, 2 tables

arXiv:2107.07596 [pdf, other]

doi 10.1109/ICIP42928.2021.9506550

Depth Estimation from Monocular Images and Sparse radar using Deep Ordinal Regression Network

Authors: Chen-Chou Lo, Patrick Vandewalle

Abstract: We integrate sparse radar data into a monocular depth estimation model and introduce a novel preprocessing method for reducing the sparseness and limited field of view provided by radar. We explore the intrinsic error of different radar modalities and show our proposed method results in more data points with reduced error. We further propose a novel method for estimating dense depth maps from mono… ▽ More We integrate sparse radar data into a monocular depth estimation model and introduce a novel preprocessing method for reducing the sparseness and limited field of view provided by radar. We explore the intrinsic error of different radar modalities and show our proposed method results in more data points with reduced error. We further propose a novel method for estimating dense depth maps from monocular 2D images and sparse radar measurements using deep learning based on the deep ordinal regression network by Fu et al. Radar data are integrated by first converting the sparse 2D points to a height-extended 3D measurement and then including it into the network using a late fusion approach. Experiments are conducted on the nuScenes dataset. Our experiments demonstrate state-of-the-art performance in both day and night scenes. △ Less

Submitted 15 July, 2021; originally announced July 2021.

Comments: Accepted to ICIP2021

Showing 1–8 of 8 results for author: Vandewalle, P