Search | arXiv e-print repository

RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet

Authors: Eliraz Orfaig, Inna Stainvas, Igal Bilik

Abstract: This work introduces RGBX-DiffusionDet, an object detection framework extending the DiffusionDet model to fuse heterogeneous 2D data (X) with RGB imagery via an adaptive multimodal encoder. To enable cross-modal interaction, we design dynamic channel reduction within a convolutional block attention module (DCR-CBAM), which facilitates cross-talk between subnetworks by dynamically highlighting salient channel features. Furthermore, the dynamic multi-level aggregation block (DMLAB) is proposed to refine spatial feature representations through adaptive multiscale fusion. Finally, novel regularization losses that enforce channel saliency and spatial selectivity are introduced, leading to compact and discriminative feature embeddings. Extensive experiments were conducted using RGB-Depth (KITTI), a novel annotated RGB-Polarimetric dataset, and the RGB-Infrared (M$^3$FD) benchmark dataset. We demonstrate consistent superiority of the proposed approach over the baseline RGB-only DiffusionDet. The modular architecture maintains the original decoding complexity, ensuring efficiency. These results establish the proposed RGBX-DiffusionDet as a flexible multimodal object detection approach, providing new insights into integrating diverse 2D sensing modalities into diffusion-based detection pipelines.

Submitted 21 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

arXiv:2411.06410 [pdf, other]

SuperResolution Radar Gesture Recognition

Authors: Netanel Blumenfeld, Inna Stainvas, Igal Bilik

Abstract: "This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible." A driver's interaction with a vehicle via automatic gesture recognition is expected to enhance driving safety by decreasing driver distraction. Optical and infrared-based gesture recognition systems are limited by occlusions, poor lighting, and varying thermal conditions and, therefore, have limited performance in practical in-cabin applications. Radars are insensitive to lighting or thermal conditions and, therefore, are more suitable for in-cabin applications. However, the spatial resolution of conventional radars is insufficient for accurate gesture recognition. The main objective of this research is to derive an accurate gesture recognition approach using low-resolution radars with deep learning-based super-resolution processing. The main idea is to reconstruct high-resolution information from the radar's low-resolution measurements. The major challenge is the derivation of a real-time processing approach. The proposed approach combines conventional signal processing and deep learning methods. The radar echoes are arranged in 3D data cubes and processed using a super-resolution model to enhance range and Doppler resolution. The FFT is used to generate the range-Doppler maps, which enter the deep neural network for efficient gesture recognition. The preliminary results demonstrate the proposed approach's efficiency in achieving high gesture recognition performance using conventional low-resolution radars.
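The range-Doppler step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single-antenna slice of the data cube indexed as `cube[chirp][sample]`, power-of-two dimensions, and a noise-free simulated point target, and it omits the super-resolution model entirely. The function names (`fft`, `range_doppler_map`, `simulate_target`, `peak_bin`) are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def range_doppler_map(cube):
    """Return a magnitude map indexed as rd[doppler_bin][range_bin]."""
    n_chirps, n_samples = len(cube), len(cube[0])
    # Range FFT along fast time (samples within each chirp).
    range_profiles = [fft(chirp) for chirp in cube]
    rd = [[0.0] * n_samples for _ in range(n_chirps)]
    # Doppler FFT along slow time (across chirps) for every range bin.
    for r in range(n_samples):
        column = fft([range_profiles[c][r] for c in range(n_chirps)])
        for d in range(n_chirps):
            rd[d][r] = abs(column[d])
    return rd


def simulate_target(n_chirps, n_samples, range_bin, doppler_bin):
    """Assumed toy signal: one complex tone per dimension, no noise."""
    return [
        [
            cmath.exp(2j * math.pi * (range_bin * s / n_samples
                                      + doppler_bin * c / n_chirps))
            for s in range(n_samples)
        ]
        for c in range(n_chirps)
    ]


def peak_bin(rd):
    """Return (doppler_bin, range_bin) of the strongest cell."""
    cells = ((d, r) for d in range(len(rd)) for r in range(len(rd[0])))
    return max(cells, key=lambda dr: rd[dr[0]][dr[1]])
```

Calling `peak_bin(range_doppler_map(simulate_target(8, 16, 5, 3)))` locates the simulated target at Doppler bin 3 and range bin 5. In a real system the resulting map would be the input to the recognition network.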

Submitted 23 November, 2024; v1 submitted 10 November, 2024; originally announced November 2024.

arXiv:2406.03129 [pdf, other]

Enhanced Automotive Object Detection via RGB-D Fusion in a DiffusionDet Framework

Authors: Eliraz Orfaig, Inna Stainvas, Igal Bilik

Abstract: Vision-based autonomous driving requires reliable and efficient object detection. This work proposes a DiffusionDet-based framework that fuses data from a monocular camera and a depth sensor to provide RGB and depth (RGB-D) data. Within this framework, ground truth bounding boxes are randomly reshaped during the training phase, allowing the model to learn to reverse the diffusion (noise-addition) process. At the inference stage, the system methodically refines a randomly generated set of boxes, guiding them toward accurate final detections. By integrating the textural and color features from RGB images with the spatial depth information from the LiDAR sensors, the proposed framework employs a feature fusion that substantially enhances object detection of automotive targets. Comprehensive experiments on the KITTI dataset demonstrate a $2.3$ AP gain in detecting automotive targets. In particular, improved performance in detecting small objects is demonstrated.
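The training-time box corruption described in the abstract above can be sketched as a generic Gaussian forward-diffusion step. This is an illustration under stated assumptions, not the paper's code: the cosine noise schedule, the signal scale of 2.0, the `(cx, cy, w, h)` box format normalized to [0, 1], and the names `cosine_alpha_bar` and `noise_boxes` are all assumptions introduced here.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level of a cosine noise schedule (assumed)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_boxes(boxes, t, num_steps=1000, scale=2.0, rng=random):
    """Apply x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps to each box coordinate.

    boxes: list of (cx, cy, w, h) tuples normalized to [0, 1].
    """
    a = cosine_alpha_bar(t, num_steps)
    noisy = []
    for box in boxes:
        coords = []
        for v in box:
            x0 = (2.0 * v - 1.0) * scale              # [0, 1] -> [-scale, scale]
            xt = math.sqrt(a) * x0 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            xt = max(-scale, min(scale, xt))          # clamp to the valid range
            coords.append((xt / scale + 1.0) / 2.0)   # back to [0, 1]
        noisy.append(tuple(coords))
    return noisy
```

At `t=0` the boxes are returned essentially unchanged, and as `t` approaches `num_steps` they approach pure noise, which is the starting point from which a detector of this kind refines random boxes at inference.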

Submitted 5 June, 2024; originally announced June 2024.

arXiv:1808.00366 [pdf, ps, other]

doi 10.1109/TAES.2016.140682

Pedestrian Motion Direction Estimation Using Simulated Automotive MIMO Radar

Authors: Petro Khomchuk, Inna Stainvas, Igal Bilik

Abstract: Micro-Doppler-based target classification capabilities of automotive radars can provide high reliability and short latency to future active safety automotive features. The large number of pedestrians surrounding a vehicle in practical urban scenarios mandates prioritization of their threat level. Classification between relevant pedestrians that cross the street or are within the vehicle path and those that are on the sidewalks and move along the vehicle route can significantly reduce the number of vehicle-to-pedestrian accidents. This work proposes a novel technique for pedestrian direction of motion estimation, which treats pedestrians as complex distributed targets and utilizes their micro-Doppler (MD) radar signatures. The MD signatures are shown to be indicative of pedestrian direction of motion, and supervised regression is used to estimate the mapping between the directions of motion and the corresponding MD signatures. In order to achieve higher regression performance, a state-of-the-art sparse dictionary learning based feature extraction algorithm was adopted from the field of computer vision by drawing a parallel between the Doppler effect and the video temporal gradient. The performance of the proposed approach is evaluated in simulations of a practical automotive scenario, where a walking pedestrian is observed by a multiple-input-multiple-output (MIMO) automotive radar with a 2D rectangular array. The simulated data was generated using the statistical Boulic-Thalmann human locomotion model. Accurate direction of motion estimation was achieved using support vector regression (SVR) and multilayer perceptron (MLP) based regression algorithms. The results show that the direction estimation error is less than $10^{\circ}$ in $95\%$ of the tested cases for a pedestrian at a range of $100$ m from the radar.

Submitted 1 August, 2018; originally announced August 2018.

Journal ref: P. Khomchuk, I. Stainvas, I. Bilik, "Pedestrian motion direction estimation using automotive MIMO radar", IEEE Transactions on Aerospace and Electronic Systems, 52.3 (2016): 1132-1145

arXiv:1412.2873 [pdf, ps, other]

Cancer Detection with Multiple Radiologists via Soft Multiple Instance Logistic Regression and $L_1$ Regularization

Authors: Inna Stainvas, Alexandra Manevitch, Isaac Leichter

Abstract: This paper deals with the multiple annotation problem in the medical application of cancer detection in digital images. The main assumption is that, though images are labeled by many experts, the number of images read by the same expert is not large. Thus, in contrast to existing work that models each expert and the ground truth simultaneously, the multi-annotation information is used in a soft manner. The multiple labels from different experts are used to estimate the probability of the findings being marked as malignant. The learning algorithm minimizes the Kullback-Leibler (KL) divergence between the modeled probabilities and the desired ones while constraining the model to be compact. The probabilities are modeled by logit regression, and the multiple instance learning concept is used. Experiments on a real-life computer aided diagnosis (CAD) problem for CXR CAD lung cancer detection demonstrate that the proposed algorithm leads to results similar to learning with a binary RVMMIL classifier or a mixture of binary RVMMIL models per annotator. However, this model achieves a smaller complexity and is preferable in practice.
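The soft-label idea in the abstract above can be sketched in a simplified form. The sketch assumes that the desired probability for each finding is the fraction of experts who marked it malignant, that the model is plain logistic regression, and that compactness is enforced by an $L_1$ penalty handled with a proximal (soft-thresholding) gradient step. It leaves out the multiple instance learning component, and the optimizer, hyperparameters, and function names are illustrative choices, not the paper's.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_targets(votes):
    """Fraction of experts marking each finding as malignant (assumed target)."""
    return [sum(v) / len(v) for v in votes]


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|; shrinks v toward zero."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fit_soft_logistic(X, q, lam=0.01, lr=0.5, epochs=500):
    """Minimize mean KL(q || p) + lam * ||w||_1 with p = sigmoid(w.x + b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, qi in zip(X, q):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - qi  # gradient of KL(q || p) with respect to the logit
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        # Gradient step followed by soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b
```

For example, with `votes = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]` the targets are 1, 2/3, 1/3, and 0, so disagreement between experts becomes an intermediate target probability rather than being forced into a hard binary label.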

Submitted 9 December, 2014; originally announced December 2014.

Comments: 20 pages, report

Showing 1–5 of 5 results for author: Stainvas, I