VPFNet: Voxel-Pixel Fusion Network for Multi-class 3D Object Detection

Wang, Chia-Hung; Chen, Hsueh-Wei; Fu, Li-Chen

arXiv:2111.00966 (cs)

[Submitted on 1 Nov 2021]

Abstract: Many LiDAR-based methods have been reported to perform well when detecting large objects, in single-class detection, or in easy scenarios. However, their performance on small objects or in hard scenarios does not surpass that of fusion-based methods, because they fail to leverage image semantics. To improve detection performance in complicated environments, this paper proposes a deep learning (DL)-embedded, fusion-based multi-class 3D object detection network that takes both LiDAR and camera data streams as input, named the Voxel-Pixel Fusion Network (VPFNet). A key novel component of this network is the Voxel-Pixel Fusion (VPF) layer, which exploits the geometric relation of a voxel-pixel pair and fuses voxel features and pixel features with appropriate mechanisms. Moreover, several parameters are specifically designed to guide and enhance the fusion, based on the characteristics of a voxel-pixel pair. Finally, the proposed method is evaluated on the KITTI benchmark for the multi-class 3D object detection task across multiple difficulty levels, and is shown to outperform all state-of-the-art methods in mean average precision (mAP). Notably, our approach ranks first on the KITTI leaderboard for the challenging pedestrian class.
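
The notion of a "voxel-pixel pair" can be illustrated by projecting each voxel center into the image with a 3x4 camera projection matrix and pairing the voxel with the pixel it lands on. The sketch below is a generic illustration under stated assumptions, not the VPF layer described in the paper: it assumes voxel centers are already expressed in the camera frame (so the LiDAR-to-camera extrinsic step is omitted), uses a hypothetical voxel layout (`voxel_size`, `origin`), and replaces the paper's fusion mechanisms and guiding parameters with plain feature concatenation. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def voxel_center(index: Sequence[int], voxel_size: Sequence[float],
                 origin: Sequence[float]) -> Vec3:
    """Center of voxel (i, j, k), assuming a regular grid starting at `origin`."""
    return tuple(o + (n + 0.5) * s for n, s, o in zip(index, voxel_size, origin))


def project_to_pixel(point: Vec3, P: Sequence[Sequence[float]],
                     width: int, height: int) -> Optional[Tuple[int, int]]:
    """Project a camera-frame 3D point with a 3x4 matrix P.

    Returns integer pixel coordinates (u, v), or None if the point is
    behind the camera or falls outside the image.
    """
    hom = (point[0], point[1], point[2], 1.0)
    u_h, v_h, w = (sum(row[c] * hom[c] for c in range(4)) for row in P)
    if w <= 0:
        return None  # behind the image plane: no valid pixel partner
    u, v = math.floor(u_h / w), math.floor(v_h / w)
    if 0 <= u < width and 0 <= v < height:
        return u, v
    return None


def fuse_voxel_pixel(voxel_feats: Sequence[Sequence[float]],
                     voxel_indices: Sequence[Sequence[int]],
                     image_feats: Sequence[Sequence[Sequence[float]]],
                     P: Sequence[Sequence[float]],
                     voxel_size: Sequence[float],
                     origin: Sequence[float]) -> List[List[float]]:
    """Pair each voxel with its projected pixel and concatenate their features.

    image_feats is indexed as image_feats[v][u] -> feature vector.
    Voxels without a valid pixel partner are padded with zeros.
    """
    height, width = len(image_feats), len(image_feats[0])
    zero_pixel = [0.0] * len(image_feats[0][0])
    fused = []
    for feat, idx in zip(voxel_feats, voxel_indices):
        pix = project_to_pixel(voxel_center(idx, voxel_size, origin), P, width, height)
        pixel_feat = image_feats[pix[1]][pix[0]] if pix is not None else zero_pixel
        fused.append(list(feat) + list(pixel_feat))
    return fused
```

For example, with a pinhole matrix `P = [[1, 0, 2, 0], [0, 1, 2, 0], [0, 0, 1, 0]]`, a 4x4 feature map, unit voxels, and `origin = (-0.5, -0.5, 0.5)`, voxel `(0, 0, 0)` has center `(0, 0, 1)` and is paired with pixel `(2, 2)`. Concatenation is used only because it is the simplest fusion; a learned, pair-aware weighting would be closer in spirit to what the abstract describes.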

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2111.00966 [cs.CV]
	(or arXiv:2111.00966v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2111.00966

Submission history

From: Chiahung Wang
[v1] Mon, 1 Nov 2021 14:17:09 UTC (777 KB)