PVT: Point-Voxel Transformer for 3D Deep Learning

Zhang, Cheng; Wan, Haocheng; Liu, Shengqiang; Shen, Xinyi; Wu, Zizhao

arXiv:2108.06076v2 (cs)

[Submitted on 13 Aug 2021 (v1), revised 22 Sep 2021 (this version, v2), latest version 25 May 2022 (v4)]

Abstract: In this paper, we present an efficient and high-performance neural architecture, termed Point-Voxel Transformer (PVT), for 3D deep learning, which deeply integrates both 3D voxel-based and point-based self-attention computation to learn more discriminative features from 3D data. Specifically, we conduct multi-head self-attention (MSA) computation in voxels to obtain an efficient learning pattern and coarse-grained local features, while performing self-attention in points to provide finer-grained information about the global context. In addition, to reduce the cost of MSA computation, we design a cyclic shifted boxing scheme that limits the MSA computation to non-overlapping local boxes while preserving cross-box connections. Evaluated on a classification benchmark, our method not only achieves state-of-the-art accuracy of 94.0% (no voting) but also outperforms previous Transformer-based models with a 7x measured speedup on average. On part and semantic segmentation, our model also obtains strong performance (86.5% and 68.2% mIoU, respectively). For the 3D object detection task, we replace the primitives in Frustum PointNet with PVT blocks and achieve an improvement of 8.6% AP.
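The abstract does not spell out the cyclic shifted boxing scheme, so the following is only a minimal NumPy sketch of the general idea: restrict self-attention to non-overlapping local boxes of a voxel grid, then cyclically shift the grid so that the next round of boxes straddles the previous box boundaries. Every function name, the cubic `(R, R, R, C)` grid layout, the single-head projection-free attention, and the use of `np.roll` for the shift are assumptions made for illustration, not the authors' implementation. The sketch also omits any masking of voxels that become neighbours only through wrap-around.

```python
import numpy as np

def partition_boxes(grid, box):
    """Split an (R, R, R, C) voxel grid into non-overlapping box^3 boxes.

    Returns an array of shape (num_boxes, box**3, C). Assumes R % box == 0.
    """
    R, _, _, C = grid.shape
    n = R // box
    g = grid.reshape(n, box, n, box, n, box, C)
    g = g.transpose(0, 2, 4, 1, 3, 5, 6)  # (n, n, n, box, box, box, C)
    return g.reshape(n ** 3, box ** 3, C)

def merge_boxes(boxes, R, box):
    """Inverse of partition_boxes: rebuild the (R, R, R, C) grid."""
    n = R // box
    C = boxes.shape[-1]
    g = boxes.reshape(n, n, n, box, box, box, C)
    g = g.transpose(0, 3, 1, 4, 2, 5, 6)  # (n, box, n, box, n, box, C)
    return g.reshape(R, R, R, C)

def box_self_attention(boxes):
    """Softmax self-attention computed independently inside each box.

    Hypothetical simplification: one head, no learned Q/K/V projections.
    """
    d = boxes.shape[-1]
    scores = boxes @ boxes.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ boxes

def shifted_box_attention(grid, box, shift):
    """Cyclically shift the grid, attend within boxes, then shift back."""
    R = grid.shape[0]
    shifted = np.roll(grid, shift=(-shift, -shift, -shift), axis=(0, 1, 2))
    out = box_self_attention(partition_boxes(shifted, box))
    out = merge_boxes(out, R, box)
    return np.roll(out, shift=(shift, shift, shift), axis=(0, 1, 2))

# Usage: an unshifted pass followed by a shifted pass on a 4^3 grid.
rng = np.random.default_rng(0)
voxels = rng.standard_normal((4, 4, 4, 8))
local = shifted_box_attention(voxels, box=2, shift=0)
mixed = shifted_box_attention(local, box=2, shift=1)
```

With `shift=0`, each voxel attends only to the `box**3` voxels in its own box, so the cost grows with the box size rather than with the full grid. Alternating with a nonzero shift lets information cross the original box boundaries, which is one plausible reading of "preserving cross-box connections".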

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Cite as:	arXiv:2108.06076 [cs.CV]
	(or arXiv:2108.06076v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2108.06076

Submission history

From: Cheng Zhang
[v1] Fri, 13 Aug 2021 06:07:57 UTC (836 KB)
[v2] Wed, 22 Sep 2021 05:17:40 UTC (2,203 KB)
[v3] Mon, 10 Jan 2022 13:59:37 UTC (2,650 KB)
[v4] Wed, 25 May 2022 06:34:21 UTC (2,649 KB)
