Mask-Attention-Free Transformer for 3D Instance Segmentation

Lai, Xin; Yuan, Yuhui; Chu, Ruihang; Chen, Yukang; Hu, Han; Jia, Jiaya

arXiv:2309.01692 (cs)

[Submitted on 4 Sep 2023]

Abstract: Recently, transformer-based methods have dominated 3D instance segmentation, where mask attention is commonly involved. Specifically, object queries are guided by the initial instance masks in the first cross-attention, and then iteratively refine themselves in a similar manner. However, we observe that the mask-attention pipeline usually leads to slow convergence due to low-recall initial instance masks. Therefore, we abandon the mask attention design and resort to an auxiliary center regression task instead. Through center regression, we effectively overcome the low-recall issue and perform cross-attention by imposing a positional prior. To reach this goal, we develop a series of position-aware designs. First, we learn a spatial distribution of 3D locations as the initial position queries. They spread densely over the 3D space, and thus can easily capture the objects in a scene with high recall. Moreover, we present relative position encoding for the cross-attention and iterative refinement for more accurate position queries. Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on the ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets. Code and models are available at this https URL.
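
The position-aware idea in the abstract (cross-attention steered by a positional prior rather than by a mask, followed by iterative refinement of the position queries) can be illustrated with a small toy sketch. This is not the authors' implementation: the squared-distance bias stands in for the learned relative position encoding, the "move toward the attention-weighted mean" rule stands in for a learned refinement step, and the names `position_aware_attention`, `refine_position`, `alpha`, and `step` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def position_aware_attention(q_feat, q_pos, k_feats, k_pos, alpha=1.0):
    """Attention weights of one query over all points.

    ASSUMPTION: logit = scaled dot-product content term minus
    alpha * squared distance. The distance penalty is a hand-written
    stand-in for a learned relative position encoding.
    """
    d = len(q_feat)
    logits = []
    for kf, kp in zip(k_feats, k_pos):
        content = sum(a * b for a, b in zip(q_feat, kf)) / math.sqrt(d)
        logits.append(content - alpha * sq_dist(q_pos, kp))
    return softmax(logits)


def refine_position(q_pos, k_pos, weights, step=0.5):
    """Move the position query toward the attention-weighted mean point.

    ASSUMPTION: a learned model would predict this offset; here it is
    computed directly for illustration.
    """
    mean = [sum(w * p[i] for w, p in zip(weights, k_pos)) for i in range(3)]
    return [qp + step * (m - qp) for qp, m in zip(q_pos, mean)]


# Toy scene: two small point clusters ("instances") with identical features.
points = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0),
          (5.0, 5.0, 5.0), (5.2, 5.0, 5.0)]
feats = [[0.0, 0.0] for _ in points]

# One position query that starts near the first cluster.
query_feat = [0.0, 0.0]
query_pos = [1.0, 1.0, 1.0]

# Three decoder "layers": attend with the positional prior, then refine.
history = [list(query_pos)]
for _ in range(3):
    weights = position_aware_attention(query_feat, query_pos, feats, points)
    query_pos = refine_position(query_pos, points, weights)
    history.append(list(query_pos))

print(history)
```

In this toy setup the query needs no initial mask: the positional bias alone concentrates attention on nearby points, and each refinement step pulls the query closer to the cluster it started near.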

Comments:	Accepted to ICCV 2023. Code and models are available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.01692 [cs.CV]
	(or arXiv:2309.01692v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.01692

Submission history

From: Xin Lai
[v1] Mon, 4 Sep 2023 16:09:28 UTC (24,704 KB)
