Towards Robust Video Instance Segmentation with Temporal-Aware Transformer

Zhang, Zhenghao; Shao, Fangtao; Dai, Zuozhuo; Zhu, Siyu

arXiv:2301.09416 (cs)

[Submitted on 20 Jan 2023]

Abstract: Most existing transformer-based video instance segmentation methods extract per-frame features independently, which makes it difficult to handle the appearance deformation problem. In this paper, we observe that temporal information is also important, and we propose TAFormer, which aggregates spatio-temporal features in both the transformer encoder and decoder. Specifically, in the transformer encoder, we propose a novel spatio-temporal joint multi-scale deformable attention module that dynamically integrates spatial and temporal information to obtain enriched spatio-temporal features. In the transformer decoder, we introduce a temporal self-attention module that enhances the frame-level box queries with temporal relations. Moreover, TAFormer adopts an instance-level contrastive loss to increase the discriminability of instance query embeddings, which reduces the tracking errors caused by visually similar instances. Experimental results show that TAFormer effectively leverages spatial and temporal information to obtain context-aware feature representations and outperforms state-of-the-art methods.
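The abstract does not give the exact form of the instance-level contrastive loss, so the following is only a minimal sketch under stated assumptions: an InfoNCE-style objective with cosine similarity, where query embeddings that share an instance identity across frames are treated as positives and all other embeddings as negatives. The function names, the `temperature` value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """InfoNCE-style loss over instance query embeddings (assumed form).

    embeddings: list of vectors, one per (frame, instance) query.
    instance_ids: instance identity for each embedding.
    Embeddings sharing an id are positives; all others are negatives.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos, total = 0.0, 0.0
        for j, other in enumerate(embeddings):
            if i == j:
                continue  # an embedding is never compared with itself
            weight = math.exp(cosine(anchor, other) / temperature)
            total += weight
            if instance_ids[j] == instance_ids[i]:
                pos += weight
        if pos > 0.0:  # skip anchors that have no positive in the clip
            losses.append(-math.log(pos / total))
    return sum(losses) / len(losses) if losses else 0.0


# Toy clip: two instances, each observed in two frames (hypothetical data).
ids = [0, 0, 1, 1]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
confused = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_separated = instance_contrastive_loss(separated, ids)
loss_confused = instance_contrastive_loss(confused, ids)
```

In this toy setup, `loss_separated` is smaller than `loss_confused`: the loss is low when embeddings of the same instance stay close across frames, and high when an embedding is closer to a different instance. That is the property the abstract relies on to reduce tracking errors between visually similar instances.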

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2301.09416 [cs.CV]
	(or arXiv:2301.09416v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2301.09416

Submission history

From: Zhenghao Zhang
[v1] Fri, 20 Jan 2023 05:22:16 UTC (25,791 KB)
