Semi-supervised Vision Transformers at Scale

Cai, Zhaowei; Ravichandran, Avinash; Favaro, Paolo; Wang, Manchen; Modolo, Davide; Bhotika, Rahul; Tu, Zhuowen; Soatto, Stefano

arXiv:2208.05688 (cs)

[Submitted on 11 Aug 2022]

Abstract: We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of ViT architectures across different tasks. To tackle this problem, we propose a new SSL pipeline consisting of three stages: first un/self-supervised pre-training, then supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism that interpolates unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs because of their weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than its CNN counterparts in the semi-supervised classification setting. Semi-ViT also inherits the scalability of ViTs: it can be readily scaled up to large models, with accuracy increasing as model size grows. For example, Semi-ViT-Huge achieves 80% top-1 accuracy on ImageNet using only 1% of the labels, which is comparable to Inception-v4 trained with 100% of the ImageNet labels.
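The abstract names two ingredients of the semi-supervised stage: an EMA teacher and mixup applied to unlabeled samples with their pseudo labels. The following is a minimal sketch of those two ideas in plain Python, not the authors' implementation. The function names (`ema_update`, `one_hot`, `pseudo_mixup`), the `momentum` value, and the flat-list representation of weights and inputs are all assumptions made for illustration. The sketch uses ordinary mixup with a given weight `lam` and does not attempt to reproduce the probabilistic mechanism, whose details the abstract does not state.

```python
def ema_update(teacher, student, momentum=0.999):
    """Move each teacher weight toward the matching student weight.

    teacher, student: flat lists of floats (hypothetical stand-ins for
    model parameters). momentum is an assumed value, not from the paper.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def one_hot(index, num_classes):
    """Turn a hard pseudo label (class index) into a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def pseudo_mixup(x_a, y_a, x_b, y_b, lam):
    """Interpolate two unlabeled samples and their pseudo labels.

    x_a, x_b: flat lists of input values; y_a, y_b: pseudo-label vectors.
    lam: mixing weight in [0, 1]; standard mixup draws it from a Beta
    distribution, e.g. random.betavariate(alpha, alpha).
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    # One EMA step: the teacher moves 10% of the way toward the student.
    print(ema_update([0.0, 1.0], [1.0, 0.0], momentum=0.9))

    # Mix two unlabeled samples whose teacher predictions are class 0 and class 2.
    x_mix, y_mix = pseudo_mixup([1.0, 0.0], one_hot(0, 3),
                                [0.0, 1.0], one_hot(2, 3), lam=0.25)
    print(x_mix, y_mix)
```

In an EMA-Teacher setup of this kind, the teacher typically produces the pseudo labels while only the student receives gradient updates; `ema_update` would then be called once per training step.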

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2208.05688 [cs.CV]
	(or arXiv:2208.05688v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.05688

Submission history

From: Zhaowei Cai
[v1] Thu, 11 Aug 2022 08:11:54 UTC (441 KB)
