CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation

Taghavi, Pardis; Liu, Tian; Li, Renjie; Langari, Reza; Tu, Zhengzhong

arXiv:2505.21904 (cs)

[Submitted on 28 May 2025 (v1), last revised 8 Jun 2025 (this version, v3)]

Abstract: Instance segmentation demands costly per-pixel annotations and large models. We introduce CAST, a semi-supervised knowledge distillation (SSKD) framework that compresses pretrained vision foundation models (VFMs) into compact experts using limited labeled data and abundant unlabeled data. CAST unfolds in three stages: (1) domain adaptation of the VFM teacher(s) via self-training with contrastive pixel calibration, (2) distillation into a compact student via a unified multi-objective loss that couples standard supervision and pseudo-labels with our instance-aware pixel-wise contrastive term, and (3) fine-tuning on labeled data to remove residual pseudo-label bias. Central to CAST is an instance-aware pixel-wise contrastive loss that fuses mask and class scores to mine informative negatives and enforce clear inter-instance margins. By maintaining this contrastive signal across both adaptation and distillation, we align teacher and student embeddings and fully leverage unlabeled images. On Cityscapes and ADE20K, our roughly 11× smaller student surpasses its adapted VFM teacher(s) by +3.4 AP (33.9 vs. 30.5) and +1.5 AP (16.7 vs. 15.2), respectively, and outperforms state-of-the-art semi-supervised approaches.
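The abstract does not give the formula for the instance-aware pixel-wise contrastive loss, so the following is only a minimal sketch of the general idea, assuming a generic supervised InfoNCE objective over pixel embeddings. The function name `pixel_contrastive_loss`, its parameters, the temperature value, and the choice to fuse mask and class scores by multiplication into a per-anchor weight are all illustrative assumptions, not the authors' implementation. The sketch also does not perform negative mining; every pixel from a different instance acts as a negative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against zero vectors
    return [a / norm for a in v]


def pixel_contrastive_loss(embeddings, instance_ids, mask_scores,
                           class_scores, temperature=0.1):
    """Generic supervised InfoNCE over pixels (hypothetical, not CAST's loss).

    embeddings:   list of per-pixel feature vectors
    instance_ids: per-pixel instance label (ground truth or pseudo-label)
    mask_scores, class_scores: per-pixel confidences in [0, 1]
    """
    feats = [normalize(e) for e in embeddings]
    # Assumption: fuse the two scores by multiplication to weight each anchor.
    weights = [m * c for m, c in zip(mask_scores, class_scores)]
    total, weight_sum = 0.0, 0.0
    n = len(feats)
    for i in range(n):
        # Positives: other pixels that belong to the same instance.
        positives = [j for j in range(n)
                     if j != i and instance_ids[j] == instance_ids[i]]
        if not positives:
            continue
        # Scaled cosine similarity to every other pixel.
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits.values()))
        # Mean log-probability assigned to the positives.
        log_prob = sum(logits[j] - log_denom for j in positives) / len(positives)
        total += -weights[i] * log_prob
        weight_sum += weights[i]
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy example: four pixels, two instances, full confidence everywhere.
ids = [0, 0, 1, 1]
scores = [1.0, 1.0, 1.0, 1.0]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_separated = pixel_contrastive_loss(separated, ids, scores, scores)
loss_mixed = pixel_contrastive_loss(mixed, ids, scores, scores)
```

In this toy setup, embeddings that cluster by instance give a much lower loss than embeddings that mix instances, which is the inter-instance margin behavior the abstract describes. Low-confidence pixels contribute less because of the fused weight.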

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2505.21904 [cs.CV]
	(or arXiv:2505.21904v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.21904

Submission history

From: Pardis Taghavi
[v1] Wed, 28 May 2025 02:45:42 UTC (23,918 KB)
[v2] Thu, 29 May 2025 01:20:59 UTC (4,681 KB)
[v3] Sun, 8 Jun 2025 03:09:16 UTC (4,681 KB)
