SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

Simons, Cody; Raychaudhuri, Dripta S.; Ahmed, Sk Miraj; You, Suya; Karydis, Konstantinos; Roy-Chowdhury, Amit K.

Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both of these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available, freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without access to the original source dataset. Our proposed approach solves this problem through a switching framework that automatically chooses between two complementary methods of cross-modal pseudo-label fusion (agreement filtering and entropy weighting) based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods that assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at this https URL.
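The abstract names two pseudo-label fusion strategies and a switch between them but does not say how either is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes two modalities (e.g., image and point cloud) that each output a per-point class probability vector. "Agreement filtering" here keeps a label only where both argmaxes match. "Entropy weighting" here weights each modality by 1 minus its normalized entropy before fusing. The domain-gap estimate (the hypothetical `disagreement_rate` proxy), the `threshold` of 0.3, and the direction of the switch are all illustrative placeholders and do not come from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Probs = Sequence[float]  # one point's class probability vector (sums to 1, C >= 2)


def argmax(p: Probs) -> int:
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(p)), key=lambda c: p[c])


def normalized_entropy(p: Probs) -> float:
    """Shannon entropy of p divided by log(C), so the result lies in [0, 1]."""
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return h / math.log(len(p))


def agreement_filtering(p2d: List[Probs], p3d: List[Probs]) -> List[Optional[int]]:
    """Keep a pseudo-label only where both modalities predict the same class.
    None marks points to ignore during self-training."""
    labels: List[Optional[int]] = []
    for a, b in zip(p2d, p3d):
        ca, cb = argmax(a), argmax(b)
        labels.append(ca if ca == cb else None)
    return labels


def entropy_weighting(p2d: List[Probs], p3d: List[Probs]) -> List[int]:
    """Fuse both predictions, weighting each modality by its confidence
    (1 - normalized entropy; an assumed weighting), then take the argmax."""
    labels: List[int] = []
    for a, b in zip(p2d, p3d):
        wa = 1.0 - normalized_entropy(a)
        wb = 1.0 - normalized_entropy(b)
        total = wa + wb
        if total == 0.0:  # both predictions uniform: fall back to equal weights
            wa = wb = 0.5
            total = 1.0
        fused = [(wa * x + wb * y) / total for x, y in zip(a, b)]
        labels.append(argmax(fused))
    return labels


def disagreement_rate(p2d: List[Probs], p3d: List[Probs]) -> float:
    """HYPOTHETICAL domain-gap proxy: fraction of points whose argmax differs."""
    return sum(argmax(a) != argmax(b) for a, b in zip(p2d, p3d)) / len(p2d)


def switch_pseudo_labels(
    p2d: List[Probs], p3d: List[Probs], threshold: float = 0.3
) -> Tuple[str, list]:
    """HYPOTHETICAL switching rule: use the stricter agreement filter when the
    estimated gap is large, otherwise entropy-weighted fusion. The threshold
    and the rule's direction are illustrative assumptions."""
    gap = disagreement_rate(p2d, p3d)
    if gap > threshold:
        return "agreement", agreement_filtering(p2d, p3d)
    return "entropy", entropy_weighting(p2d, p3d)
```

For example, with `p2d = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]` and `p3d = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]`, the modalities disagree on one point in three. With `threshold=0.3` the sketch returns `("agreement", [0, None, 1])`. With `threshold=0.5` it returns `("entropy", [0, 1, 1])`, because the more confident second modality dominates the disputed point.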

Comments:	12 pages, 5 figures, 9 tables, ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2308.11880 [cs.CV]
	(or arXiv:2308.11880v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.11880
