UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

Zou, Heqing; Shen, Meng; Chen, Chen; Hu, Yuchen; Rajan, Deepu; Chng, Eng Siong


arXiv:2305.09299 (cs)

[Submitted on 16 May 2023]


Abstract: Multimodal learning aims to imitate how humans acquire complementary information from multiple modalities for various downstream tasks. However, traditional aggregation-based multimodal fusion methods ignore inter-modality relationships, treat each modality equally, and suffer from sensor noise, all of which reduce multimodal learning performance. In this work, we propose a novel multimodal contrastive method that learns more reliable multimodal representations under the weak supervision of unimodal prediction. Specifically, we first obtain task-related unimodal representations and unimodal predictions from an introduced unimodal prediction task. The unimodal representations are then aligned with the more effective one by the designed multimodal contrastive method, under the supervision of the unimodal predictions. Experimental results with fused features on two image-text classification benchmarks, UPMC-Food-101 and N24News, show that the proposed Unimodality-Supervised MultiModal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal methods. A detailed ablation study and analysis further demonstrate the advantage of the proposed method.
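To make the idea of unimodal predictions supervising cross-modal alignment more concrete, here is a minimal sketch in plain Python. It is one possible reading of the abstract, not the authors' implementation. The pairing rule (both modalities correct, exactly one correct, both wrong), the pair names, the per-pair loss, and the function names `cosine`, `pair_type`, and `alignment_loss` are all assumptions introduced for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def pair_type(pred_a, pred_b, label):
    """Assumed rule: label a cross-modal pair by which unimodal predictions are correct."""
    correct_a = pred_a == label
    correct_b = pred_b == label
    if correct_a and correct_b:
        return "positive"        # both modalities look reliable
    if correct_a or correct_b:
        return "semi-positive"   # exactly one modality looks reliable
    return "negative"            # neither modality looks reliable


def alignment_loss(feats_a, feats_b, preds_a, preds_b, labels):
    """Mean per-sample loss over paired features from two modalities.

    Assumed (illustrative) loss: pull positive and semi-positive pairs
    together with 1 - cos, and penalize similarity of negative pairs
    with max(0, cos).
    """
    total = 0.0
    for fa, fb, pa, pb, y in zip(feats_a, feats_b, preds_a, preds_b, labels):
        sim = cosine(fa, fb)
        if pair_type(pa, pb, y) == "negative":
            total += max(0.0, sim)
        else:
            total += 1.0 - sim
    return total / len(labels)


# Toy batch: three samples, two modalities (e.g. image and text), 2-D features.
image_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
image_preds = [0, 1, 2]
text_preds = [0, 0, 2]
labels = [0, 1, 0]

loss = alignment_loss(image_feats, text_feats, image_preds, text_preds, labels)
print(f"toy alignment loss: {loss:.4f}")
```

In an autodiff framework, a semi-positive pair would plausibly treat the feature from the correctly predicting modality as a fixed target, so that only the weaker modality moves toward it. That detail is also an assumption and is not expressed in this dependency-free sketch.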

Comments:	ACL 2023 Findings
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2305.09299 [cs.CV]
	(or arXiv:2305.09299v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.09299

Submission history

From: Heqing Zou
[v1] Tue, 16 May 2023 09:18:38 UTC (3,868 KB)
