Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

Joshi, Abhinav; Gupta, Naman; Shah, Jinang; Bhattarai, Binod; Modi, Ashutosh; Stoyanov, Danail

arXiv:2211.03587 (cs)

[Submitted on 7 Nov 2022]


Abstract: Real-world applications often involve interaction between different modalities (e.g., video, speech, text). Multimodal Representation Learning (MRL) has emerged as an active area of research for processing such multimodal information automatically and using it in an end application. MRL involves learning reliable and robust representations of information from heterogeneous sources and fusing them. In practice, however, the data acquired from different sources are typically noisy. In extreme cases, noise of large magnitude can completely alter the semantics of the data, leading to inconsistencies in the parallel multimodal data. In this paper, we propose a novel method for multimodal representation learning in a noisy environment via the generalized product-of-experts technique. We train a separate network for each modality to assess the credibility of the information coming from that modality, and the contribution of each modality is then varied dynamically while estimating the joint distribution. We evaluate our method on two challenging benchmarks from two diverse domains: multimodal 3D hand-pose estimation and multimodal surgical video segmentation. We attain state-of-the-art performance on both benchmarks. Our extensive quantitative and qualitative evaluations show the advantages of our method compared to previous approaches.
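The weighted fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each modality supplies a diagonal Gaussian over a shared latent space and that the per-modality credibility scores are already available as non-negative numbers (in the paper they come from trained networks; here they are plain inputs). The function name `generalized_poe`, the array shapes, and the normalization of the weights across modalities are all assumptions made for illustration. Under these assumptions, the product of Gaussians raised to powers alpha_i is again Gaussian, with precision sum_i alpha_i / sigma_i^2 and a precision-weighted mean.

```python
import numpy as np


def generalized_poe(means, variances, weights):
    """Fuse per-modality diagonal Gaussians with a weighted product of experts.

    means, variances, weights: array-likes of shape (num_modalities, latent_dim).
    weights are hypothetical credibility scores (non-negative); they are
    normalized here so that they sum to 1 across modalities for each dimension.
    Returns the mean and variance of the fused Gaussian, each of shape (latent_dim,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Assumption: normalize credibility scores across the modality axis.
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Raising N(mu, sigma^2) to the power alpha scales its precision by alpha.
    weighted_precision = weights / variances

    # Precisions add under a product of Gaussians.
    joint_var = 1.0 / weighted_precision.sum(axis=0)
    joint_mean = joint_var * (weighted_precision * means).sum(axis=0)
    return joint_mean, joint_var


if __name__ == "__main__":
    # Two modalities that disagree about a one-dimensional latent variable.
    mu = [[0.0], [4.0]]
    var = [[1.0], [1.0]]

    # Equal credibility: the fused mean sits halfway between the experts.
    print(generalized_poe(mu, var, [[0.5], [0.5]]))

    # The second modality is judged mostly unreliable: the fused mean
    # moves toward the first expert.
    print(generalized_poe(mu, var, [[0.9], [0.1]]))
```

With equal weights the two experts contribute equally, and lowering one modality's weight pulls the fused estimate toward the other. This is the sense in which a credibility score can dynamically vary each modality's contribution to the joint distribution.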

Comments:	11 Pages, Accepted at ICMI 2022 Oral
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2211.03587 [cs.CV]
	(or arXiv:2211.03587v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2211.03587

Submission history

From: Ashutosh Modi
[v1] Mon, 7 Nov 2022 14:27:38 UTC (5,226 KB)
