Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

Wang, Xinkun; Wang, Yifang; Liang, Senwei; Tang, Feilong; Liu, Chengzhi; Hu, Ming; Hu, Chao; He, Junjun; Ge, Zongyuan; Razzak, Imran

arXiv:2503.05319 (cs)

[Submitted on 7 Mar 2025 (v1), last revised 25 Jun 2025 (this version, v2)]

Abstract: Ophthalmologists often rely on multimodal data to improve diagnostic accuracy. However, complete multimodal data is rare in real-world applications because of limited medical equipment and concerns about data privacy. Traditional deep learning methods typically address missing modalities by learning representations in latent space. The paper highlights two key limitations of these approaches: (i) task-irrelevant redundant information (e.g., numerous slices) in complex modalities leads to significant redundancy in latent space representations; (ii) overlapping multimodal representations make it difficult to extract unique features for each modality. To overcome these challenges, the authors propose the Essence-Point and Disentangle Representation Learning (EDRL) strategy, which integrates a self-distillation mechanism into an end-to-end framework to enhance feature selection and disentanglement for more robust multimodal learning. Specifically, the Essence-Point Representation Learning module selects discriminative features that improve disease grading performance. The Disentangled Representation Learning module separates multimodal data into modality-common and modality-unique representations, reducing feature entanglement and enhancing both robustness and interpretability in ophthalmic disease diagnosis. Experiments on multimodal ophthalmology datasets show that the proposed EDRL strategy significantly outperforms current state-of-the-art methods.
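The split into modality-common and modality-unique representations can be illustrated with a small sketch. This is not the authors' EDRL implementation: the abstract gives no architectural details, so the linear projections, the two loss terms, the zero-filling of a missing modality, and the modality names `fundus` and `oct` are all assumptions chosen only to make the idea concrete.

```python
import random

def linear(weights, x):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def init_weights(rng, out_dim, in_dim):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)]

def disentangle(features, w_common, w_unique):
    """Project one modality's features into a (common, unique) pair."""
    return linear(w_common, features), linear(w_unique, features)

def alignment_loss(common_a, common_b):
    """Mean squared distance between two modality-common codes."""
    return sum((a - b) ** 2 for a, b in zip(common_a, common_b)) / len(common_a)

def orthogonality_loss(common, unique):
    """Squared dot product; zero when the two codes are orthogonal."""
    return sum(c * u for c, u in zip(common, unique)) ** 2

def fuse(codes, unique_dim):
    """Average the common codes of available modalities, then append
    each modality's unique code, using zeros for a missing modality."""
    available = [pair for pair in codes.values() if pair is not None]
    if not available:
        raise ValueError("at least one modality is required")
    common_dim = len(available[0][0])
    fused = [sum(pair[0][i] for pair in available) / len(available)
             for i in range(common_dim)]
    for name in sorted(codes):  # fixed order keeps the layout stable
        pair = codes[name]
        fused.extend(pair[1] if pair is not None else [0.0] * unique_dim)
    return fused

if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, common_dim, unique_dim = 8, 4, 4
    fundus = [rng.random() for _ in range(in_dim)]  # assumed modality
    oct_scan = [rng.random() for _ in range(in_dim)]  # assumed modality
    w_common = init_weights(rng, common_dim, in_dim)  # shared projection
    w_unique_f = init_weights(rng, unique_dim, in_dim)
    w_unique_o = init_weights(rng, unique_dim, in_dim)

    c_f, u_f = disentangle(fundus, w_common, w_unique_f)
    c_o, u_o = disentangle(oct_scan, w_common, w_unique_o)
    print("alignment:", alignment_loss(c_f, c_o))
    print("orthogonality:", orthogonality_loss(c_f, u_f))
    # One modality missing: the fused vector keeps the same length.
    print(len(fuse({"fundus": (c_f, u_f), "oct": None}, unique_dim)))
```

In this sketch the alignment term pulls the common codes of the two modalities together, while the orthogonality term discourages a modality's unique code from repeating what the common code already carries. Keeping the fused vector at a fixed length is one simple way to let a downstream grading head accept incomplete inputs; the paper may handle missing modalities differently.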

Comments:	10 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2503.05319 [cs.CV]
	(or arXiv:2503.05319v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2503.05319

Submission history

From: Xinkun Wang
[v1] Fri, 7 Mar 2025 10:58:38 UTC (3,056 KB)
[v2] Wed, 25 Jun 2025 03:53:34 UTC (1,637 KB)
