Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Pian, Weiguo; Deng, Shijian; Mo, Shentong; Guo, Yunhui; Tian, Yapeng

arXiv:2412.13050 (cs)

[Submitted on 17 Dec 2024]

Abstract: In this paper, we introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal Large Language Models (MLLMs) that involves tasks with inconsistent modalities (image, audio, or video) and varying task types (captioning or question-answering). Unlike existing vision-only or modality-incremental settings, MICL combines modality and task type shifts, both of which drive catastrophic forgetting. To address these challenges, we propose MoInCL, which employs a Pseudo Targets Generation Module to mitigate forgetting caused by task type shifts in previously seen modalities. It also incorporates Instruction-based Knowledge Distillation to preserve the model's ability to handle previously learned modalities when new ones are introduced. We benchmark MICL using a total of six tasks and conduct experiments to validate the effectiveness of our proposed MoInCL. The experimental results highlight the superiority of MoInCL, showing significant improvements over representative and state-of-the-art continual learning baselines.
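
The abstract names knowledge distillation as one of the two mechanisms but does not give its formulation. As a rough illustration of the general idea only, the sketch below computes a generic token-level distillation loss: the KL divergence between a frozen previous-step model's output distribution (the "teacher") and the current model's (the "student"). This is a standard textbook form, not the paper's Instruction-based Knowledge Distillation; the function names, the temperature parameter, and the toy logits are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one token's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over token positions.

    Hypothetical helper: each argument is a list of per-token logit lists
    with matching shapes. The teacher stands in for the model frozen after
    the previous task; the student is the model being trained now.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        # Terms with p_i == 0 contribute nothing to the KL sum.
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)


# Toy logits for two token positions over a three-word vocabulary.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.5, 2.0]]
loss = distillation_loss(teacher, student, temperature=2.0)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so adding it to the new task's training objective penalizes drift on inputs the earlier model already handled.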

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2412.13050 [cs.LG]
	(or arXiv:2412.13050v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.13050

Submission history

From: Weiguo Pian
[v1] Tue, 17 Dec 2024 16:13:56 UTC (5,545 KB)
