-
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory
Authors:
Imad Eddine Marouf,
Enzo Tartaglione,
Stephane Lathuiliere,
Joost van de Weijer
Abstract:
Continual Learning in Visual Question Answering (VQACL) requires models to learn new visual-linguistic tasks (plasticity) while retaining knowledge from previous tasks (stability). The multimodal nature of VQACL presents unique challenges, requiring models to balance stability across visual and textual domains while maintaining plasticity to adapt to novel objects and reasoning tasks. Existing met…
▽ More
Continual Learning in Visual Question Answering (VQACL) requires models to learn new visual-linguistic tasks (plasticity) while retaining knowledge from previous tasks (stability). The multimodal nature of VQACL presents unique challenges, requiring models to balance stability across visual and textual domains while maintaining plasticity to adapt to novel objects and reasoning tasks. Existing methods, predominantly designed for unimodal tasks, often struggle to balance these demands effectively. In this work, we introduce QUestion-only replay with Attention Distillation (QUAD), a novel approach for VQACL that leverages only past task questions for regularisation, eliminating the need to store visual data and addressing both memory and privacy concerns. QUAD achieves stability by introducing a question-only replay mechanism that selectively uses questions from previous tasks to prevent overfitting to the current task's answer space, thereby mitigating the out-of-answer-set problem. Complementing this, we propose attention consistency distillation, which uniquely enforces both intra-modal and inter-modal attention consistency across tasks, preserving essential visual-linguistic associations. Extensive experiments on VQAv2 and NExT-QA demonstrate that QUAD significantly outperforms state-of-the-art methods, achieving robust performance in continual VQA.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
Weighted Ensemble Models Are Strong Continual Learners
Authors:
Imad Eddine Marouf,
Subhankar Roy,
Enzo Tartaglione,
Stéphane Lathuilière
Abstract:
In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stab…
▽ More
In this work, we study the problem of continual learning (CL) where the goal is to learn a model on a sequence of tasks, such that the data from the previous tasks becomes unavailable while learning on the current task data. CL is essentially a balancing act between being able to learn on the new task (i.e., plasticity) and maintaining the performance on the previously learned concepts (i.e., stability). Intending to address the stability-plasticity trade-off, we propose to perform weight-ensembling of the model parameters of the previous and current tasks. This weighted-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weights ensemble by leveraging the Fisher information of the weights of the model. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.
△ Less
Submitted 11 December, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Mini but Mighty: Finetuning ViTs with Mini Adapters
Authors:
Imad Eddine Marouf,
Enzo Tartaglione,
Stéphane Lathuilière
Abstract:
Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimen…
▽ More
Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimension of adapters is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares the neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across the three dataset benchmarks DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
△ Less
Submitted 7 November, 2023;
originally announced November 2023.
-
Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation
Authors:
Imad Eddine Marouf,
Subhankar Roy,
Enzo Tartaglione,
Stéphane Lathuilière
Abstract:
Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks without forgetting previously learned information. The advent of large pre-trained models (PTMs) has fast-tracked the progress in CIL due to the highly transferable PTM representations, where tuning a small set of parameters leads to state-of-the-art performance when comp…
▽ More
Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks without forgetting previously learned information. The advent of large pre-trained models (PTMs) has fast-tracked the progress in CIL due to the highly transferable PTM representations, where tuning a small set of parameters leads to state-of-the-art performance when compared with the traditional CIL methods that are trained from scratch. However, repeated fine-tuning on each task destroys the rich representations of the PTMs and further leads to forgetting previous tasks. To strike a balance between the stability and plasticity of PTMs for CIL, we propose a novel perspective of eliminating training on every new task and instead train PTM only on the first task, and then refine its representation at inference time using test-time adaptation (TTA). Concretely, we propose Test-Time Adaptation for Class-Incremental Learning (TTACIL) that first fine-tunes PTMs using Adapters on the first task, then adjusts Layer Norm parameters of the PTM on each test instance for learning task-specific features, and finally resets them back to the adapted model to preserve stability. As a consequence, our TTACIL does not undergo any forgetting, while benefiting each task with the rich PTM features. Additionally, by design, our TTACIL is robust to common data corruptions. Our method outperforms several state-of-the-art CIL methods when evaluated on multiple CIL benchmarks under both clean and corrupted data. Code is available at: https://github.com/IemProg/TTACIL.
△ Less
Submitted 14 March, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.