Showing 1–3 of 3 results for author: Mistretta, M

  1. arXiv:2502.04263

    cs.CV cs.AI cs.LG

    Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

    Authors: Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Andrew D. Bagdanov

    Abstract: Pre-trained multi-modal Vision-Language Models like CLIP are widely used off-the-shelf for a variety of applications. In this paper, we show that the common practice of individually exploiting the text or image encoders of these powerful multi-modal models is highly suboptimal for intra-modal tasks like image-to-image retrieval. We argue that this is inherently due to the CLIP-style inter-modal co…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted for publication at ICLR 2025

  2. arXiv:2410.17827

    cs.AI

    RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification

    Authors: Marco Mistretta, Andrew D. Bagdanov

    Abstract: In this paper we introduce RE-tune, a novel approach for fine-tuning pre-trained Multimodal Biomedical Vision-Language models (VLMs) in Incremental Learning scenarios for multi-label chest disease diagnosis. RE-tune freezes the backbones and only trains simple adaptors on top of the Image and Text encoders of the VLM. By engineering positive and negative text prompts for diseases, we leverage the…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted for publication at Medical Imaging meets NeurIPS (NeurIPS23)

  3. arXiv:2407.03056

    cs.CV cs.AI cs.LG

    Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

    Authors: Marco Mistretta, Alberto Baldrati, Marco Bertini, Andrew D. Bagdanov

    Abstract: Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but fall short of the performance of supervised methods in generalizing to downstream tasks with limited data. Prompt learning is emerging as a parameter-efficient method for adapting VLMs, but state-of-the-art approaches require annotated samples. In this paper we propose a novel approach to prompt lear…

    Submitted 30 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at ECCV24