Med-Flamingo: a Multimodal Medical Few-shot Learner

Moor, Michael; Huang, Qian; Wu, Shirley; Yasunaga, Michihiro; Zakka, Cyril; Dalmia, Yash; Reis, Eduardo Pontes; Rajpurkar, Pranav; Leskovec, Jure

Computer Science > Computer Vision and Pattern Recognition

arXiv:2307.15189 (cs)

[Submitted on 27 Jul 2023]

Abstract: Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which is a significant limitation because data is scarce in many medical applications, necessitating models that can learn from a few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Starting from OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app under this https URL.
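
The few-shot setting described in the abstract can be illustrated with a minimal sketch of how an interleaved image-text prompt might be assembled. The special tokens `<image>` and `<|endofchunk|>` follow the OpenFlamingo convention; whether Med-Flamingo uses exactly this "Question: ... Answer: ..." layout is an assumption, and the names `Shot` and `build_few_shot_prompt` are hypothetical helpers introduced only for illustration. The images themselves would be passed to the model separately, in the same order as the `<image>` placeholders.

```python
from dataclasses import dataclass
from typing import List

# Special tokens in the OpenFlamingo style; assumed (not confirmed by the
# abstract) to carry over unchanged to Med-Flamingo.
IMAGE_TOKEN = "<image>"
END_OF_CHUNK = "<|endofchunk|>"


@dataclass
class Shot:
    """One in-context example: a question about an image and its answer."""
    question: str
    answer: str


def build_few_shot_prompt(shots: List[Shot], query_question: str) -> str:
    """Interleave image placeholders with text for few-shot VQA.

    Each shot contributes one image placeholder, its question, its answer,
    and an end-of-chunk marker. The final query has an image placeholder and
    a question, and leaves the answer open for the model to generate.
    """
    parts = [
        f"{IMAGE_TOKEN}Question: {shot.question} Answer: {shot.answer}{END_OF_CHUNK}"
        for shot in shots
    ]
    parts.append(f"{IMAGE_TOKEN}Question: {query_question} Answer:")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical example content, made up for illustration only.
    demo_shots = [
        Shot("What imaging modality is shown?", "Chest X-ray."),
        Shot("Which organ is depicted?", "The liver."),
    ]
    print(build_few_shot_prompt(demo_shots, "What is the most likely finding?"))
```

With k shots, the prompt contains k + 1 image placeholders, so k + 1 images have to be supplied alongside it. No fine-tuning is involved: the model adapts to the task only through the examples placed in the prompt.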

Comments:	Preprint
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2307.15189 [cs.CV]
	(or arXiv:2307.15189v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2307.15189

Submission history

From: Michael Moor
[v1] Thu, 27 Jul 2023 20:36:02 UTC (3,182 KB)
