Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Moglia, Andrea; Leccardi, Matteo; Cavicchioli, Matteo; Maccarini, Alice; Marcon, Marco; Mainardi, Luca; Cerveri, Pietro


arXiv:2506.10825 (eess)

[Submitted on 12 Jun 2025]

Title: Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Authors: Andrea Moglia (1), Matteo Leccardi (1), Matteo Cavicchioli (1), Alice Maccarini (2), Marco Marcon (1), Luca Mainardi (1), Pietro Cerveri (1 and 2) ((1) Politecnico di Milano, (2) Università di Pavia)


Abstract: Following the successful paradigm shift of large language models, which leverage pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) set a milestone in the segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We start with an introduction to the fundamental concepts underpinning their development. Then, we provide a taxonomy covering the different variants of SAM (zero-shot, few-shot, fine-tuning, and adapters), the recent SAM 2, other innovative models trained on images alone, and models trained on both text and images. We thoroughly analyze their performance at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

Comments:	132 pages, 26 figures, 23 tables. Andrea Moglia and Matteo Leccardi are equally contributing authors
Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	A.1; I.2.0; I.4.6
Cite as:	arXiv:2506.10825 [eess.IV]
	(or arXiv:2506.10825v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2506.10825

Submission history

From: Matteo Leccardi
[v1] Thu, 12 Jun 2025 15:44:49 UTC (6,618 KB)
