Leveraging Foundation Models for Content-Based Image Retrieval in Radiology

Denner, Stefan; Zimmerer, David; Bounias, Dimitrios; Bujotzek, Markus; Xiao, Shuhan; Stock, Raphael; Kausch, Lisa; Schader, Philipp; Penzkofer, Tobias; Jäger, Paul F.; Maier-Hein, Klaus

arXiv:2403.06567 (cs)

[Submitted on 11 Mar 2024 (v1), last revised 22 Jun 2025 (this version, v4)]

Abstract: Content-based image retrieval (CBIR) has the potential to significantly improve diagnostic aid and medical research in radiology. However, current CBIR systems face limitations due to their specialization to certain pathologies, limiting their utility. On the other hand, several vision foundation models have been shown to produce general-purpose visual features. Therefore, in this work, we propose using vision foundation models as powerful and versatile off-the-shelf feature extractors for content-based image retrieval. Our contributions include: (1) benchmarking a diverse set of vision foundation models on an extensive dataset comprising 1.6 million 2D radiological images across four modalities and 161 pathologies; (2) identifying weakly-supervised models, particularly BiomedCLIP, as highly effective, achieving a P@1 of up to 0.594 (P@3: 0.590, P@5: 0.588, P@10: 0.583), comparable to specialized CBIR systems but without additional training; (3) conducting an in-depth analysis of the impact of index size on retrieval performance; (4) evaluating the quality of embedding spaces generated by different models; and (5) investigating specific challenges associated with retrieving anatomical versus pathological structures. Despite these challenges, our research underscores the vast potential of foundation models for CBIR in radiology, proposing a shift towards versatile, general-purpose medical image retrieval systems that do not require specific tuning. Our code, dataset splits and embeddings are publicly available under this https URL.
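
To make the P@k figures quoted in the abstract concrete, the following is a minimal sketch of how precision at k could be computed when a frozen model is used only as a feature extractor. It is not the authors' evaluation code. It assumes that retrieval ranks index images by cosine similarity of their embeddings and that a retrieved image counts as a hit when its label matches the query's label; the function names, the toy 2-D embeddings, and the labels "A" and "B" are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def precision_at_k(index, queries, k):
    """Mean fraction of the top-k retrieved items sharing the query's label.

    `index` and `queries` are lists of (embedding, label) pairs.
    Assumption: similarity is cosine and relevance is label equality.
    """
    per_query = []
    for q_emb, q_label in queries:
        # Rank all index items by descending similarity to the query.
        ranked = sorted(index, key=lambda item: cosine(q_emb, item[0]), reverse=True)
        hits = sum(1 for _, label in ranked[:k] if label == q_label)
        per_query.append(hits / k)
    return sum(per_query) / len(per_query)


# Toy index: hypothetical 2-D "embeddings" for two classes, A and B.
index = [
    ((1.0, 0.0), "A"),
    ((0.9, 0.1), "A"),
    ((0.0, 1.0), "B"),
    ((0.1, 0.9), "B"),
]
queries = [((1.0, 0.05), "A"), ((0.05, 1.0), "B")]

p_at_1 = precision_at_k(index, queries, k=1)
p_at_3 = precision_at_k(index, queries, k=3)
```

In this toy setup the index holds only two items per class, so any top-3 list must include at least one item of the other class. The number of relevant items in the index therefore caps P@k, which is one reason index size and composition can affect measured retrieval performance.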

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:2403.06567 [cs.CV]
	(or arXiv:2403.06567v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.06567

Submission history

From: Stefan Denner
[v1] Mon, 11 Mar 2024 10:06:45 UTC (26,334 KB)
[v2] Fri, 12 Apr 2024 08:52:24 UTC (26,334 KB)
[v3] Wed, 17 Apr 2024 15:58:36 UTC (26,334 KB)
[v4] Sun, 22 Jun 2025 12:33:24 UTC (23,470 KB)
