Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders

Jang, Jongseong; Kyung, Daeun; Kim, Seung Hwan; Lee, Honglak; Bae, Kyunghoon; Choi, Edward

doi:10.1038/s41598-024-73695-z


arXiv:2212.07050 (cs)

[Submitted on 14 Dec 2022 (v1), last revised 11 Oct 2024 (this version, v3)]




Abstract: Deep neural networks are increasingly used in medical imaging for tasks such as pathology classification, but they are limited by the scarcity of high-quality, expert-labeled training data. Recent work adapts pre-trained contrastive image-text models such as CLIP to the medical domain by fine-tuning them on chest X-ray images and their corresponding reports, enabling zero-shot pathology classification without pathology-specific annotations. However, most studies continue to use the same contrastive learning objectives as in the general domain, overlooking the multi-label nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy that combines positive-pair loss relaxation with random sentence sampling, aiming to improve zero-shot pathology classification without relying on external knowledge. Our method can be applied to any pre-trained contrastive image-text encoder and easily transferred to out-of-domain datasets without further training, as it does not use external data. Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models, with an average macro AUROC increase of 4.3%. Additionally, our method outperforms the state-of-the-art and marginally surpasses board-certified radiologists in zero-shot classification for the five competition pathologies in the CheXpert dataset.
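
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro AUROC can be computed in a multi-label setting: one binary AUROC per pathology, then an unweighted mean. It is not the authors' evaluation code. The function names, the pathology labels, and the scores are all illustrative assumptions, and the toy numbers are made up.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC: probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auroc(
    labels_by_class: Dict[str, Sequence[int]],
    scores_by_class: Dict[str, Sequence[float]],
) -> float:
    """Unweighted mean of the per-class AUROCs (every pathology counts equally)."""
    per_class = [
        auroc(labels_by_class[name], scores_by_class[name])
        for name in labels_by_class
    ]
    return sum(per_class) / len(per_class)


# Hypothetical toy data: four images, two pathologies, made-up model scores.
# An image may be positive for several pathologies at once (multi-label).
labels = {"Atelectasis": [1, 0, 1, 0], "Edema": [0, 0, 1, 1]}
scores = {"Atelectasis": [0.9, 0.2, 0.4, 0.6], "Edema": [0.1, 0.3, 0.8, 0.7]}

print(auroc(labels["Atelectasis"], scores["Atelectasis"]))  # 0.75
print(macro_auroc(labels, scores))  # 0.875
```

The pairwise loop is quadratic in the number of images and is written for readability; on real datasets a rank-based implementation from an established library would be the usual choice.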

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Cite as:	arXiv:2212.07050 [cs.LG]
	(or arXiv:2212.07050v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2212.07050
Journal reference:	Sci Rep 14, 23199 (2024)
Related DOI:	https://doi.org/10.1038/s41598-024-73695-z

Submission history

From: Daeun Kyung
[v1] Wed, 14 Dec 2022 06:04:18 UTC (2,758 KB)
[v2] Fri, 17 Mar 2023 02:32:06 UTC (2,221 KB)
[v3] Fri, 11 Oct 2024 08:19:58 UTC (3,138 KB)
