Label-free estimation of clinically relevant performance metrics under distribution shifts

Flühmann, Tim; Bissoto, Alceu; Hoang, Trung-Dung; Koch, Lisa M.

arXiv:2507.22776 (cs)

[Submitted on 30 Jul 2025]

Abstract: Performance monitoring is essential for the safe clinical deployment of image classification models. However, because ground-truth labels are typically unavailable in the target dataset, direct assessment of real-world model performance is infeasible. State-of-the-art performance estimation methods address this by leveraging confidence scores to estimate the target accuracy. Although this is a promising direction, established methods mainly estimate the model's accuracy and are rarely evaluated in a clinical domain, where strong class imbalances and dataset shifts are common. Our contributions are twofold. First, we introduce generalisations of existing performance prediction methods that directly estimate the full confusion matrix. Second, we benchmark their performance on chest x-ray data under real-world distribution shifts as well as simulated covariate and prevalence shifts. The proposed confusion matrix estimation methods reliably predicted clinically relevant counting metrics on medical images under distribution shifts. However, our simulated shift scenarios exposed important failure modes of current performance estimation techniques, calling for a better understanding of real-world deployment contexts when implementing these performance monitoring techniques for postmarket surveillance of medical AI models.
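To illustrate the general idea of estimating a confusion matrix from confidence scores alone, the following is a minimal sketch and not necessarily the formulation used in the paper. It assumes a binary classifier whose positive-class probabilities are calibrated on the unlabeled target data, which is a strong assumption under distribution shift. The function names `estimate_confusion_matrix` and `derived_metrics`, the `threshold` parameter, and the example scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def estimate_confusion_matrix(
    probs: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Expected confusion-matrix counts from unlabeled positive-class probabilities.

    Assumption (not from the source): each p is a calibrated estimate of
    P(y = 1 | x) on the target data, so a sample predicted positive
    contributes p expected true positives and (1 - p) expected false positives.
    """
    cm = {"tp": 0.0, "fp": 0.0, "fn": 0.0, "tn": 0.0}
    for p in probs:
        if p >= threshold:  # predicted positive
            cm["tp"] += p
            cm["fp"] += 1.0 - p
        else:  # predicted negative
            cm["fn"] += p
            cm["tn"] += 1.0 - p
    return cm


def derived_metrics(cm: Dict[str, float]) -> Dict[str, float]:
    """Counting metrics computed from the (estimated) confusion matrix."""

    def ratio(num: float, den: float) -> float:
        return num / den if den > 0 else float("nan")

    total = cm["tp"] + cm["fp"] + cm["fn"] + cm["tn"]
    return {
        "accuracy": ratio(cm["tp"] + cm["tn"], total),
        "sensitivity": ratio(cm["tp"], cm["tp"] + cm["fn"]),
        "specificity": ratio(cm["tn"], cm["tn"] + cm["fp"]),
        "ppv": ratio(cm["tp"], cm["tp"] + cm["fp"]),
    }


# Hypothetical confidence scores for four unlabeled target images.
example_probs = [0.9, 0.8, 0.3, 0.1]
example_cm = estimate_confusion_matrix(example_probs)
example_metrics = derived_metrics(example_cm)
```

In this sketch the estimated accuracy reduces to the mean of max(p, 1 - p), that is, the average confidence of the predicted class, while the same expected counts also give sensitivity, specificity, and positive predictive value. If the scores are miscalibrated on the target data, all four estimates are biased.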

Comments:	Accepted oral at UNSURE 2025 @ MICCAI
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2507.22776 [cs.LG]
	(or arXiv:2507.22776v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.22776

Submission history

From: Tim Gabriel Flühmann
[v1] Wed, 30 Jul 2025 15:37:58 UTC (358 KB)
