-
The impact of tissue detection on diagnostic artificial intelligence algorithms in digital pathology
Authors:
Sol Erika Boman,
Nita Mulliqi,
Anders Blilie,
Xiaoyi Ji,
Kelvin Szolnoky,
Einar Gudlaugsson,
Emiel A. M. Janssen,
Svein R. Kjosavik,
José Asenjo,
Marcello Gambacorta,
Paolo Libretti,
Marcin Braun,
Radzislaw Kordek,
Roman Łowicki,
Kristina Hotakainen,
Päivi Väre,
Bodil Ginnerup Pedersen,
Karina Dalsgaard Sørensen,
Benedicte Parm Ulhøi,
Lars Egevad,
Kimmo Kartasalo
Abstract:
Tissue detection is a crucial first step in most digital pathology applications. Details of the segmentation algorithm are rarely reported, and there is a lack of studies investigating the downstream effects of a poor segmentation algorithm. Disregarding tissue detection quality could create a bottleneck for downstream performance and jeopardize patient safety if diagnostically relevant parts of the specimen are excluded from analysis in clinical applications.
This study aims to determine whether performance of downstream tasks is sensitive to the tissue detection method, and to compare performance of classical and AI-based tissue detection. To this end, we trained an AI model for Gleason grading of prostate cancer in whole slide images (WSIs) using two different tissue detection algorithms: thresholding (classical) and UNet++ (AI). A total of 33,823 WSIs scanned on five digital pathology scanners were used to train the tissue detection AI model. The downstream Gleason grading algorithm was trained and tested using 70,524 WSIs from 13 clinical sites scanned on 13 different scanners.
There was a decrease from 116 (0.43%) to 22 (0.08%) fully undetected tissue samples when switching from thresholding-based to AI-based tissue detection, suggesting an AI model may be more reliable than a classical model for avoiding total failures on slides with unusual appearance. On the slides where tissue could be detected by both algorithms, no significant difference in overall Gleason grading performance was observed. However, tissue-detection-dependent, clinically significant variations in AI grading were observed in 3.5% of malignant slides, highlighting the importance of robust tissue detection for optimal clinical performance of diagnostic AI.
Submitted 29 March, 2025;
originally announced March 2025.
-
Causes of evolutionary divergence in prostate cancer
Authors:
Emre Esenturk,
Atef Sahli,
Valeriia Haberland,
Aleksandra Ziuboniewicz,
Christopher Wirth,
G. Steven Bova,
Robert G Bristow,
Mark N Brook,
Benedikt Brors,
Adam Butler,
Géraldine Cancel-Tassin,
Kevin CL Cheng,
Colin S Cooper,
Niall M Corcoran,
Olivier Cussenot,
Ros A Eeles,
Francesco Favero,
Clarissa Gerhauser,
Abraham Gihawi,
Etsehiwot G Girma,
Vincent J Gnanapragasam,
Andreas J Gruber,
Anis Hamid,
Vanessa M Hayes,
Housheng Hansen He, et al. (30 additional authors not shown)
Abstract:
Cancer progression involves the sequential accumulation of genetic alterations that cumulatively shape the tumour phenotype. In prostate cancer, tumours can follow divergent evolutionary trajectories that lead to distinct subtypes, but the causes of this divergence remain unclear. While causal inference could elucidate the factors involved, conventional methods are unsuitable due to the possibility of unobserved confounders and ambiguity in the direction of causality. Here, we propose a method that circumvents these issues and apply it to genomic data from 829 prostate cancer patients. We identify several genetic alterations that drive divergence as well as others that prevent this transition, locking tumours into one trajectory. Further analysis reveals that these genetic alterations may cause each other, implying a positive-feedback loop that accelerates divergence. Our findings provide insights into how cancer subtypes emerge and offer a foundation for genomic surveillance strategies aimed at monitoring the progression of prostate cancer.
Submitted 17 March, 2025;
originally announced March 2025.
-
Foundation Models -- A Panacea for Artificial Intelligence in Pathology?
Authors:
Nita Mulliqi,
Anders Blilie,
Xiaoyi Ji,
Kelvin Szolnoky,
Henrik Olsson,
Sol Erika Boman,
Matteo Titus,
Geraldine Martinez Gonzalez,
Julia Anna Mielcarz,
Masi Valkonen,
Einar Gudlaugsson,
Svein R. Kjosavik,
José Asenjo,
Marcello Gambacorta,
Paolo Libretti,
Marcin Braun,
Radzislaw Kordek,
Roman Łowicki,
Kristina Hotakainen,
Päivi Väre,
Bodil Ginnerup Pedersen,
Karina Dalsgaard Sørensen,
Benedicte Parm Ulhøi,
Pekka Ruusuvuori,
Brett Delahunt, et al. (6 additional authors not shown)
Abstract:
The role of artificial intelligence (AI) in pathology has evolved from aiding diagnostics to uncovering predictive morphological patterns in whole slide images (WSIs). Recently, foundation models (FMs) leveraging self-supervised pre-training have been widely advocated as a universal solution for diverse downstream tasks. However, open questions remain about their clinical applicability and generalization advantages over end-to-end learning using task-specific (TS) models. Here, we focused on AI with clinical-grade performance for prostate cancer diagnosis and Gleason grading. We present the largest validation of AI for this task, using over 100,000 core needle biopsies from 7,342 patients across 15 sites in 11 countries. We compared two FMs with a fully end-to-end TS model in a multiple instance learning framework. Our findings challenge assumptions that FMs universally outperform TS models. While FMs demonstrated utility in data-scarce scenarios, their performance converged with, and was in some cases surpassed by, that of TS models when sufficient labeled training data were available. Notably, extensive task-specific training markedly reduced clinically significant misgrading, misdiagnosis of challenging morphologies, and variability across different WSI scanners. Additionally, FMs used up to 35 times more energy than the TS model, raising concerns about their sustainability. Our results underscore that while FMs offer clear advantages for rapid prototyping and research, their role as a universal solution for clinically applicable medical AI remains uncertain. For high-stakes clinical applications, rigorous validation and consideration of task-specific training remain critically important. We advocate for integrating the strengths of FMs and end-to-end learning to achieve robust and resource-efficient AI pathology solutions fit for clinical use.
Submitted 3 March, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence Assisted Cancer Diagnosis
Authors:
Xiaoyi Ji,
Richard Salmon,
Nita Mulliqi,
Umair Khan,
Yinxi Wang,
Anders Blilie,
Henrik Olsson,
Bodil Ginnerup Pedersen,
Karina Dalsgaard Sørensen,
Benedicte Parm Ulhøi,
Svein R Kjosavik,
Emilius AM Janssen,
Mattias Rantalainen,
Lars Egevad,
Pekka Ruusuvuori,
Martin Eklund,
Kimmo Kartasalo
Abstract:
The potential of artificial intelligence (AI) in digital pathology is limited by technical inconsistencies in the production of whole slide images (WSIs), leading to degraded AI performance and posing a challenge for widespread clinical application as fine-tuning algorithms for each new site is impractical. Changes in the imaging workflow can also lead to compromised diagnoses and patient safety risks. We evaluated whether physical color calibration of scanners can standardize WSI appearance and enable robust AI performance. We employed a color calibration slide in four different laboratories and evaluated its impact on the performance of an AI system for prostate cancer diagnosis on 1,161 WSIs. Color standardization resulted in consistently improved AI model calibration and significant improvements in Gleason grading performance. The study demonstrates that physical color calibration provides a potential solution to the variation introduced by different scanners, making AI-based cancer diagnostics more reliable and applicable in clinical settings.
Submitted 7 July, 2023;
originally announced July 2023.