-
Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer
Authors:
Gesa Mittmann,
Sara Laiouar-Pedari,
Hendrik A. Mehrtens,
Sarah Haggenmüller,
Tabea-Clara Bucher,
Tirtha Chanda,
Nadine T. Gaisa,
Mathias Wagner,
Gilbert Georg Klamminger,
Tilman T. Rau,
Christina Neppl,
Eva Maria Compérat,
Andreas Gocht,
Monika Hämmerle,
Niels J. Rupp,
Jula Westhoff,
Irene Krücken,
Maximillian Seidl,
Christian M. Schürch,
Marcus Bauer,
Wiebke Solass,
Yu Chun Tam,
Florian Weber,
Rainer Grobholz,
Jaroslaw Augustyniak
, et al. (41 additional authors not shown)
Abstract:
The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 $\pm$ 0.003 trained on explanations vs. 0.691 $\pm$ 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes.
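The two ideas named in the abstract, soft labels and the Dice score, can be illustrated with a minimal sketch. The abstract does not specify how the soft labels are built or how Dice is aggregated, so everything below is an assumption: soft labels are taken to be the per-pixel fraction of annotators choosing each class, and Dice is computed per class on hard label maps. The function names `soft_labels` and `dice_score` are hypothetical.

```python
def soft_labels(annotations, num_classes):
    """Per-pixel class frequencies across annotators (assumed soft-label scheme).

    annotations: list of label lists, one per annotator, each holding one
    class index per pixel. Returns one probability vector per pixel.
    """
    n_annotators = len(annotations)
    n_pixels = len(annotations[0])
    counts = [[0] * num_classes for _ in range(n_pixels)]
    for labels in annotations:
        for pixel, cls in enumerate(labels):
            counts[pixel][cls] += 1
    return [[c / n_annotators for c in row] for row in counts]


def dice_score(pred, target, cls):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for a single class on hard labels."""
    pred_px = {i for i, c in enumerate(pred) if c == cls}
    target_px = {i for i, c in enumerate(target) if c == cls}
    denom = len(pred_px) + len(target_px)
    if denom == 0:
        return 1.0  # class absent in both maps: treated as perfect agreement
    return 2 * len(pred_px & target_px) / denom


# Hypothetical toy data: three annotators labelling three pixels.
toy_annotations = [[0, 1, 1], [0, 1, 2], [0, 2, 2]]
toy_soft = soft_labels(toy_annotations, num_classes=3)
```

In this toy case the annotators agree on the first pixel and split two to one on the others, so the soft targets keep that disagreement instead of forcing a single label.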
Submitted 19 October, 2024;
originally announced October 2024.
-
Advancing dermatological diagnosis: Development of a hyperspectral dermatoscope for enhanced skin imaging
Authors:
Martin J. Hetz,
Carina Nogueira Garcia,
Sarah Haggenmüller,
Titus J. Brinker
Abstract:
Clinical dermatology necessitates precision and innovation for efficient diagnosis and treatment of various skin conditions. This paper introduces the development of a cutting-edge hyperspectral dermatoscope (the Hyperscope) tailored for human skin analysis. We detail the requirements for such a device and the design considerations, from optical configurations to sensor selection, necessary to capture a wide spectral range with high fidelity. Preliminary results from 15 individuals and 160 recorded skin images demonstrate the potential of the Hyperscope in identifying and characterizing various skin conditions, offering a promising avenue for non-invasive skin evaluation and a platform for future research in dermatology-related hyperspectral imaging.
Submitted 25 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Clinical Melanoma Diagnosis with Artificial Intelligence: Insights from a Prospective Multicenter Study
Authors:
Lukas Heinlein,
Roman C. Maron,
Achim Hekler,
Sarah Haggenmüller,
Christoph Wies,
Jochen S. Utikal,
Friedegund Meier,
Sarah Hobelsberger,
Frank F. Gellrich,
Mildred Sergon,
Axel Hauschild,
Lars E. French,
Lucie Heinzerling,
Justin G. Schlager,
Kamran Ghoreschi,
Max Schlaak,
Franz J. Hilke,
Gabriela Poch,
Sören Korsing,
Carola Berking,
Markus V. Heppt,
Michael Erdmann,
Sebastian Haferkamp,
Konstantin Drexler,
Dirk Schadendorf
, et al. (5 additional authors not shown)
Abstract:
Early detection of melanoma, a potentially lethal type of skin cancer with high prevalence worldwide, improves patient prognosis. In retrospective studies, artificial intelligence (AI) has proven to be helpful for enhancing melanoma detection. However, there are few prospective studies confirming these promising results. Existing studies are limited by small sample sizes, overly homogeneous datasets, or lack of inclusion of rare melanoma subtypes, preventing a fair and thorough evaluation of AI and its generalizability, a crucial aspect for its application in the clinical setting. Therefore, we assessed 'All Data are Ext' (ADAE), an established open-source ensemble algorithm for detecting melanomas, by comparing its diagnostic accuracy to that of dermatologists on a prospectively collected, external, heterogeneous test set comprising eight distinct hospitals, four different camera setups, rare melanoma subtypes, and special anatomical sites. We advanced the algorithm with real test-time augmentation (R-TTA, i.e. providing real photographs of lesions taken from multiple angles and averaging the predictions), and evaluated its generalization capabilities. Overall, the AI showed higher balanced accuracy than dermatologists (0.798, 95% confidence interval (CI) 0.779-0.814 vs. 0.781, 95% CI 0.760-0.802; p<0.001), obtaining a higher sensitivity (0.921, 95% CI 0.900-0.942 vs. 0.734, 95% CI 0.701-0.770; p<0.001) at the cost of a lower specificity (0.673, 95% CI 0.641-0.702 vs. 0.828, 95% CI 0.804-0.852; p<0.001). As the algorithm exhibited a significant performance advantage on our heterogeneous dataset exclusively comprising melanoma-suspicious lesions, AI may offer the potential to support dermatologists particularly in diagnosing challenging cases.
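The reported figures can be related with a short sketch. Balanced accuracy for a binary task is the mean of sensitivity and specificity, so the point estimates above can be cross-checked directly. The second function illustrates the averaging step that the abstract describes for R-TTA. The per-view probabilities are hypothetical, and the actual ADAE pipeline may combine views differently.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy of a binary classifier: mean of the two rates."""
    return (sensitivity + specificity) / 2


def average_views(view_probabilities):
    """Average melanoma probabilities predicted for several real photographs
    of the same lesion (assumed form of the R-TTA averaging step)."""
    return sum(view_probabilities) / len(view_probabilities)


# Point estimates quoted in the abstract.
ai_bacc = balanced_accuracy(0.921, 0.673)
derm_bacc = balanced_accuracy(0.734, 0.828)

# Hypothetical per-view outputs for one lesion photographed from three angles.
lesion_score = average_views([0.62, 0.80, 0.71])
```

The recomputed values are approximately 0.797 for the AI and 0.781 for the dermatologists. The small gap to the reported 0.798 is plausibly due to rounding of the published sensitivity and specificity.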
Submitted 25 January, 2024;
originally announced January 2024.
-
Evaluating Deep Learning-based Melanoma Classification using Immunohistochemistry and Routine Histology: A Three Center Study
Authors:
Christoph Wies,
Lucas Schneider,
Sarah Haggenmueller,
Tabea-Clara Bucher,
Sarah Hobelsberger,
Markus V. Heppt,
Gerardo Ferrara,
Eva I. Krieghoff-Henning,
Titus J. Brinker
Abstract:
Pathologists routinely use immunohistochemical (IHC)-stained tissue slides against MelanA in addition to hematoxylin and eosin (H&E)-stained slides to improve their accuracy in diagnosing melanomas. The use of diagnostic Deep Learning (DL)-based support systems for automated examination of tissue morphology and cellular composition has been well studied in standard H&E-stained tissue slides. In contrast, there are few studies that analyze IHC slides using DL. Therefore, we investigated the separate and joint performance of ResNets trained on MelanA and corresponding H&E-stained slides. The MelanA classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.82 and 0.74 on two out-of-distribution (OOD) datasets, similar to the H&E-based benchmark classification of 0.81 and 0.75, respectively. A combined classifier using MelanA and H&E achieved AUROCs of 0.85 and 0.81 on the OOD datasets. DL MelanA-based assistance systems perform on par with the benchmark H&E classification and may be improved by multi-stain classification to assist pathologists in their clinical routine.
Submitted 8 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Using Multiple Dermoscopic Photographs of One Lesion Improves Melanoma Classification via Deep Learning: A Prognostic Diagnostic Accuracy Study
Authors:
Achim Hekler,
Roman C. Maron,
Sarah Haggenmüller,
Max Schmitt,
Christoph Wies,
Jochen S. Utikal,
Friedegund Meier,
Sarah Hobelsberger,
Frank F. Gellrich,
Mildred Sergon,
Axel Hauschild,
Lars E. French,
Lucie Heinzerling,
Justin G. Schlager,
Kamran Ghoreschi,
Max Schlaak,
Franz J. Hilke,
Gabriela Poch,
Sören Korsing,
Carola Berking,
Markus V. Heppt,
Michael Erdmann,
Sebastian Haferkamp,
Konstantin Drexler,
Dirk Schadendorf
, et al. (6 additional authors not shown)
Abstract:
Background: Convolutional neural network (CNN)-based melanoma classifiers face several challenges that limit their usefulness in clinical practice. Objective: To investigate the impact of multiple real-world dermoscopic views of a single lesion of interest on a CNN-based melanoma classifier.
Methods: This study evaluated 656 suspected melanoma lesions. Classifier performance was measured using area under the receiver operating characteristic curve (AUROC), expected calibration error (ECE) and maximum confidence change (MCC) for (I) a single-view scenario, (II) a multiview scenario using multiple artificially modified images per lesion and (III) a multiview scenario with multiple real-world images per lesion.
Results: The multiview approach with real-world images significantly increased the AUROC from 0.905 (95% CI, 0.879-0.929) in the single-view approach to 0.930 (95% CI, 0.909-0.951). ECE and MCC also improved significantly from 0.131 (95% CI, 0.105-0.159) to 0.072 (95% CI, 0.052-0.093) and from 0.149 (95% CI, 0.125-0.171) to 0.115 (95% CI, 0.099-0.131), respectively. Comparing multiview real-world to artificially modified images showed comparable diagnostic accuracy and uncertainty estimation, but significantly worse robustness for the latter.
Conclusion: Using multiple real-world images is an inexpensive method to positively impact the performance of a CNN-based melanoma classifier.
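For readers unfamiliar with the calibration metric, the expected calibration error (ECE) can be sketched as follows. This uses the common equal-width binning formulation; the number of bins and the binning scheme used in the study are not stated in the abstract, so `n_bins=10` and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted confidence in [0, 1] for each sample.
    correct: booleans, True where the prediction matched the label.
    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    n_samples = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n_samples * abs(accuracy - mean_conf)
    return ece


# Hypothetical example: four predictions at 90% confidence, three correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False])
```

In the toy example the classifier claims 90% confidence but is right 75% of the time, giving an ECE of about 0.15. Lower values indicate that stated confidence tracks observed accuracy more closely.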
Submitted 5 June, 2023;
originally announced June 2023.