Search | arXiv e-print repository

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results

Authors: Meritxell Riera-Marin, Sikha O K, Julia Rodriguez-Comas, Matthias Stefan May, Zhaohong Pan, Xiang Zhou, Xiaokun Liang, Franciskus Xaverius Erick, Andrea Prenner, Cedric Hemon, Valentin Boussot, Jean-Louis Dillenseger, Jean-Claude Nunes, Abdul Qayyum, Moona Mazher, Steven A Niederer, Kaisar Kushibar, Carlos Martin-Isla, Petia Radeva, Karim Lekadir, Theodore Barfoot, Luis C. Garcia Peraza Herrera, Ben Glocker, Tom Vercauteren, Lucas Gago , et al. (7 additional authors not shown)

Abstract: Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. This is why we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS), which highl… ▽ More Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. This is why we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS), which highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. Seven teams participated in the challenge, submitting a variety of DL models evaluated using metrics such as Dice Similarity Coefficient (DSC), Expected Calibration Error (ECE), and Continuous Ranked Probability Score (CRPS). By incorporating consensus and dissensus ground truth, we assess how DL models handle uncertainty and whether their confidence estimates align with true segmentation performance. Our findings reinforce the importance of well-calibrated models, as better calibration is strongly correlated with the quality of the results. Furthermore, we demonstrate that segmentation models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly in cases deviating from standard anatomical structures. Notably, the best-performing models achieved high DSC and well-calibrated uncertainty estimates. This work underscores the need for multi-annotator ground truth, thorough calibration assessments, and uncertainty-aware evaluations to develop trustworthy and clinically reliable DL-based medical image segmentation models. △ Less

Submitted 13 May, 2025; originally announced May 2025.

Comments: This challenge was hosted in MICCAI 2024

arXiv:2503.02549 [pdf, other]

Federated nnU-Net for Privacy-Preserving Medical Image Segmentation

Authors: Grzegorz Skorupko, Fotios Avgoustidis, Carlos Martín-Isla, Lidia Garrucho, Dimitri A. Kessler, Esmeralda Ruiz Pujadas, Oliver Díaz, Maciej Bobowicz, Katarzyna Gwoździewicz, Xavier Bargalló, Paulius Jaruševičius, Kaisar Kushibar, Karim Lekadir

Abstract: The nnU-Net framework has played a crucial role in medical image segmentation and has become the gold standard in multitudes of applications targeting different diseases, organs, and modalities. However, so far it has been used primarily in a centralized approach where the data collected from hospitals are stored in one center and used to train the nnU-Net. This centralized approach has various li… ▽ More The nnU-Net framework has played a crucial role in medical image segmentation and has become the gold standard in multitudes of applications targeting different diseases, organs, and modalities. However, so far it has been used primarily in a centralized approach where the data collected from hospitals are stored in one center and used to train the nnU-Net. This centralized approach has various limitations, such as leakage of sensitive patient information and violation of patient privacy. Federated learning is one of the approaches to train a segmentation model in a decentralized manner that helps preserve patient privacy. In this paper, we propose FednnU-Net, a federated learning extension of nnU-Net. We introduce two novel federated learning methods to the nnU-Net framework - Federated Fingerprint Extraction (FFE) and Asymmetric Federated Averaging (AsymFedAvg) - and experimentally show their consistent performance for breast, cardiac and fetal segmentation using 6 datasets representing samples from 18 institutions. Additionally, to further promote research and deployment of decentralized training in privacy constrained institutions, we make our plug-n-play framework public. The source-code is available at https://github.com/faildeny/FednnUNet . △ Less

Submitted 4 March, 2025; originally announced March 2025.

Comments: In review

arXiv:2412.16085 [pdf, other]

Efficient MedSAMs: Segment Anything in Medical Images on Laptop

Authors: Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu, Songxiao Yang, Lennart Purucker, Zdravko Marinov, Marius Staring, Haisheng Lu, Thuy Thanh Dao, Xincheng Ye, Zhi Li, Gianluca Brugnara, Philipp Vollmuth, Martha Foltyn-Dumitru, Jaeyoung Cho, Mustafa Ahmed Mahmutoglu, Martin Bendszus, Irada Pflüger , et al. (57 additional authors not shown)

Abstract: Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa… ▽ More Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications. △ Less

Submitted 20 December, 2024; originally announced December 2024.

Comments: CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

arXiv:2412.01496 [pdf, ps, other]

Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets

Authors: Nicholas Konz, Richard Osuala, Preeti Verma, Yuwen Chen, Hanxue Gu, Haoyu Dong, Yaqian Chen, Andrew Marshall, Lidia Garrucho, Kaisar Kushibar, Daniel M. Lang, Gene S. Kim, Lars J. Grimm, John M. Lewin, James S. Duncan, Julia A. Schnabel, Oliver Diaz, Karim Lekadir, Maciej A. Mazurowski

Abstract: Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perc… ▽ More Determining whether two sets of images belong to the same or different distributions or domains is a crucial task in modern medical image analysis and deep learning; for example, to evaluate the output quality of image generative models. Currently, metrics used for this task either rely on the (potentially biased) choice of some downstream task, such as segmentation, or adopt task-independent perceptual metrics (e.g., Fréchet Inception Distance/FID) from natural imaging, which we show insufficiently capture anatomical features. To this end, we introduce a new perceptual metric tailored for medical images, FRD (Fréchet Radiomic Distance), which utilizes standardized, clinically meaningful, and interpretable image features. We show that FRD is superior to other image distribution metrics for a range of medical imaging applications, including out-of-domain (OOD) detection, the evaluation of image-to-image translation (by correlating more with downstream task performance as well as anatomical consistency and realism), and the evaluation of unconditional image generation. Moreover, FRD offers additional benefits such as stability and computational efficiency at low sample sizes, sensitivity to image corruptions and adversarial attacks, feature interpretability, and correlation with radiologist-perceived image quality. Additionally, we address key gaps in the literature by presenting an extensive framework for the multifaceted evaluation of image similarity metrics in medical imaging -- including the first large-scale comparative study of generative models for medical image translation -- and release an accessible codebase to facilitate future research. Our results are supported by thorough experiments spanning a variety of datasets, modalities, and downstream tasks, highlighting the broad potential of FRD for medical image analysis. △ Less

Submitted 6 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: Codebase for FRD computation: https://github.com/RichardObi/frd-score. Codebase for medical image similarity metric evaluation framework: https://github.com/mazurowski-lab/medical-image-similarity-metrics

arXiv:2406.13844 [pdf, other]

doi 10.1038/s41597-025-04707-4

A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations

Authors: Lidia Garrucho, Kaisar Kushibar, Claire-Anne Reidel, Smriti Joshi, Richard Osuala, Apostolia Tsirikoglou, Maciej Bobowicz, Javier del Riego, Alessandro Catanese, Katarzyna Gwoździewicz, Maria-Laura Cosaka, Pasant M. Abo-Elhoda, Sara W. Tantawy, Shorouq S. Sakrana, Norhan O. Shawky-Abdelfatah, Amr Muhammad Abdo-Salem, Androniki Kozana, Eugen Divjak, Gordana Ivanac, Katerina Nikiforaki, Michail E. Klontzas, Rosa García-Dosdá, Meltem Gulsun-Akpinar, Oğuz Lafcı, Ritse Mann , et al. (8 additional authors not shown)

Abstract: Artificial Intelligence (AI) research in breast cancer Magnetic Resonance Imaging (MRI) faces challenges due to limited expert-labeled segmentations. To address this, we present a multicenter dataset of 1506 pre-treatment T1-weighted dynamic contrast-enhanced MRI cases, including expert annotations of primary tumors and non-mass-enhanced regions. The dataset integrates imaging data from four colle… ▽ More Artificial Intelligence (AI) research in breast cancer Magnetic Resonance Imaging (MRI) faces challenges due to limited expert-labeled segmentations. To address this, we present a multicenter dataset of 1506 pre-treatment T1-weighted dynamic contrast-enhanced MRI cases, including expert annotations of primary tumors and non-mass-enhanced regions. The dataset integrates imaging data from four collections in The Cancer Imaging Archive (TCIA), where only 163 cases with expert segmentations were initially available. To facilitate the annotation process, a deep learning model was trained to produce preliminary segmentations for the remaining cases. These were subsequently corrected and verified by 16 breast cancer experts (averaging 9 years of experience), creating a fully annotated dataset. Additionally, the dataset includes 49 harmonized clinical and demographic variables, as well as pre-trained weights for a baseline nnU-Net model trained on the annotated data. This resource addresses a critical gap in publicly available breast cancer datasets, enabling the development, validation, and benchmarking of advanced deep learning models, thus driving progress in breast cancer diagnostics, treatment response prediction, and personalized care. △ Less

Submitted 21 February, 2025; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: 15 paes, 7 figures, 3 tables

Journal ref: Sci Data 12, 453 (2025)

arXiv:2403.19508 [pdf, other]

Debiasing Cardiac Imaging with Controlled Latent Diffusion Models

Authors: Grzegorz Skorupko, Richard Osuala, Zuzanna Szafranowska, Kaisar Kushibar, Nay Aung, Steffen E Petersen, Karim Lekadir, Polyxeni Gkontra

Abstract: The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We ad… ▽ More The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry derived from segmentation masks using a large-cohort study, specifically, the UK Biobank. We assess our method by evaluating the realism of the generated images using established quantitative metrics. Furthermore, we conduct a downstream classification task aimed at debiasing a classifier by rectifying imbalances within underrepresented groups through synthetically generated samples. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances, such as the scarcity of younger patients or individuals with normal BMI level suffering from heart failure. This work represents a major step towards the adoption of synthetic data for the development of fair and generalizable models for medical classification tasks. Notably, we conduct all our experiments using a single, consumer-level GPU to highlight the feasibility of our approach within resource-constrained environments. Our code is available at https://github.com/faildeny/debiasing-cardiac-mri. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.13890 [pdf, other]

Towards Learning Contrast Kinetics with Multi-Condition Latent Diffusion Models

Authors: Richard Osuala, Daniel M. Lang, Preeti Verma, Smriti Joshi, Apostolia Tsirikoglou, Grzegorz Skorupko, Kaisar Kushibar, Lidia Garrucho, Walter H. L. Pinaya, Oliver Diaz, Julia A. Schnabel, Karim Lekadir

Abstract: Contrast agents in dynamic contrast enhanced magnetic resonance imaging allow to localize tumors and observe their contrast kinetics, which is essential for cancer characterization and respective treatment decision-making. However, contrast agent administration is not only associated with adverse health risks, but also restricted for patients during pregnancy, and for those with kidney malfunction… ▽ More Contrast agents in dynamic contrast enhanced magnetic resonance imaging allow to localize tumors and observe their contrast kinetics, which is essential for cancer characterization and respective treatment decision-making. However, contrast agent administration is not only associated with adverse health risks, but also restricted for patients during pregnancy, and for those with kidney malfunction, or other adverse reactions. With contrast uptake as key biomarker for lesion malignancy, cancer recurrence risk, and treatment response, it becomes pivotal to reduce the dependency on intravenous contrast agent administration. To this end, we propose a multi-conditional latent diffusion model capable of acquisition time-conditioned image synthesis of DCE-MRI temporal sequences. To evaluate medical image synthesis, we additionally propose and validate the Fréchet radiomics distance as an image quality measure based on biomarker variability between synthetic and real imaging data. Our results demonstrate our method's ability to generate realistic multi-sequence fat-saturated breast DCE-MRI and uncover the emerging potential of deep learning based contrast kinetics simulation. We publicly share our accessible codebase at https://github.com/RichardObi/ccnet and provide a user-friendly library for Fréchet radiomics distance calculation at https://pypi.org/project/frd-score. △ Less

Submitted 17 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Early Accept at MICCAI2024

arXiv:2309.12325 [pdf]

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Authors: Karim Lekadir, Aasa Feragen, Abdul Joseph Fofanah, Alejandro F Frangi, Alena Buyx, Anais Emelie, Andrea Lara, Antonio R Porras, An-Wen Chan, Arcadi Navarro, Ben Glocker, Benard O Botwe, Bishesh Khanal, Brigit Beger, Carol C Wu, Celia Cintas, Curtis P Langlotz, Daniel Rueckert, Deogratias Mzurikwao, Dimitrios I Fotiadis, Doszhan Zhussupov, Enzo Ferrante, Erik Meijering, Eva Weicken, Fabio A González , et al. (95 additional authors not shown)

Abstract: Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted… ▽ More Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI. △ Less

Submitted 8 July, 2024; v1 submitted 11 August, 2023; originally announced September 2023.

ACM Class: I.2.0; I.4.0; I.5.0

arXiv:2308.09640 [pdf, other]

Revisiting Skin Tone Fairness in Dermatological Lesion Classification

Authors: Thorsten Kalb, Kaisar Kushibar, Celia Cintas, Karim Lekadir, Oliver Diaz, Richard Osuala

Abstract: Addressing fairness in lesion classification from dermatological images is crucial due to variations in how skin diseases manifest across skin tones. However, the absence of skin tone labels in public datasets hinders building a fair classifier. To date, such skin tone labels have been estimated prior to fairness analysis in independent studies using the Individual Typology Angle (ITA). Briefly, I… ▽ More Addressing fairness in lesion classification from dermatological images is crucial due to variations in how skin diseases manifest across skin tones. However, the absence of skin tone labels in public datasets hinders building a fair classifier. To date, such skin tone labels have been estimated prior to fairness analysis in independent studies using the Individual Typology Angle (ITA). Briefly, ITA calculates an angle based on pixels extracted from skin images taking into account the lightness and yellow-blue tints. These angles are then categorised into skin tones that are subsequently used to analyse fairness in skin cancer classification. In this work, we review and compare four ITA-based approaches of skin tone classification on the ISIC18 dataset, a common benchmark for assessing skin cancer classification fairness in the literature. Our analyses reveal a high disagreement among previously published studies demonstrating the risks of ITA-based skin tone estimation methods. Moreover, we investigate the causes of such large discrepancy among these approaches and find that the lack of diversity in the ISIC18 dataset limits its use as a testbed for fairness analysis. Finally, we recommend further research on robust ITA estimation and diverse dataset acquisition with skin tone annotation to facilitate conclusive fairness assessments of artificial intelligence tools in dermatology. Our code is available at https://github.com/tkalbl/RevisitingSkinToneFairness. △ Less

Submitted 18 August, 2023; originally announced August 2023.

Comments: Accepted at 2023 MICCAI FAIMI Workshop

arXiv:2209.14472 [pdf, other]

doi 10.1117/1.JMI.10.6.061403

medigan: a Python library of pretrained generative models for medical image synthesis

Authors: Richard Osuala, Grzegorz Skorupko, Noussair Lazrak, Lidia Garrucho, Eloy García, Smriti Joshi, Socayna Jouide, Michael Rutherford, Fred Prior, Kaisar Kushibar, Oliver Diaz, Karim Lekadir

Abstract: Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models in medical imaging. However, there is (1) limited availability of (synthetic) datasets and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To reduce this entry barrier, we propose medigan, a one-stop shop for… ▽ More Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models in medical imaging. However, there is (1) limited availability of (synthetic) datasets and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To reduce this entry barrier, we propose medigan, a one-stop shop for pretrained generative models implemented as an open-source framework-agnostic Python library. medigan allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code. Guided by design decisions based on gathered end-user requirements, we implement medigan based on modular components for generative model (i) execution, (ii) visualisation, (iii) search & ranking, and (iv) contribution. The library's scalability and design is demonstrated by its growing number of integrated and readily-usable pretrained generative models consisting of 21 models utilising 9 different Generative Adversarial Network architectures trained on 11 datasets from 4 domains, namely, mammography, endoscopy, x-ray, and MRI. Furthermore, 3 applications of medigan are analysed in this work, which include (a) enabling community-wide sharing of restricted data, (b) investigating generative model evaluation metrics, and (c) improving clinical downstream tasks. In (b), extending on common medical image synthesis assessment and reporting standards, we show Fréchet Inception Distance variability based on image normalisation and radiology-specific feature extraction. △ Less

Submitted 23 February, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: 32 pages, 7 figures

ACM Class: I.4.0; I.2.0; I.5.1

Journal ref: Journal of Medical Imaging 10.6 (2023) 061403

arXiv:2209.09809 [pdf]

doi 10.3389/fonc.2022.1044496

High-resolution synthesis of high-density breast mammograms: Application to improved fairness in deep learning based mass detection

Authors: Lidia Garrucho, Kaisar Kushibar, Richard Osuala, Oliver Diaz, Alessandro Catanese, Javier del Riego, Maciej Bobowicz, Fredrik Strand, Laura Igual, Karim Lekadir

Abstract: Computer-aided detection systems based on deep learning have shown good performance in breast cancer detection. However, high-density breasts show poorer detection performance since dense tissues can mask or even simulate masses. Therefore, the sensitivity of mammography for breast cancer detection can be reduced by more than 20% in dense breasts. Additionally, extremely dense cases reported an in… ▽ More Computer-aided detection systems based on deep learning have shown good performance in breast cancer detection. However, high-density breasts show poorer detection performance since dense tissues can mask or even simulate masses. Therefore, the sensitivity of mammography for breast cancer detection can be reduced by more than 20% in dense breasts. Additionally, extremely dense cases reported an increased risk of cancer compared to low-density breasts. This study aims to improve the mass detection performance in highdensity breasts using synthetic high-density full-field digital mammograms (FFDM) as data augmentation during breast mass detection model training. To this end, a total of five cycle-consistent GAN (CycleGAN) models using three FFDM datasets were trained for low-to-high-density image translation in highresolution mammograms. The training images were split by breast density BIRADS categories, being BI-RADS A almost entirely fatty and BI-RADS D extremely dense breasts. Our results showed that the proposed data augmentation technique improved the sensitivity and precision of mass detection in models trained with small datasets and improved the domain generalization of the models trained with large databases. In addition, the clinical realism of the synthetic images was evaluated in a reader study involving two expert radiologists and one surgical oncologist. △ Less

Submitted 24 January, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: 9 figures, 4 tables

arXiv:2203.08878 [pdf, other]

Layer Ensembles: A Single-Pass Uncertainty Estimation in Deep Learning for Segmentation

Authors: Kaisar Kushibar, Víctor Manuel Campello, Lidia Garrucho Moras, Akis Linardos, Petia Radeva, Karim Lekadir

Abstract: Uncertainty estimation in deep learning has become a leading research field in medical image analysis due to the need for safe utilisation of AI algorithms in clinical practice. Most approaches for uncertainty estimation require sampling the network weights multiple times during testing or training multiple networks. This leads to higher training and testing costs in terms of time and computationa… ▽ More Uncertainty estimation in deep learning has become a leading research field in medical image analysis due to the need for safe utilisation of AI algorithms in clinical practice. Most approaches for uncertainty estimation require sampling the network weights multiple times during testing or training multiple networks. This leads to higher training and testing costs in terms of time and computational resources. In this paper, we propose Layer Ensembles, a novel uncertainty estimation method that uses a single network and requires only a single pass to estimate predictive uncertainty of a network. Moreover, we introduce an image-level uncertainty metric, which is more beneficial for segmentation tasks compared to the commonly used pixel-wise metrics such as entropy and variance. We evaluate our approach on 2D and 3D, binary and multi-class medical image segmentation tasks. Our method shows competitive results with state-of-the-art Deep Ensembles, requiring only a single network and a single pass. △ Less

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.04961 [pdf, other]

doi 10.1117/12.2625781

Sharing Generative Models Instead of Private Data: A Simulation Study on Mammography Patch Classification

Authors: Zuzanna Szafranowska, Richard Osuala, Bennet Breier, Kaisar Kushibar, Karim Lekadir, Oliver Diaz

Abstract: Early detection of breast cancer in mammography screening via deep-learning based computer-aided detection systems shows promising potential in improving the curability and mortality rates of breast cancer. However, many clinical centres are restricted in the amount and heterogeneity of available data to train such models to (i) achieve promising performance and to (ii) generalise well across acqu… ▽ More Early detection of breast cancer in mammography screening via deep-learning based computer-aided detection systems shows promising potential in improving the curability and mortality rates of breast cancer. However, many clinical centres are restricted in the amount and heterogeneity of available data to train such models to (i) achieve promising performance and to (ii) generalise well across acquisition protocols and domains. As sharing data between centres is restricted due to patient privacy concerns, we propose a potential solution: sharing trained generative models between centres as substitute for real patient data. In this work, we use three well known mammography datasets to simulate three different centres, where one centre receives the trained generator of Generative Adversarial Networks (GANs) from the two remaining centres in order to augment the size and heterogeneity of its training dataset. We evaluate the utility of this approach on mammography patch classification on the test set of the GAN-receiving centre using two different classification models, (a) a convolutional neural network and (b) a transformer neural network. Our experiments demonstrate that shared GANs notably increase the performance of both transformer and convolutional classification models and highlight this approach as a viable alternative to inter-centre data sharing. △ Less

Submitted 15 April, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: Draft accepted as oral presentation at International Workshop on Breast Imaging (IWBI) 2022. 9 pages, 3 figures

ACM Class: I.2.0; I.4.0; I.5.0; J.3

Journal ref: 16th International Workshop on Breast Imaging (IWBI2022). 12286. 2022. 169 -- 177

arXiv:2201.11620 [pdf]

doi 10.1016/j.artmed.2022.102386

Domain generalization in deep learning-based mass detection in mammography: A large-scale multi-center study

Authors: Lidia Garrucho, Kaisar Kushibar, Socayna Jouide, Oliver Diaz, Laura Igual, Karim Lekadir

Abstract: Computer-aided detection systems based on deep learning have shown great potential in breast cancer detection. However, the lack of domain generalization of artificial neural networks is an important obstacle to their deployment in changing clinical environments. In this work, we explore the domain generalization of deep learning methods for mass detection in digital mammography and analyze in-dep… ▽ More Computer-aided detection systems based on deep learning have shown great potential in breast cancer detection. However, the lack of domain generalization of artificial neural networks is an important obstacle to their deployment in changing clinical environments. In this work, we explore the domain generalization of deep learning methods for mass detection in digital mammography and analyze in-depth the sources of domain shift in a large-scale multi-center setting. To this end, we compare the performance of eight state-of-the-art detection methods, including Transformer-based models, trained in a single domain and tested in five unseen domains. Moreover, a single-source mass detection training pipeline is designed to improve the domain generalization without requiring images from the new domain. The results show that our workflow generalizes better than state-of-the-art transfer learning-based approaches in four out of five domains while reducing the domain shift caused by the different acquisition protocols and scanner manufacturers. Subsequently, an extensive analysis is performed to identify the covariate shifts with bigger effects on the detection performance, such as due to differences in patient age, breast density, mass size, and mass malignancy. Ultimately, this comprehensive study provides key insights and best practices for future research on domain generalization in deep learning-based breast cancer detection. △ Less

Submitted 24 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

MSC Class: 68T07; 68U10; 65D17

arXiv:2109.09658 [pdf, other]

FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging

Authors: Karim Lekadir, Richard Osuala, Catherine Gallin, Noussair Lazrak, Kaisar Kushibar, Gianna Tsakou, Susanna Aussó, Leonor Cerdá Alberich, Kostas Marias, Manolis Tsiknakis, Sara Colantonio, Nickolas Papanikolaou, Zohaib Salahuddin, Henry C Woodruff, Philippe Lambin, Luis Martí-Bonmatí

Abstract: The recent advancements in artificial intelligence (AI) combined with the extensive amount of data generated by today's clinical systems, has led to the development of imaging AI solutions across the whole value chain of medical imaging, including image reconstruction, medical image segmentation, image-based diagnosis and treatment planning. Notwithstanding the successes and future potential of AI… ▽ More The recent advancements in artificial intelligence (AI) combined with the extensive amount of data generated by today's clinical systems, has led to the development of imaging AI solutions across the whole value chain of medical imaging, including image reconstruction, medical image segmentation, image-based diagnosis and treatment planning. Notwithstanding the successes and future potential of AI in medical imaging, many stakeholders are concerned of the potential risks and ethical implications of imaging AI solutions, which are perceived as complex, opaque, and difficult to comprehend, utilise, and trust in critical clinical applications. Addressing these concerns and risks, the FUTURE-AI framework has been proposed, which, sourced from a global multi-domain expert consensus, comprises guiding principles for increased trust, safety, and adoption for AI in healthcare. In this paper, we transform the general FUTURE-AI healthcare principles to a concise and specific AI implementation guide tailored to the needs of the medical imaging community. To this end, we carefully assess each building block of the FUTURE-AI framework consisting of (i) Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness and (vi) Explainability, and respectively define concrete best practices based on accumulated AI implementation experiences from five large European projects on AI in Health Imaging. We accompany our concrete step-by-step medical imaging development guide with a practical AI solution maturity checklist, thus enabling AI development teams to design, evaluate, maintain, and deploy technically, clinically and ethically trustworthy imaging AI solutions into clinical practice. △ Less

Submitted 22 July, 2024; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Please refer to arXiv:2309.12325 for the latest FUTURE-AI framework for healthcare

arXiv:2107.09543 [pdf, other]

doi 10.1016/j.media.2022.102704

Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging

Authors: Richard Osuala, Kaisar Kushibar, Lidia Garrucho, Akis Linardos, Zuzanna Szafranowska, Stefan Klein, Ben Glocker, Oliver Diaz, Karim Lekadir

Abstract: Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in Generative Adversarial Networks… ▽ More Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in Generative Adversarial Networks (GANs), data synthesis, and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe a high validation rigour in general, but also several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on data synthesis and adversarial networks in the artificial intelligence community. △ Less

Submitted 27 November, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: v2, 51 pages, 15 Figures, 9 Tables, accepted for publication in Medical Image Analysis

Journal ref: Medical Image Analysis (2022)

arXiv:2107.03901 [pdf, other]

Federated Learning for Multi-Center Imaging Diagnostics: A Study in Cardiovascular Disease

Authors: Akis Linardos, Kaisar Kushibar, Sean Walsh, Polyxeni Gkontra, Karim Lekadir

Abstract: Deep learning models can enable accurate and efficient disease diagnosis, but have thus far been hampered by the data scarcity present in the medical world. Automated diagnosis studies have been constrained by underpowered single-center datasets, and although some results have shown promise, their generalizability to other institutions remains questionable as the data heterogeneity between institu… ▽ More Deep learning models can enable accurate and efficient disease diagnosis, but have thus far been hampered by the data scarcity present in the medical world. Automated diagnosis studies have been constrained by underpowered single-center datasets, and although some results have shown promise, their generalizability to other institutions remains questionable as the data heterogeneity between institutions is not taken into account. By allowing models to be trained in a distributed manner that preserves patients' privacy, federated learning promises to alleviate these issues, by enabling diligent multi-center studies. We present the first federated learning study on the modality of cardiovascular magnetic resonance (CMR) and use four centers derived from subsets of the M\&M and ACDC datasets, focusing on the diagnosis of hypertrophic cardiomyopathy (HCM). We adapt a 3D-CNN network pretrained on action recognition and explore two different ways of incorporating shape prior information to the model, and four different data augmentation set-ups, systematically analyzing their impact on the different collaborative learning choices. We show that despite the small size of data (180 subjects derived from four centers), the privacy preserving federated learning achieves promising results that are competitive with traditional centralized learning. We further find that federatively trained models exhibit increased robustness and are more sensitive to domain shift effects. △ Less

Submitted 7 July, 2021; originally announced July 2021.

Comments: Code used in this study can be found in: https://github.com/Linardos/federated-HCM-diagnosis

Journal ref: Scientific Reports 2022

arXiv:1903.00675 [pdf, other]

A Hybrid SLAM and Object Recognition System for Pepper Robot

Authors: Paola Ardón, Kaisar Kushibar, Songyou Peng

Abstract: Humanoid robots are playing increasingly important roles in real-life tasks especially when it comes to indoor applications. Providing robust solutions for the tasks such as indoor environment mapping, self-localisation and object recognition are essential to make the robots to be more autonomous, hence, more human-like. The well-known Aldebaran service robot Pepper is a suitable candidate for ach… ▽ More Humanoid robots are playing increasingly important roles in real-life tasks especially when it comes to indoor applications. Providing robust solutions for the tasks such as indoor environment mapping, self-localisation and object recognition are essential to make the robots to be more autonomous, hence, more human-like. The well-known Aldebaran service robot Pepper is a suitable candidate for achieving these goals. In this paper, a hybrid system combining Simultaneous Localisation and Mapping (SLAM) algorithm with object recognition is developed and tested with Pepper robot in real-world conditions for the first time. The ORB SLAM 2 algorithm was taken as a seminal work in our research. Then, an object recognition technique based on Scale-Invariant Feature Transform (SIFT) and Random Sample Consensus (RANSAC) was combined with SLAM to recognise and localise objects in the mapped indoor environment. The results of our experiments showed the system's applicability for the Pepper robot in real-world scenarios. Moreover, we made our source code available for the community at https://github.com/PaolaArdon/Salt-Pepper. △ Less

Submitted 7 March, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

Comments: All authors contributed equally, listed in alphabetical order

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko , et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem… ▽ More Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset. △ Less

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

arXiv:1810.04274 [pdf, ps, other]

Survival prediction using ensemble tumor segmentation and transfer learning

Authors: Mariano Cabezas, Sergi Valverde, Sandra González-Villà, Albert Clérigues, Mostafa Salem, Kaisar Kushibar, Jose Bernal, Arnau Oliver, Xavier Lladó

Abstract: Segmenting tumors and their subregions is a challenging task as demonstrated by the annual BraTS challenge. Moreover, predicting the survival of the patient using mainly imaging features, while being a desirable outcome to evaluate the treatment of the patient, it is also a difficult task. In this paper, we present a cascaded pipeline to segment the tumor and its subregions and then we use these r… ▽ More Segmenting tumors and their subregions is a challenging task as demonstrated by the annual BraTS challenge. Moreover, predicting the survival of the patient using mainly imaging features, while being a desirable outcome to evaluate the treatment of the patient, it is also a difficult task. In this paper, we present a cascaded pipeline to segment the tumor and its subregions and then we use these results and other clinical features together with image features coming from a pretrained VGG-16 network to predict the survival of the patient. Preliminary results with the training and validation dataset show a promising start in terms of segmentation, while the prediction values could be improved with further testing on the feature extraction part of the network. △ Less

Submitted 4 October, 2018; originally announced October 2018.

Comments: Submitted to the BRATS2018 MICCAI challenge

arXiv:1801.06457 [pdf, other]

doi 10.1109/ACCESS.2019.2926697

Quantitative analysis of patch-based fully convolutional neural networks for tissue segmentation on brain magnetic resonance imaging

Authors: Jose Bernal, Kaisar Kushibar, Mariano Cabezas, Sergi Valverde, Arnau Oliver, Xavier Lladó

Abstract: Accurate brain tissue segmentation in Magnetic Resonance Imaging (MRI) has attracted the attention of medical doctors and researchers since variations in tissue volume help in diagnosing and monitoring neurological diseases. Several proposals have been designed throughout the years comprising conventional machine learning strategies as well as convolutional neural networks (CNN) approaches. In par… ▽ More Accurate brain tissue segmentation in Magnetic Resonance Imaging (MRI) has attracted the attention of medical doctors and researchers since variations in tissue volume help in diagnosing and monitoring neurological diseases. Several proposals have been designed throughout the years comprising conventional machine learning strategies as well as convolutional neural networks (CNN) approaches. In particular, in this paper, we analyse a sub-group of deep learning methods producing dense predictions. This branch, referred in the literature as Fully CNN (FCNN), is of interest as these architectures can process an input volume in less time than CNNs and local spatial dependencies may be encoded since several voxels are classified at once. Our study focuses on understanding architectural strengths and weaknesses of literature-like approaches. Hence, we implement eight FCNN architectures inspired by robust state-of-the-art methods on brain segmentation related tasks. We evaluate them using the IBSR18, MICCAI2012 and iSeg2017 datasets as they contain infant and adult data and exhibit varied voxel spacing, image quality, number of scans and available imaging modalities. The discussion is driven in three directions: comparison between 2D and 3D approaches, the importance of multiple modalities and overlapping as a sampling strategy for training and testing models. To encourage other researchers to explore the evaluation framework, a public version is accessible to download from our research website. △ Less

Submitted 19 February, 2018; v1 submitted 19 January, 2018; originally announced January 2018.

arXiv:1712.03747 [pdf, ps, other]

doi 10.1016/j.artmed.2018.08.008

Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review

Authors: Jose Bernal, Kaisar Kushibar, Daniel S. Asfaw, Sergi Valverde, Arnau Oliver, Robert Martí, Xavier Lladó

Abstract: In recent years, deep convolutional neural networks (CNNs) have shown record-shattering performance in a variety of computer vision problems, such as visual object recognition, detection and segmentation. These methods have also been utilised in medical image analysis domain for lesion segmentation, anatomical segmentation and classification. We present an extensive literature review of CNN techni… ▽ More In recent years, deep convolutional neural networks (CNNs) have shown record-shattering performance in a variety of computer vision problems, such as visual object recognition, detection and segmentation. These methods have also been utilised in medical image analysis domain for lesion segmentation, anatomical segmentation and classification. We present an extensive literature review of CNN techniques applied in brain magnetic resonance imaging (MRI) analysis, focusing on the architectures, pre-processing, data-preparation and post-processing strategies available in these works. The aim of this study is three-fold. Our primary goal is to report how different CNN architectures have evolved, discuss state-of-the-art strategies, condense their results obtained using public datasets and examine their pros and cons. Second, this paper is intended to be a detailed reference of the research activity in deep CNN for brain MRI analysis. Finally, we present a perspective on the future of CNNs in which we hint some of the research directions in subsequent years. △ Less

Submitted 11 June, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

arXiv:1709.09075 [pdf, other]

doi 10.1016/j.media.2018.06.006

Automated sub-cortical brain structure segmentation combining spatial and deep convolutional features

Authors: Kaisar Kushibar, Sergi Valverde, Sandra Gonzalez-Villa, Jose Bernal, Mariano Cabezas, Arnau Oliver, Xavier Llado

Abstract: Sub-cortical brain structure segmentation in Magnetic Resonance Images (MRI) has attracted the interest of the research community for a long time because morphological changes in these structures are related to different neurodegenerative disorders. However, manual segmentation of these structures can be tedious and prone to variability, highlighting the need for robust automated segmentation meth… ▽ More Sub-cortical brain structure segmentation in Magnetic Resonance Images (MRI) has attracted the interest of the research community for a long time because morphological changes in these structures are related to different neurodegenerative disorders. However, manual segmentation of these structures can be tedious and prone to variability, highlighting the need for robust automated segmentation methods. In this paper, we present a novel convolutional neural network based approach for accurate segmentation of the sub-cortical brain structures that combines both convolutional and prior spatial features for improving the segmentation accuracy. In order to increase the accuracy of the automated segmentation, we propose to train the network using a restricted sample selection to force the network to learn the most difficult parts of the structures. We evaluate the accuracy of the proposed method on the public MICCAI 2012 challenge and IBSR 18 datasets, comparing it with different available state-of-the-art methods and other recently proposed deep learning approaches. On the MICCAI 2012 dataset, our method shows an excellent performance comparable to the best challenge participant strategy, while performing significantly better than state-of-the-art techniques such as FreeSurfer and FIRST. On the IBSR 18 dataset, our method also exhibits a significant increase in the performance with respect to not only FreeSurfer and FIRST, but also comparable or better results than other recent deep learning approaches. Moreover, our experiments show that both the addition of the spatial priors and the restricted sampling strategy have a significant effect on the accuracy of the proposed method. In order to encourage the reproducibility and the use of the proposed method, a public version of our approach is available to download for the neuroimaging community. △ Less

Submitted 26 September, 2017; originally announced September 2017.

Showing 1–23 of 23 results for author: Kushibar, K