-
Analysis of the MICCAI Brain Tumor Segmentation -- Metastases (BraTS-METS) 2025 Lighthouse Challenge: Brain Metastasis Segmentation on Pre- and Post-treatment MRI
Authors:
Nazanin Maleki,
Raisa Amiruddin,
Ahmed W. Moawad,
Nikolay Yordanov,
Athanasios Gkampenis,
Pascal Fehringer,
Fabian Umeh,
Crystal Chukwurah,
Fatima Memon,
Bojan Petrovic,
Justin Cramer,
Mark Krycia,
Elizabeth B. Shrickel,
Ichiro Ikuta,
Gerard Thompson,
Lorenna Vidal,
Vilma Kosovic,
Adam E. Goldman-Yassen,
Virginia Hill,
Tiffany So,
Sedra Mhana,
Albara Alotaibi,
Nathan Page,
Prisha Bhatia,
Yasaman Sharifi
, et al. (218 additional authors not shown)
Abstract:
Despite continuous advancements in cancer treatment, brain metastatic disease remains a significant complication of primary cancer and is associated with an unfavorable prognosis. One approach for improving diagnosis, management, and outcomes is to implement algorithms based on artificial intelligence for the automated segmentation of both pre- and post-treatment MRI brain images. Such algorithms rely on volumetric criteria for lesion identification and treatment response assessment, which are still not available in clinical practice. Therefore, it is critical to establish tools for rapid volumetric segmentation that can be translated to clinical practice and that are trained on high-quality annotated data. The BraTS-METS 2025 Lighthouse Challenge aims to address this critical need by establishing inter-rater and intra-rater variability in dataset annotation and by generating high-quality annotated datasets from four individual instances of segmentation by neuroradiologists while being recorded on video (two instances "from scratch" and two instances after AI pre-segmentation). This high-quality annotated dataset will be used for the testing phase of the 2025 Lighthouse Challenge and will be publicly released at the completion of the challenge. The 2025 Lighthouse Challenge will also release the 2023 and 2024 segmented datasets that were annotated using an established pipeline of pre-segmentation, student annotation, checking by two neuroradiologists, and finalization by one neuroradiologist. It builds upon its previous edition by including post-treatment cases in the dataset. Using these high-quality annotated datasets, the 2025 Lighthouse Challenge plans to test benchmark algorithms for automated segmentation of pre- and post-treatment brain metastases (BM), trained on diverse and multi-institutional datasets of MRI images obtained from patients with brain metastases.
Submitted 6 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Reproducing NevIR: Negation in Neural Information Retrieval
Authors:
Coen van den Elsen,
Francien Barkhof,
Thijmen Nijdam,
Simon Lupart,
Mohammad Aliannejadi
Abstract:
Negation is a fundamental aspect of human communication, yet it remains a challenge for Language Models (LMs) in Information Retrieval (IR). Despite the heavy reliance of modern neural IR systems on LMs, little attention has been given to their handling of negation. In this study, we reproduce and extend the findings of NevIR, a benchmark study that revealed most IR models perform at or below the level of random ranking when dealing with negation. We replicate NevIR's original experiments and evaluate newly developed state-of-the-art IR models. Our findings show that a recently emerging category, listwise Large Language Model (LLM) re-rankers, outperforms other models but still falls short of human performance. Additionally, we leverage ExcluIR, a benchmark dataset designed for exclusionary queries with extensive negation, to assess the generalisability of negation understanding. Our findings suggest that fine-tuning on one dataset does not reliably improve performance on the other, indicating notable differences in their data distributions. Furthermore, we observe that only cross-encoders and listwise LLM re-rankers achieve reasonable performance across both negation tasks.
Submitted 4 May, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
P-Count: Persistence-based Counting of White Matter Hyperintensities in Brain MRI
Authors:
Xiaoling Hu,
Annabel Sorby-Adams,
Frederik Barkhof,
W Taylor Kimberly,
Oula Puonti,
Juan Eugenio Iglesias
Abstract:
White matter hyperintensities (WMH) are a hallmark of cerebrovascular disease and multiple sclerosis. Automated WMH segmentation methods enable quantitative analysis via estimation of total lesion load, spatial distribution of lesions, and number of lesions (i.e., number of connected components after thresholding), all of which are correlated with patient outcomes. While the two former measures can generally be estimated robustly, the number of lesions is highly sensitive to noise and segmentation mistakes -- even when small connected components are eroded or disregarded. In this article, we present P-Count, an algebraic WMH counting tool based on persistent homology that accounts for the topological features of WM lesions in a robust manner. Using computational geometry, P-Count takes the persistence of connected components into consideration, effectively filtering out the noisy WMH positives, resulting in a more accurate count of true lesions. We validated P-Count on the ISBI2015 longitudinal lesion segmentation dataset, where it produces significantly more accurate results than direct thresholding.
Submitted 20 March, 2024;
originally announced March 2024.
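The contrast drawn in the P-Count abstract above, between counting connected components after direct thresholding and counting only components that persist, can be illustrated with a toy sketch. This is not the authors' implementation: it works on a 1D signal rather than a 3D lesion probability map, and the function names, the `floor` parameter, and the example values are all assumptions made for illustration. A component is born at a local maximum and dies when it merges into a component with a higher peak; its persistence is the difference between the two values, and `min_persistence` is assumed to be positive.

```python
def count_by_threshold(signal, threshold):
    """Count runs of consecutive samples strictly above the threshold."""
    count, inside = 0, False
    for value in signal:
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def count_persistent_peaks(signal, min_persistence, floor=0.0):
    """Count superlevel-set components whose persistence is at least min_persistence.

    Hypothetical helper: samples at or below `floor` are treated as background,
    and components that never merge are considered to die at `floor`.
    """
    parent, birth = {}, {}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    count = 0
    # Sweep from the highest sample downwards.
    for i in sorted(range(len(signal)), key=lambda k: -signal[k]):
        if signal[i] <= floor:
            break
        parent[i], birth[i] = i, signal[i]
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component with the lower peak dies at this level.
            young, old = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
            if birth[young] - signal[i] >= min_persistence:
                count += 1
            parent[young] = old
    # Components that never merged die at the background floor.
    roots = {find(i) for i in parent}
    return count + sum(1 for r in roots if birth[r] - floor >= min_persistence)


# Two lesions; the first is split by a shallow noisy dip (0.45) below the threshold.
signal = [0.0, 0.6, 0.45, 0.62, 0.0, 0.0, 0.9, 0.0]
print(count_by_threshold(signal, 0.5))      # direct thresholding sees 3 components
print(count_persistent_peaks(signal, 0.3))  # the persistence filter keeps 2
```

In this toy case the shallow dip splits one lesion into two thresholded components, while the persistence criterion merges them because the split is only 0.15 deep.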
-
Quantifying white matter hyperintensity and brain volumes in heterogeneous clinical and low-field portable MRI
Authors:
Pablo Laso,
Stefano Cerri,
Annabel Sorby-Adams,
Jennifer Guo,
Farrah Mateen,
Philipp Goebl,
Jiaming Wu,
Peirong Liu,
Hongwei Li,
Sean I. Young,
Benjamin Billot,
Oula Puonti,
Gordon Sze,
Sam Payabavash,
Adam DeHavenon,
Kevin N. Sheth,
Matthew S. Rosen,
John Kirsch,
Nicola Strisciuglio,
Jelmer M. Wolterink,
Arman Eshaghi,
Frederik Barkhof,
W. Taylor Kimberly,
Juan Eugenio Iglesias
Abstract:
Brain atrophy and white matter hyperintensity (WMH) are critical neuroimaging features for ascertaining brain injury in cerebrovascular disease and multiple sclerosis. Automated segmentation and quantification is desirable but existing methods require high-resolution MRI with good signal-to-noise ratio (SNR). This precludes application to clinical and low-field portable MRI (pMRI) scans, thus hampering large-scale tracking of atrophy and WMH progression, especially in underserved areas where pMRI has huge potential. Here we present a method that segments white matter hyperintensity and 36 brain regions from scans of any resolution and contrast (including pMRI) without retraining. We show results on eight public datasets and on a private dataset with paired high- and low-field scans (3T and 64mT), where we attain strong correlation between the WMH ($ρ$=.85) and hippocampal volumes (r=.89) estimated at both fields. Our method is publicly available as part of FreeSurfer, at: http://surfer.nmr.mgh.harvard.edu/fswiki/WMH-SynthSeg.
Submitted 15 February, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
A coupled-mechanisms modelling framework for neurodegeneration
Authors:
Tiantian He,
Elinor Thompson,
Anna Schroder,
Neil P. Oxtoby,
Ahmed Abdulaal,
Frederik Barkhof,
Daniel C. Alexander
Abstract:
Computational models of neurodegeneration aim to emulate the evolving pattern of pathology in the brain during neurodegenerative disease, such as Alzheimer's disease. Previous studies have made specific choices on the mechanisms of pathology production and diffusion, or assume that all the subjects lie on the same disease progression trajectory. However, the complexity and heterogeneity of neurodegenerative pathology suggest that multiple mechanisms may contribute synergistically with complex interactions, while the degree of contribution of each mechanism may vary among individuals. We thus put forward a coupled-mechanisms modelling framework which non-linearly combines the network-topology-informed pathology appearance with the process of pathology spreading within a dynamic modelling system. We account for the heterogeneity of disease by fitting the model at the individual level, allowing the epicenters and rate of progression to vary among subjects. We construct a Bayesian model selection framework to account for feature importance and parameter uncertainty. This provides a combination of mechanisms that best explains the observations for each individual from the ADNI dataset. With the obtained distribution of mechanism importance for each subject, we are able to identify subgroups of patients sharing similar combinations of apparent mechanisms.
Submitted 10 August, 2023;
originally announced August 2023.
-
The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI
Authors:
Ahmed W. Moawad,
Anastasia Janas,
Ujjwal Baid,
Divya Ramakrishnan,
Rachit Saluja,
Nader Ashraf,
Nazanin Maleki,
Leon Jekel,
Nikolay Yordanov,
Pascal Fehringer,
Athanasios Gkampenis,
Raisa Amiruddin,
Amirreza Manteghinejad,
Maruf Adewole,
Jake Albrecht,
Udunna Anazodo,
Sanjay Aneja,
Syed Muhammad Anwar,
Timothy Bergquist,
Veronica Chiang,
Verena Chung,
Gian Marco Conte,
Farouk Dako,
James Eddy,
Ivan Ezhov
, et al. (207 additional authors not shown)
Abstract:
The translation of AI-generated brain metastases (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated, internationally compiled real-world datasets. This study presents the results of the segmentation challenge and characterizes the challenging cases that impacted the performance of the winning algorithms. Untreated brain metastases on standard anatomic MRI sequences (T1, T2, FLAIR, T1PG) from eight contributed international datasets were annotated in a stepwise manner: published UNET algorithms, student, neuroradiologist, final approver neuroradiologist. Segmentations were ranked based on lesion-wise Dice and Hausdorff distance (HD95) scores. False positives (FP) and false negatives (FN) were rigorously penalized, receiving a score of 0 for Dice and a fixed penalty of 374 for HD95. Eight datasets comprising 1303 studies were annotated, with 402 studies (3076 lesions) released on Synapse as publicly available datasets to challenge competitors. Additionally, 31 studies (139 lesions) were held out for validation, and 59 studies (218 lesions) were used for testing. Segmentation accuracy was measured as rank across subjects, with the winning team achieving a LesionWise mean score of 7.9. Common errors among the leading teams included false negatives for small lesions and misregistration of masks in space. The BraTS-METS 2023 challenge successfully curated well-annotated, diverse datasets and identified common errors, facilitating the translation of BM segmentation across varied clinical environments and providing personalized volumetric reports to patients undergoing BM treatment.
Submitted 8 December, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
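The penalty scheme described in the BraTS-METS 2023 abstract above (a Dice of 0 and a fixed HD95 of 374 for every false-positive or false-negative lesion) can be sketched as a per-case average. This is a simplified illustration, not the challenge's evaluation code: the lesion matching step is omitted, averaging over matched plus unmatched lesions is an assumption, and `lesionwise_scores` and its arguments are hypothetical names.

```python
HD95_PENALTY = 374.0  # fixed HD95 assigned to each FP or FN lesion (from the abstract)
DICE_PENALTY = 0.0    # Dice assigned to each FP or FN lesion (from the abstract)


def lesionwise_scores(matched, n_false_pos, n_false_neg):
    """Average Dice and HD95 over all lesions of one case, penalizing FP and FN.

    matched: list of (dice, hd95) pairs, one per correctly detected lesion.
    Assumption: the case score is the plain mean over matched, FP and FN lesions.
    """
    n_unmatched = n_false_pos + n_false_neg
    n_total = len(matched) + n_unmatched
    if n_total == 0:
        raise ValueError("case has no lesions to score")
    dice_sum = sum(d for d, _ in matched) + DICE_PENALTY * n_unmatched
    hd95_sum = sum(h for _, h in matched) + HD95_PENALTY * n_unmatched
    return dice_sum / n_total, hd95_sum / n_total


# Two detected lesions, one spurious prediction, one missed lesion.
dice, hd95 = lesionwise_scores([(0.9, 2.0), (0.8, 4.0)], n_false_pos=1, n_false_neg=1)
print(dice, hd95)
```

Under this assumption a single missed small lesion lowers the case score sharply, which is consistent with the abstract's remark that false negatives for small lesions were a common source of error.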
-
Segmentation of glioblastomas in early post-operative multi-modal MRI with deep neural networks
Authors:
Ragnhild Holden Helland,
Alexandros Ferles,
André Pedersen,
Ivar Kommers,
Hilko Ardon,
Frederik Barkhof,
Lorenzo Bello,
Mitchel S. Berger,
Tora Dunås,
Marco Conti Nibali,
Julia Furtner,
Shawn Hervey-Jumper,
Albert J. S. Idema,
Barbara Kiesel,
Rishi Nandoe Tewari,
Emmanuel Mandonnet,
Domenique M. J. Müller,
Pierre A. Robe,
Marco Rossi,
Lisa M. Sagberg,
Tommaso Sciortino,
Tom Aalders,
Michiel Wagemakers,
Georg Widhalm,
Marnix G. Witte
, et al. (8 additional authors not shown)
Abstract:
Extent of resection after surgery is one of the main prognostic factors for patients diagnosed with glioblastoma. To assess it, accurate segmentation and classification of residual tumor from post-operative MR images is essential. The current standard method for estimating it is subject to high inter- and intra-rater variability, and an automated method for segmentation of residual tumor in early post-operative MRI could lead to a more accurate estimation of extent of resection. In this study, two state-of-the-art neural network architectures for pre-operative segmentation were trained for the task. The models were extensively validated on a multicenter dataset with nearly 1000 patients, from 12 hospitals in Europe and the United States. The best segmentation performance achieved was a 61% Dice score, and the best classification performance was about 80% balanced accuracy, with a demonstrated ability to generalize across hospitals. In addition, the segmentation performance of the best models was on par with human expert raters. The predicted segmentations can be used to accurately classify the patients into those with residual tumor, and those with gross total resection.
Submitted 18 April, 2023;
originally announced April 2023.
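The two metrics reported in the glioblastoma abstract above, the Dice score for segmentation and balanced accuracy for the residual-tumor versus gross-total-resection classification, follow standard definitions and can be sketched as below. The function names, the representation of a segmentation as a set of voxel indices, and the example values are assumptions for illustration; this is not the evaluation code used in the study.

```python
def dice_score(pred_voxels, true_voxels):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two sets of voxel indices."""
    pred, true = set(pred_voxels), set(true_voxels)
    if not pred and not true:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(pred & true) / (len(pred) + len(true))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = residual tumor)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy segmentation: half of each mask overlaps.
print(dice_score({1, 2, 3, 4}, {3, 4, 5, 6}))

# Toy classification: 4 patients with residual tumor, 2 with gross total resection.
print(balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
```

Balanced accuracy is a natural choice when the two patient groups differ in size, because it weights sensitivity and specificity equally regardless of class frequency.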
-
DeepBrainPrint: A Novel Contrastive Framework for Brain MRI Re-Identification
Authors:
Lemuel Puglisi,
Frederik Barkhof,
Daniel C. Alexander,
Geoffrey JM Parker,
Arman Eshaghi,
Daniele Ravì
Abstract:
Recent advances in MRI have led to the creation of large datasets. With the increase in data volume, it has become difficult to locate previous scans of the same patient within these datasets (a process known as re-identification). To address this issue, we propose an AI-powered medical imaging retrieval framework called DeepBrainPrint, which is designed to retrieve brain MRI scans of the same patient. Our framework is a semi-self-supervised contrastive deep learning approach with three main innovations. First, we use a combination of self-supervised and supervised paradigms to create an effective brain fingerprint from MRI scans that can be used for real-time image retrieval. Second, we use a special weighting function to guide the training and improve model convergence. Third, we introduce new imaging transformations to improve retrieval robustness in the presence of intensity variations (i.e. different scan contrasts), and to account for age and disease progression in patients. We tested DeepBrainPrint on a large dataset of T1-weighted brain MRIs from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and on a synthetic dataset designed to evaluate retrieval performance with different image modalities. Our results show that DeepBrainPrint outperforms previous methods, including simple similarity metrics and more advanced contrastive deep learning frameworks.
Submitted 24 September, 2023; v1 submitted 25 February, 2023;
originally announced February 2023.
-
Where is VALDO? VAscular Lesions Detection and segmentatiOn challenge at MICCAI 2021
Authors:
Carole H. Sudre,
Kimberlin Van Wijnen,
Florian Dubost,
Hieab Adams,
David Atkinson,
Frederik Barkhof,
Mahlet A. Birhanu,
Esther E. Bron,
Robin Camarasa,
Nish Chaturvedi,
Yuan Chen,
Zihao Chen,
Shuai Chen,
Qi Dou,
Tavia Evans,
Ivan Ezhov,
Haojun Gao,
Marta Girones Sanguesa,
Juan Domingo Gispert,
Beatriz Gomez Anson,
Alun D. Hughes,
M. Arfan Ikram,
Silvia Ingala,
H. Rolf Jaeger,
Florian Kofler
, et al. (24 additional authors not shown)
Abstract:
Imaging markers of cerebral small vessel disease provide valuable information on brain health, but their manual assessment is time-consuming and hampered by substantial intra- and interrater variability. Automated rating may benefit biomedical research, as well as clinical assessment, but diagnostic reliability of existing algorithms is unknown. Here, we present the results of the VAscular Lesions DetectiOn and Segmentation (Where is VALDO?) challenge that was run as a satellite event at the international conference on Medical Image Computing and Computer Aided Intervention (MICCAI) 2021. This challenge aimed to promote the development of methods for automated detection and segmentation of small and sparse imaging markers of cerebral small vessel disease, namely enlarged perivascular spaces (EPVS) (Task 1), cerebral microbleeds (Task 2) and lacunes of presumed vascular origin (Task 3) while leveraging weak and noisy labels. Overall, 12 teams participated in the challenge proposing solutions for one or more tasks (4 for Task 1 - EPVS, 9 for Task 2 - Microbleeds and 6 for Task 3 - Lacunes). Multi-cohort data was used in both training and evaluation. Results showed a large variability in performance both across teams and across tasks, with promising results notably for Task 1 - EPVS and Task 2 - Microbleeds and not practically useful results yet for Task 3 - Lacunes. It also highlighted the performance inconsistency across cases that may deter use at an individual level, while still proving useful at a population level.
Submitted 15 August, 2022;
originally announced August 2022.
-
Computer-aided diagnosis and prediction in brain disorders
Authors:
Vikram Venkatraghavan,
Sebastian R. van der Voort,
Daniel Bos,
Marion Smits,
Frederik Barkhof,
Wiro J. Niessen,
Stefan Klein,
Esther E. Bron
Abstract:
Computer-aided methods have shown added value for diagnosing and predicting brain disorders and can thus support decision making in clinical care and treatment planning. This chapter will provide insight into the type of methods, their working, their input data - such as cognitive tests, imaging and genetic data - and the types of output they provide. We will focus on specific use cases for diagnosis, i.e. estimating the current 'condition' of the patient, such as early detection and diagnosis of dementia, differential diagnosis of brain tumours, and decision making in stroke. Regarding prediction, i.e. estimation of the future 'condition' of the patient, we will zoom in on use cases such as predicting the disease course in multiple sclerosis and predicting patient outcomes after treatment in brain cancer. Furthermore, based on these use cases, we will assess the current state-of-the-art methodology and highlight current efforts on benchmarking of these methods and the importance of open science therein. Finally, we assess the current clinical impact of computer-aided methods and discuss the required next steps to increase clinical impact.
Submitted 31 October, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
An efficient semi-supervised quality control system trained using physics-based MRI-artefact generators and adversarial training
Authors:
Daniele Ravi,
Frederik Barkhof,
Daniel C. Alexander,
Lemuel Puglisi,
Geoffrey JM Parker,
Arman Eshaghi
Abstract:
Large medical imaging data sets are becoming increasingly available, but ensuring sample quality without significant artefacts is challenging. Existing methods for identifying imperfections in medical imaging rely on data-intensive approaches, compounded by a scarcity of artefact-rich scans for training machine learning models in clinical research. To tackle this problem, we propose a framework with four main components: 1) artefact generators inspired by magnetic resonance physics to corrupt brain MRI scans and augment a training dataset, 2) abstract and engineered features to represent images compactly, 3) a feature selection process depending on the artefact class to improve classification, and 4) SVM classifiers to identify artefacts. Our contributions are threefold: first, physics-based artefact generators produce synthetic brain MRI scans with controlled artefacts for data augmentation. This will avoid the labour-intensive collection and labelling process of scans with rare artefacts. Second, we propose a pool of abstract and engineered image features to identify 9 different artefacts for structural MRI. Finally, we use an artefact-based feature selection block that, for each class of artefacts, finds the set of features providing the best classification performance. We performed validation experiments on a large data set of scans with artificially-generated artefacts, and in a multiple sclerosis clinical trial where real artefacts were identified by experts, showing that the proposed pipeline outperforms traditional methods. In particular, our data augmentation increases performance by up to 12.5 percentage points on accuracy, precision, and recall. The computational efficiency of our pipeline enables potential real-time deployment, promising high-throughput clinical applications through automated image-processing pipelines driven by quality control systems.
Submitted 14 November, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Preoperative brain tumor imaging: models and software for segmentation and standardized reporting
Authors:
D. Bouget,
A. Pedersen,
A. S. Jakola,
V. Kavouridis,
K. E. Emblem,
R. S. Eijgelaar,
I. Kommers,
H. Ardon,
F. Barkhof,
L. Bello,
M. S. Berger,
M. C. Nibali,
J. Furtner,
S. Hervey-Jumper,
A. J. S. Idema,
B. Kiesel,
A. Kloet,
E. Mandonnet,
D. M. J. Müller,
P. A. Robe,
M. Rossi,
T. Sciortino,
W. Van den Brink,
M. Wagemakers,
G. Widhalm
, et al. (5 additional authors not shown)
Abstract:
For patients suffering from brain tumors, prognosis estimation and treatment decisions are made by a multidisciplinary team based on a set of preoperative MR scans. Currently, the lack of standardized and automatic methods for tumor detection and generation of clinical reports represents a major hurdle. In this study, we investigate glioblastomas, lower grade gliomas, meningiomas, and metastases, through four cohorts of up to 4000 patients. Tumor segmentation models were trained using the AGU-Net architecture with different preprocessing steps and protocols. Segmentation performances were assessed in depth using a wide range of voxel-wise and patient-wise metrics covering volume, distance, and probabilistic aspects. Finally, two software solutions have been developed, enabling an easy use of the trained models and standardized generation of clinical reports: Raidionics and Raidionics-Slicer. Segmentation performances were quite homogeneous across the four different brain tumor types, with an average true positive Dice ranging between 80% and 90%, patient-wise recall between 88% and 98%, and patient-wise precision around 95%. With our Raidionics software, running on a desktop computer with CPU support, tumor segmentation can be performed in 16 to 54 seconds depending on the dimensions of the MRI volume. For the generation of a standardized clinical report, including the tumor segmentation and features computation, 5 to 15 minutes are necessary. All trained models have been made open-access together with the source code for both software solutions and validation metrics computation. In the future, an automatic classification of the brain tumor type would be necessary to replace manual user input. Finally, the inclusion of post-operative segmentation in both software solutions will be key for generating complete post-operative standardized clinical reports.
Submitted 29 April, 2022;
originally announced April 2022.
-
Disentangling Human Error from the Ground Truth in Segmentation of Medical Images
Authors:
Le Zhang,
Ryutaro Tanno,
Mou-Cheng Xu,
Chen Jin,
Joseph Jacob,
Olga Ciccarelli,
Frederik Barkhof,
Daniel C. Alexander
Abstract:
Recent years have seen increasing use of supervised learning methods for segmentation tasks. However, the predictive performance of these algorithms depends on the quality of labels. This problem is particularly pertinent in the medical image domain, where both the annotation cost and inter-observer variability are high. In a typical label acquisition process, different human experts provide their estimates of the "true" segmentation labels under the influence of their own biases and competence levels. Treating these noisy labels blindly as the ground truth limits the performance that automatic segmentation algorithms can achieve. In this work, we present a method for jointly learning, from purely noisy observations alone, the reliability of individual annotators and the true segmentation label distributions, using two coupled CNNs. The separation of the two is achieved by encouraging the estimated annotators to be maximally unreliable while achieving high fidelity with the noisy training data. We first define a toy segmentation dataset based on MNIST and study the properties of the proposed algorithm. We then demonstrate the utility of the method on three public medical imaging segmentation datasets with simulated (when necessary) and real diverse annotations: 1) MSLSC (multiple-sclerosis lesions); 2) BraTS (brain tumours); 3) LIDC-IDRI (lung abnormalities). In all cases, our method outperforms competing methods and relevant baselines particularly in cases where the number of annotations is small and the amount of disagreement is large. The experiments also show strong ability to capture the complex spatial characteristics of annotators' mistakes.
Submitted 23 October, 2020; v1 submitted 31 July, 2020;
originally announced July 2020.
-
The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up
Authors:
Razvan V. Marinescu,
Neil P. Oxtoby,
Alexandra L. Young,
Esther E. Bron,
Arthur W. Toga,
Michael W. Weiner,
Frederik Barkhof,
Nick C. Fox,
Arman Eshaghi,
Tina Toni,
Marcin Salaterski,
Veronika Lunina,
Manon Ansart,
Stanley Durrleman,
Pascal Lu,
Samuel Iddi,
Dan Li,
Wesley K. Thompson,
Michael C. Donohue,
Aviv Nahon,
Yarden Levy,
Dan Halbersberg,
Mariya Cohen,
Huiling Liao,
Tengfei Li
, et al. (71 additional authors not shown)
Abstract:
We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcome…
▽ More
We present the findings of "The Alzheimer's Disease Prediction Of Longitudinal Evolution" (TADPOLE) Challenge, which compared the performance of 92 algorithms from 33 international teams at predicting the future trajectory of 219 individuals at risk of Alzheimer's disease. Challenge participants were required to make a prediction, for each month of a 5-year future time period, of three key outcomes: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. The methods used by challenge participants included multivariate linear regression, machine learning methods such as support vector machines and deep neural networks, as well as disease progression models. No single submission was best at predicting all three outcomes. For clinical diagnosis and ventricle volume prediction, the best algorithms strongly outperform simple baselines in predictive ability. However, for ADAS-Cog13 no single submitted prediction method was significantly better than random guesswork. Two ensemble methods based on taking the mean and median over all predictions, obtained top scores on almost all tasks. Better than average performance at diagnosis prediction was generally associated with the additional inclusion of features from cerebrospinal fluid (CSF) samples and diffusion tensor imaging (DTI). On the other hand, better performance at ventricle volume prediction was associated with inclusion of summary statistics, such as the slope or maxima/minima of biomarkers. TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease. However, results call into question the usage of cognitive test scores for patient selection and as a primary endpoint in clinical trials.
Submitted 27 December, 2021; v1 submitted 9 February, 2020;
originally announced February 2020.
-
TADPOLE Challenge: Accurate Alzheimer's disease prediction through crowdsourced forecasting of future data
Authors:
Razvan V. Marinescu,
Neil P. Oxtoby,
Alexandra L. Young,
Esther E. Bron,
Arthur W. Toga,
Michael W. Weiner,
Frederik Barkhof,
Nick C. Fox,
Polina Golland,
Stefan Klein,
Daniel C. Alexander
Abstract:
The TADPOLE Challenge compares the performance of algorithms at predicting the future evolution of individuals at risk of Alzheimer's disease. TADPOLE Challenge participants train their models and algorithms on historical data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. Participants are then required to make forecasts of three key outcomes for ADNI-3 rollover participants: clinical diagnosis, ADAS-Cog 13, and total volume of the ventricles -- which are then compared with future measurements. Strong points of the challenge are that the test data did not exist at the time of forecasting (it was acquired afterwards), and that it focuses on the challenging problem of cohort selection for clinical trials by identifying fast progressors. The submission phase of TADPOLE was open until 15 November 2017; since then data has been acquired until April 2019 from 219 subjects with 223 clinical visits and 150 Magnetic Resonance Imaging (MRI) scans, which was used for the evaluation of the participants' predictions. Thirty-three teams participated with a total of 92 submissions. No single submission was best at predicting all three outcomes. For diagnosis prediction, the best forecast (team Frog), which was based on gradient boosting, obtained a multiclass area under the receiver-operating curve (MAUC) of 0.931, while for ventricle prediction the best forecast (team EMC1), which was based on disease progression modelling and spline regression, obtained mean absolute error of 0.41% of total intracranial volume (ICV). For ADAS-Cog 13, no forecast was considerably better than the benchmark mixed effects model (BenchmarkME), provided to participants before the submission deadline. Further analysis can help understand which input features and algorithms are most suitable for Alzheimer's disease prediction and for aiding patient stratification in clinical trials.
Submitted 23 January, 2020;
originally announced January 2020.
-
Degenerative Adversarial NeuroImage Nets for Brain Scan Simulations: Application in Ageing and Dementia
Authors:
Daniele Ravi,
Stefano B. Blumberg,
Silvia Ingala,
Frederik Barkhof,
Daniel C. Alexander,
Neil P. Oxtoby
Abstract:
Accurate and realistic simulation of high-dimensional medical images has become an important research area relevant to many AI-enabled healthcare applications. However, current state-of-the-art approaches lack the ability to produce satisfactory high-resolution and accurate subject-specific images. In this work, we present a deep learning framework, namely 4D-Degenerative Adversarial NeuroImage Net (4D-DANI-Net), to generate high-resolution, longitudinal MRI scans that mimic subject-specific neurodegeneration in ageing and dementia. 4D-DANI-Net is a modular framework based on adversarial training and a set of novel spatiotemporal, biologically-informed constraints. To ensure efficient training and overcome memory limitations affecting such high-dimensional problems, we rely on three key technological advances: i) a new 3D training consistency mechanism called Profile Weight Functions (PWFs), ii) a 3D super-resolution module and iii) a transfer learning strategy to fine-tune the system for a given individual. To evaluate our approach, we trained the framework on 9852 T1-weighted MRI scans from 876 participants in the Alzheimer's Disease Neuroimaging Initiative dataset and held out a separate test set of 1283 MRI scans from 170 participants for quantitative and qualitative assessment of the personalised time series of synthetic images. We performed three evaluations: i) image quality assessment; ii) quantifying the accuracy of regional brain volumes over and above benchmark models; and iii) quantifying visual perception of the synthetic images by medical experts. Overall, both quantitative and qualitative results show that 4D-DANI-Net produces realistic, low-artefact, personalised time series of synthetic T1 MRI that outperforms benchmark models.
Submitted 29 September, 2021; v1 submitted 3 December, 2019;
originally announced December 2019.
-
Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge
Authors:
Hugo J. Kuijf,
J. Matthijs Biesbroek,
Jeroen de Bresser,
Rutger Heinen,
Simon Andermatt,
Mariana Bento,
Matt Berseth,
Mikhail Belyaev,
M. Jorge Cardoso,
Adrià Casamitjana,
D. Louis Collins,
Mahsa Dadar,
Achilleas Georgiou,
Mohsen Ghafoorian,
Dakai Jin,
April Khademi,
Jesse Knight,
Hongwei Li,
Xavier Lladó,
Miguel Luna,
Qaiser Mahmood,
Richard McKinley,
Alireza Mehrtash,
Sébastien Ourselin,
Bo-yong Park
, et al. (19 additional authors not shown)
Abstract:
Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/).
Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness.
Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners.
The challenge remains open for future submissions and provides a public platform for method evaluation.
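The first ranking metric listed above, the Dice similarity coefficient, can be sketched in a few lines of Python. This is a minimal illustration on flattened binary masks, not the challenge's own evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks.

    pred, truth: equal-length sequences of 0/1 values (flattened voxel masks).
    Dice = 2 * |pred AND truth| / (|pred| + |truth|).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical six-voxel masks: the masks share 2 voxels, and each marks 3.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

A value of 1 means the two masks coincide and 0 means they share no voxels.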
Submitted 1 April, 2019;
originally announced April 2019.
-
TADPOLE Challenge: Prediction of Longitudinal Evolution in Alzheimer's Disease
Authors:
Razvan V. Marinescu,
Neil P. Oxtoby,
Alexandra L. Young,
Esther E. Bron,
Arthur W. Toga,
Michael W. Weiner,
Frederik Barkhof,
Nick C. Fox,
Stefan Klein,
Daniel C. Alexander,
the EuroPOND Consortium
Abstract:
The Alzheimer's Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge compares the performance of algorithms at predicting future evolution of individuals at risk of Alzheimer's disease. TADPOLE Challenge participants train their models and algorithms on historical data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study or any other datasets to which they have access. Participants are then required to make monthly forecasts over a period of 5 years from January 2018, of three key outcomes for ADNI-3 rollover participants: clinical diagnosis, Alzheimer's Disease Assessment Scale Cognitive Subdomain (ADAS-Cog13), and total volume of the ventricles. These individual forecasts are later compared with the corresponding future measurements in ADNI-3 (obtained after the TADPOLE submission deadline). The first submission phase of TADPOLE was open for prize-eligible submissions between 15 June and 15 November 2017. The submission system remains open via the website: https://tadpole.grand-challenge.org, although since 15 November 2017 submissions are not eligible for the first round of prizes. This paper describes the design of the TADPOLE Challenge.
Submitted 30 August, 2018; v1 submitted 10 May, 2018;
originally announced May 2018.