-
The impact of MRI image quality on statistical and predictive analysis on voxel based morphology
Authors:
Felix Hoffstaedter,
Nicolás Nieto,
Simon B. Eickhoff,
Kaustubh R. Patil
Abstract:
Image Quality of MRI brain scans is strongly influenced by within scanner head movements and the resulting image artifacts alter derived measures like brain volume and cortical thickness. Automated image quality assessment is key to controlling for confounding effects of poor image quality. In this study, we systematically test for the influence of image quality on univariate statistics and machin…
▽ More
Image Quality of MRI brain scans is strongly influenced by within scanner head movements and the resulting image artifacts alter derived measures like brain volume and cortical thickness. Automated image quality assessment is key to controlling for confounding effects of poor image quality. In this study, we systematically test for the influence of image quality on univariate statistics and machine learning classification. We analyzed group effects of sex/gender on local brain volume and made predictions of sex/gender using logistic regression, while correcting for brain size. From three large publicly available datasets, two age and sex-balanced samples were derived to test the generalizability of the effect for pooled sample sizes of n=760 and n=1094. Results of the Bonferroni corrected t-tests over 3747 gray matter features showed a strong influence of low-quality data on the ability to find significant sex/gender differences for the smaller sample. Increasing sample size and more so image quality showed a stark increase in detecting significant effects in univariate group comparisons. For the classification of sex/gender using logistic regression, both increasing sample size and image quality had a marginal effect on the Area under the Receiver Operating Characteristic Curve for most datasets and subsamples. Our results suggest a more stringent quality control for univariate approaches than for multivariate classification with a leaning towards higher quality for classical group statistics and bigger sample sizes for machine learning applications in neuroimaging.
△ Less
Submitted 2 November, 2024;
originally announced November 2024.
-
Deep neural network heatmaps capture Alzheimer's disease patterns reported in a large meta-analysis of neuroimaging studies
Authors:
Di Wang,
Nicolas Honnorat,
Peter T. Fox,
Kerstin Ritter,
Simon B. Eickhoff,
Sudha Seshadri,
Mohamad Habes
Abstract:
Deep neural networks currently provide the most advanced and accurate machine learning models to distinguish between structural MRI scans of subjects with Alzheimer's disease and healthy controls. Unfortunately, the subtle brain alterations captured by these models are difficult to interpret because of the complexity of these multi-layer and non-linear models. Several heatmap methods have been pro…
▽ More
Deep neural networks currently provide the most advanced and accurate machine learning models to distinguish between structural MRI scans of subjects with Alzheimer's disease and healthy controls. Unfortunately, the subtle brain alterations captured by these models are difficult to interpret because of the complexity of these multi-layer and non-linear models. Several heatmap methods have been proposed to address this issue and analyze the imaging patterns extracted from the deep neural networks, but no quantitative comparison between these methods has been carried out so far. In this work, we explore these questions by deriving heatmaps from Convolutional Neural Networks (CNN) trained using T1 MRI scans of the ADNI data set, and by comparing these heatmaps with brain maps corresponding to Support Vector Machines (SVM) coefficients. Three prominent heatmap methods are studied: Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and Guided Grad-CAM (GGC). Contrary to prior studies where the quality of heatmaps was visually or qualitatively assessed, we obtained precise quantitative measures by computing overlap with a ground-truth map from a large meta-analysis that combined 77 voxel-based morphometry (VBM) studies independently from ADNI. Our results indicate that all three heatmap methods were able to capture brain regions covering the meta-analysis map and achieved better results than SVM coefficients. Among them, IG produced the heatmaps with the best overlap with the independent meta-analysis.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression
Authors:
Claas Flint,
Micah Cearns,
Nils Opel,
Ronny Redlich,
David M. A. Mehler,
Daniel Emden,
Nils R. Winter,
Ramona Leenings,
Simon B. Eickhoff,
Tilo Kircher,
Axel Krug,
Igor Nenadic,
Volker Arolt,
Scott Clark,
Bernhard T. Baune,
Xiaoyi Jiang,
Udo Dannlowski,
Tim Hahn
Abstract:
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied ques…
▽ More
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy control (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of $N = 1,868$ MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of $61\,\%$. Next, we mimicked the process by which researchers would draw samples of various sizes ($N = 4$ to $N = 150$) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes ($N = 20$), we observe accuracies of up to $95\,\%$. For medium sample sizes ($N = 100$) accuracies up to $75\,\%$ were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
△ Less
Submitted 3 May, 2021; v1 submitted 13 December, 2019;
originally announced December 2019.