-
fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data
Authors:
Kai Yang,
Masoud Asgharian,
Nikhil Bhagwat,
Jean-Baptiste Poline,
Celia M. T. Greenwood
Abstract:
In this paper, we introduce fastHDMI, a Python package designed for efficient variable screening in high-dimensional datasets, particularly neuroimaging data. This work pioneers the application of three mutual information estimation methods for neuroimaging variable selection, a novel approach implemented via fastHDMI. These advancements enhance our ability to analyze the complex structures of neuroimaging datasets, providing improved tools for variable selection in high-dimensional spaces.
Using the preprocessed ABIDE dataset, we evaluate the performance of these methods through extensive simulations. The tests cover a range of conditions, including linear and nonlinear associations, as well as continuous and binary outcomes. Our results highlight the superiority of the FFTKDE-based mutual information estimation for feature screening in continuous nonlinear outcomes, while binning-based methods outperform others for binary outcomes with nonlinear probability preimages. For linear simulations, both Pearson correlation and FFTKDE-based methods show comparable performance for continuous outcomes, while Pearson excels in binary outcomes with linear probability preimages.
A comprehensive case study using the ABIDE dataset further demonstrates fastHDMI's practical utility, showcasing the predictive power of models built from variables selected using our screening techniques. This research affirms the computational efficiency and methodological strength of fastHDMI, significantly enriching the toolkit available for neuroimaging analysis.
Submitted 13 October, 2024;
originally announced October 2024.
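
The abstract above mentions binning-based mutual information estimation for feature screening. The following is a minimal sketch of that general idea only, not the fastHDMI API: the function names `binned_mutual_information` and `screen`, the equal-width binning, and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def binned_mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from an equal-width 2D histogram.

    Hypothetical helper for illustration; not taken from fastHDMI.
    """
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant column
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over the occupied cells.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def screen(features, outcome, keep=1, bins=8):
    """Rank named feature columns by estimated MI with the outcome; keep the top ones."""
    scores = {name: binned_mutual_information(col, outcome, bins)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy data (assumed): the outcome depends nonlinearly on "signal" only.
random.seed(0)
n = 500
signal = [random.uniform(-2, 2) for _ in range(n)]
noise = [random.uniform(-2, 2) for _ in range(n)]
outcome = [s * s + random.gauss(0, 0.1) for s in signal]

selected = screen({"signal": signal, "noise": noise}, outcome, keep=1)
print(selected)
```

A quadratic relationship like this one has near-zero Pearson correlation with the outcome, which is the kind of nonlinear setting where a mutual-information score can still rank the relevant feature first.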
-
An algorithm-based multiple detection influence measure for high dimensional regression using expectile
Authors:
Amadou Barry,
Nikhil Bhagwat,
Bratislav Misic,
Jean-Baptiste Poline,
Celia M. T. Greenwood
Abstract:
The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. However, in high dimensional data, this identification is challenging. Classical and recently-developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail when there is masking, where several influential observations share similar characteristics, or swamping, where the influential observations are near the boundary of the space spanned by well-behaved observations. Therefore, we propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations that addresses current limitations. Our three-step algorithm to identify and capture undesirable variability in the data, asymMIP, is based on two complementary statistics, inspired by asymmetric correlations, and built on expectiles. Simulations demonstrate higher detection power than competing methods. Use of the resulting asymptotic distribution leads to detection of influential observations without the need for computationally demanding procedures such as the bootstrap. The application of our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. See our GitHub for a free R package that implements our algorithm: asymMIP (github.com/AmBarry/hidetify).
Submitted 25 May, 2021;
originally announced May 2021.
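
The abstract above says the procedure is built on expectiles. As background only, here is a minimal sketch of how a sample expectile can be computed by iteratively reweighted averaging. It is not the asymMIP algorithm; the function name `expectile`, the tolerance, and the toy data are assumptions made for illustration.

```python
def expectile(values, tau=0.5, tol=1e-10, max_iter=1000):
    """Sample tau-expectile: the minimizer of an asymmetrically weighted squared loss.

    Points above the current estimate get weight tau, the others 1 - tau.
    Hypothetical helper for illustration; not taken from the asymMIP package.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(values) / len(values)  # tau = 0.5 recovers the ordinary mean
    for _ in range(max_iter):
        weights = [tau if v > m else 1.0 - tau for v in values]
        new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


# Toy data (assumed) with one large value in the upper tail.
data = [1.0, 2.0, 3.0, 4.0, 10.0]
low, mid, high = (expectile(data, t) for t in (0.1, 0.5, 0.9))
print(low, mid, high)
```

Varying `tau` away from 0.5 weights one tail more heavily than the other, which is what makes expectiles a natural building block for asymmetric statistics.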
-
Teaching computational reproducibility for neuroimaging
Authors:
K. Jarrod Millman,
Matthew Brett,
Ross Barnowski,
Jean-Baptiste Poline
Abstract:
We describe a project-based introduction to reproducible and collaborative neuroimaging analysis. Traditional teaching on neuroimaging usually consists of a series of lectures that emphasize the big picture rather than the foundations on which the techniques are based. The lectures are often paired with practical workshops in which students run imaging analyses using the graphical interface of specific neuroimaging software packages. Our experience suggests that this combination leaves the student with a superficial understanding of the underlying ideas, and an informal, inefficient, and inaccurate approach to analysis. To address these problems, we based our course around a substantial open-ended group project. This allowed us to teach: (a) computational tools to ensure computationally reproducible work, such as the Unix command line, structured code, version control, automated testing, and code review and (b) a clear understanding of the statistical techniques used for a basic analysis of a single run in an MRI scanner. The emphasis we put on the group project showed the importance of standard computational tools for accuracy, efficiency, and collaboration. The projects were broadly successful in engaging students in working reproducibly on real scientific questions. We propose that a course on this model should be the foundation for future programs in neuroimaging. We believe it will also serve as a model for teaching efficient and reproducible research in other fields of computational science.
Submitted 15 June, 2018;
originally announced June 2018.
-
Improving accuracy and power with transfer learning using a meta-analytic database
Authors:
Yannick Schwartz,
Gaël Varoquaux,
Christophe Pallier,
Philippe Pinel,
Jean-Baptiste Poline,
Bertrand Thirion
Abstract:
Typical cohorts in brain imaging studies are not large enough for systematic testing of all the information contained in the images. To build testable working hypotheses, investigators thus rely on analysis of previous work, sometimes formalized in a so-called meta-analysis. In brain imaging, this approach underlies the specification of regions of interest (ROIs) that are usually selected on the basis of the coordinates of previously detected effects. In this paper, we propose to use a database of images, rather than coordinates, and frame the problem as transfer learning: learning a discriminant model on a reference task to apply it to a different but related new task. To facilitate statistical analysis of small cohorts, we use a sparse discriminant model that selects predictive voxels on the reference task and thus provides a principled procedure to define ROIs. The benefits of our approach are twofold. First, it uses the reference database for prediction, i.e. to provide potential biomarkers in a clinical setting. Second, it increases statistical power on the new task. We demonstrate on a set of 18 pairs of functional MRI experimental conditions that our approach gives good prediction. In addition, on a specific transfer situation involving different scanners at different locations, we show that voxel selection based on transfer learning leads to higher detection power on small cohorts.
Submitted 28 September, 2012; v1 submitted 24 September, 2012;
originally announced September 2012.
-
Markov models for fMRI correlation structure: is brain functional connectivity small world, or decomposable into networks?
Authors:
Gaël Varoquaux,
Alexandre Gramfort,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Correlations in the signal observed via functional Magnetic Resonance Imaging (fMRI) are expected to reveal the interactions in the underlying neural populations through hemodynamic response. In particular, they highlight distributed sets of mutually correlated regions that correspond to brain networks related to different cognitive functions. Yet graph-theoretical studies of neural connections give a different picture: that of a highly integrated system with small-world properties, with local clustering but short pathways across the complete structure. We examine the conditional independence properties of the fMRI signal, i.e. its Markov structure, to find realistic assumptions on the connectivity structure that are required to explain the observed functional connectivity. In particular we seek a decomposition of the Markov structure into segregated functional networks using decomposable graphs: a set of strongly-connected and partially overlapping cliques. We introduce a new method to efficiently extract such cliques on a large, strongly-connected graph. We compare methods learning different graph structures from functional connectivity by testing the goodness of fit of the model they learn on new data. We find that summarizing the structure as strongly-connected networks can give a good description only for very large and overlapping networks. These results highlight that Markov models are good tools to identify the structure of brain connectivity from fMRI signals, but for this purpose they must reflect the small-world properties of the underlying neural systems.
Submitted 3 February, 2012;
originally announced February 2012.
-
Brain covariance selection: better individual functional connectivity models using population prior
Authors:
Gaël Varoquaux,
Alexandre Gramfort,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.
Submitted 12 November, 2010; v1 submitted 30 August, 2010;
originally announced August 2010.
-
ICA-based sparse feature recovery from fMRI datasets
Authors:
Gaël Varoquaux,
Merlin Keller,
Jean Baptiste Poline,
Philippe Ciuciu,
Bertrand Thirion
Abstract:
Spatial Independent Components Analysis (ICA) is increasingly used in the context of functional Magnetic Resonance Imaging (fMRI) to study cognition and brain pathologies. Salient features present in some of the extracted Independent Components (ICs) can be interpreted as brain networks, but the segmentation of the corresponding regions from ICs is still ill-controlled. Here we propose a new ICA-based procedure for extraction of sparse features from fMRI datasets. Specifically, we introduce a new thresholding procedure that controls the deviation from isotropy in the ICA mixing model. Unlike current heuristics, our procedure guarantees an exact, possibly conservative, level of specificity in feature detection. We evaluate the sensitivity and specificity of the method on synthetic and fMRI data and show that it outperforms state-of-the-art approaches.
Submitted 11 June, 2010;
originally announced June 2010.
-
A group model for stable multi-subject ICA on fMRI datasets
Authors:
G. Varoquaux,
S. Sadaghiani,
P. Pinel,
A. Kleinschmidt,
J. B. Poline,
B. Thirion
Abstract:
Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract sets of mutually correlated brain regions without prior information on the time course of these regions. Some of these sets of regions, interpreted as functional networks, have recently been used to provide markers of brain diseases and open the road to paradigm-free population comparisons. Such group studies raise the question of modeling subject variability within ICA: how can the patterns representative of a group be modeled and estimated via ICA for reliable inter-group comparisons? In this paper, we propose a hierarchical model for patterns in multi-subject fMRI datasets, akin to mixed-effect group models used in linear-model-based analysis. We introduce an estimation procedure, CanICA (Canonical ICA), based on i) probabilistic dimension reduction of the individual data, ii) canonical correlation analysis to identify a data subspace common to the group, and iii) ICA-based pattern extraction. In addition, we introduce a procedure based on cross-validation to quantify the stability of ICA patterns at the level of the group. We compare our method with state-of-the-art multi-subject fMRI ICA methods and show that the features extracted using our procedure are more reproducible at the group level on two datasets of 12 healthy controls: a resting-state and a functional localizer study.
Submitted 11 June, 2010;
originally announced June 2010.
-
CanICA: Model-based extraction of reproducible group-level ICA patterns from fMRI time series
Authors:
Gaël Varoquaux,
Sepideh Sadaghiani,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract meaningful patterns without prior information. However, ICA is not robust to mild data variation and remains a parameter-sensitive algorithm. The validity of the extracted patterns is hard to establish, as is the significance of differences between patterns extracted from different groups of subjects. We start from a generative model of the fMRI group data to introduce a probabilistic ICA pattern-extraction algorithm, called CanICA (Canonical ICA). Thanks to an explicit noise model and canonical correlation analysis, our method is auto-calibrated and identifies the group-reproducible data subspace before performing ICA. We compare our method to state-of-the-art multi-subject fMRI ICA methods and show that the features extracted are more reproducible.
Submitted 24 November, 2009;
originally announced November 2009.