-
brainlife.io: A decentralized and open source cloud platform to support neuroscience research
Authors:
Soichi Hayashi,
Bradley A. Caron,
Anibal Sólon Heinsfeld,
Sophia Vinci-Booher,
Brent McPherson,
Daniel N. Bullock,
Giulia Bertò,
Guiomar Niso,
Sandra Hanekamp,
Daniel Levitas,
Kimberly Ray,
Anne MacKenzie,
Lindsey Kitchell,
Josiah K. Leong,
Filipi Nascimento-Silva,
Serge Koudoro,
Hanna Willis,
Jasleen K. Jolly,
Derek Pisner,
Taylor R. Zuidema,
Jan W. Kurzawski,
Kyriaki Mikellidou,
Aurore Bussalb,
Christopher Rorden,
Conner Victory, et al. (39 additional authors not shown)
Abstract:
Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR (Findable, Accessible, Interoperable, and Reusable) data analysis for portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.
Submitted 11 August, 2023; v1 submitted 3 June, 2023;
originally announced June 2023.
-
A robust multivariate, non-parametric outlier identification method for scrubbing in fMRI
Authors:
Fatma Parlak,
Damon D. Pham,
Amanda F. Mejia
Abstract:
Functional magnetic resonance imaging (fMRI) data contain high levels of noise and artifacts. To avoid contamination of downstream analyses, fMRI-based studies must identify and remove these noise sources prior to statistical analysis. One common approach is the "scrubbing" of fMRI volumes that are thought to contain high levels of noise. However, existing scrubbing techniques are based on ad hoc measures of signal change. We consider scrubbing via outlier detection, where volumes containing artifacts are considered multidimensional outliers. Robust multivariate outlier detection methods are proposed using robust distances (RDs), which are related to the Mahalanobis distance. These RDs have a known distribution when the data are i.i.d. normal, and that distribution can be used to determine a threshold for outliers where fMRI data violate these assumptions. Here, we develop a robust multivariate outlier detection method that is applicable to non-normal data. The objective is to obtain threshold values to flag outlying volumes based on their RDs. We propose two threshold candidates that embark on the same two steps, but the choice of which depends on a researcher's purpose. Our main steps are dimension reduction and selection, robust univariate outlier imputation to get rid of the effect of outliers on the distribution, and estimating an outlier threshold based on the upper quantile of the RD distribution without outliers. The first threshold candidate is an upper quantile of the empirical distribution of RDs obtained from the imputed data. The second threshold candidate calculates the upper quantile of the RD distribution that a nonparametric bootstrap uses to account for uncertainty in the empirical quantile. We compare our proposed fMRI scrubbing method to motion scrubbing, data-driven scrubbing, and restrictive parametric multivariate outlier detection methods.
Submitted 2 May, 2023; v1 submitted 28 April, 2023;
originally announced April 2023.
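The two threshold candidates described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the robust distances are made-up numbers, the function names `upper_quantile` and `bootstrap_threshold` are hypothetical, and summarizing the bootstrap quantiles by their median is an assumption, since the abstract does not say how the bootstrap distribution is reduced to a single threshold.

```python
import math
import random


def upper_quantile(values, q):
    """Empirical q-quantile: the smallest observed value such that
    at least a fraction q of the data lies at or below it."""
    ordered = sorted(values)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]


def bootstrap_threshold(values, q, n_boot=500, seed=0):
    """Resample the data with replacement, take the q-quantile of each
    resample, and return the median of those quantiles.
    (Using the median as the summary is an assumption.)"""
    rng = random.Random(seed)
    n = len(values)
    boot = [
        upper_quantile([rng.choice(values) for _ in range(n)], q)
        for _ in range(n_boot)
    ]
    return upper_quantile(boot, 0.5)


# Hypothetical robust distances for 10 volumes, after outlier imputation.
rds = [1.2, 0.8, 1.5, 2.1, 0.9, 1.1, 1.7, 1.3, 1.0, 1.4]

empirical = upper_quantile(rds, 0.9)          # first candidate
bootstrapped = bootstrap_threshold(rds, 0.9)  # second candidate

# Volumes whose robust distance exceeds the empirical threshold.
flagged = [i for i, rd in enumerate(rds) if rd > empirical]
```

In practice the robust distances would come from a robust covariance estimate on the dimension-reduced data; that step is omitted here.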
-
Fast Bayesian estimation of brain activation with cortical surface fMRI data using EM
Authors:
Daniel A. Spencer,
David Bolin,
Amanda F. Mejia
Abstract:
Task functional magnetic resonance imaging (fMRI) is a type of neuroimaging data used to identify areas of the brain that activate during specific tasks or stimuli. These data are conventionally modeled using a massive univariate approach across all data locations, which ignores spatial dependence at the cost of model power. We previously developed and validated a spatial Bayesian model leveraging dependencies along the cortical surface of the brain in order to improve accuracy and power. This model utilizes stochastic partial differential equation spatial priors with sparse precision matrices to allow for appropriate modeling of spatially-dependent activations seen in the neuroimaging literature, resulting in substantial increases in model power. Our original implementation relies on the computational efficiencies of the integrated nested Laplace approximation (INLA) to overcome the computational challenges of analyzing high-dimensional fMRI data while avoiding issues associated with variational Bayes implementations. However, this requires significant memory resources, extra software, and software licenses to run. In this article, we develop an exact Bayesian analysis method for the general linear model, employing an efficient expectation-maximization algorithm to find maximum a posteriori estimates of task-based regressors on cortical surface fMRI data. Through an extensive simulation study of cortical surface-based fMRI data, we compare our proposed method to the existing INLA implementation, as well as a conventional massive univariate approach employing ad-hoc spatial smoothing. We also apply the method to task fMRI data from the Human Connectome Project and show that our proposed implementation produces similar results to the validated INLA implementation. Both the INLA and EM-based implementations are available through our open-source BayesfMRI R package.
Submitted 2 November, 2022;
originally announced November 2022.
-
Sources of residual autocorrelation in multiband task fMRI and strategies for effective mitigation
Authors:
Fatma Parlak,
Damon D. Pham,
Daniel A. Spencer,
Robert C. Welsh,
Amanda F. Mejia
Abstract:
In task fMRI analysis, OLS is typically used to estimate task-induced activation in the brain. Since task fMRI residuals often exhibit temporal autocorrelation, it is common practice to perform prewhitening prior to OLS to satisfy the assumption of residual independence, equivalent to GLS. While theoretically straightforward, a major challenge in prewhitening in fMRI is accurately estimating the residual autocorrelation at each location of the brain. Assuming a global autocorrelation model, as in several fMRI software programs, may under- or over-whiten particular regions and fail to achieve nominal false positive control across the brain. Faster multiband acquisitions require more sophisticated models to capture autocorrelation, making prewhitening more difficult. These issues are becoming more critical now because of a trend towards subject-level analysis, where prewhitening has a greater impact than in group-average analyses. In this article, we first thoroughly examine the sources of residual autocorrelation in multiband task fMRI. We find that residual autocorrelation varies spatially throughout the cortex and is affected by the task, the acquisition method, modeling choices, and individual differences. Second, we evaluate the ability of different AR-based prewhitening strategies to effectively mitigate autocorrelation and control false positives. We find that allowing the prewhitening filter to vary spatially is the most important factor for successful prewhitening, even more so than increasing AR model order. To overcome the computational challenge associated with spatially variable prewhitening, we developed a computationally efficient R implementation based on parallelization and fast C++ backend code. This implementation is included in the open source R package BayesfMRI.
Submitted 22 September, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.
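The idea of letting the prewhitening filter vary spatially, as described in the abstract above, can be illustrated with a small sketch. It assumes an AR(1) model for brevity (the paper evaluates AR models of varying order), uses made-up residuals, and the names `ar1_coefficient` and `prewhiten` are hypothetical; it is not the BayesfMRI implementation.

```python
def ar1_coefficient(resid):
    """Lag-1 autocorrelation estimate of a residual time series."""
    m = sum(resid) / len(resid)
    c = [r - m for r in resid]
    denom = sum(v * v for v in c)
    if denom == 0:
        return 0.0
    return sum(c[t] * c[t - 1] for t in range(1, len(c))) / denom


def prewhiten(series, phi):
    """Apply the AR(1) whitening filter y_t - phi * y_{t-1}.
    The first time point is dropped."""
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]


# Hypothetical OLS residuals at two brain locations.
residuals = {
    "vertex_a": [1.0, -1.0, 1.0, -1.0],
    "vertex_b": [1.0, 1.0, -1.0, -1.0],
}

# One coefficient per location, rather than a single global coefficient.
phi_by_location = {loc: ar1_coefficient(r) for loc, r in residuals.items()}

whitened = {
    loc: prewhiten(series, phi_by_location[loc])
    for loc, series in residuals.items()
}
```

In a full analysis the same location-specific filter would be applied to both the data and the design matrix before refitting by OLS.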
-
Longitudinal surface-based spatial Bayesian GLM reveals complex trajectories of motor neurodegeneration in ALS
Authors:
Amanda F. Mejia,
Vincent Koppelmans,
Laura Jelsone-Swain,
Sanjay Kalra,
Robert C. Welsh
Abstract:
Longitudinal fMRI datasets hold great promise for the study of neurodegenerative diseases, but realizing their potential depends on extracting accurate fMRI-based brain measures in individuals over time. This is especially true for rare, heterogeneous and/or rapidly progressing diseases, which often involve small samples whose functional features may vary dramatically across subjects and over time, making traditional group-difference analyses of limited utility. One such disease is ALS, which results in extreme motor function loss and eventual death. Here, we analyze a rich longitudinal dataset containing 190 motor task fMRI scans from 16 ALS patients and 22 age-matched HCs. We propose a novel longitudinal extension to our cortical surface-based spatial Bayesian GLM, which has high power and precision to detect activations in individuals. Using a series of longitudinal mixed-effects models to subsequently study the relationship between activation and disease progression, we observe an inverted U-shaped trajectory: at relatively mild disability we observe enlarging activations, while at higher disability we observe severely diminished activation, reflecting progression toward complete motor function loss. We observe distinct trajectories depending on clinical progression rate, with faster progressors exhibiting more extreme hyper-activation and subsequent hypo-activation. These differential trajectories suggest that initial hyper-activation is likely attributable to loss of inhibitory neurons. By contrast, earlier studies employing more limited sampling designs and using traditional group-difference analysis approaches were only able to observe the initial hyper-activation, which was assumed to be due to a compensatory process. This study provides a first example of how surface-based spatial Bayesian modeling furthers scientific understanding of neurodegenerative disease.
Submitted 4 October, 2021;
originally announced October 2021.
-
Spatial Bayesian GLM on the cortical surface produces reliable task activations in individuals and groups
Authors:
Daniel Spencer,
Yu Ryan Yue,
David Bolin,
Sarah Ryan,
Amanda F. Mejia
Abstract:
The general linear model (GLM) is a widely popular and convenient tool for estimating the functional brain response and identifying areas of significant activation during a task or stimulus. However, the classical GLM is based on a massive univariate approach that does not explicitly leverage the similarity of activation patterns among neighboring brain locations. As a result, it tends to produce noisy estimates and be underpowered to detect significant activations, particularly in individual subjects and small groups. A recent alternative, a cortical surface-based spatial Bayesian GLM, leverages spatial dependencies among neighboring cortical vertices to produce more accurate estimates and areas of functional activation. The spatial Bayesian GLM can be applied to individual and group-level analysis. In this study, we assess the reliability and power of individual and group-average measures of task activation produced via the surface-based spatial Bayesian GLM. We analyze motor task data from 45 subjects in the Human Connectome Project (HCP) and HCP Retest datasets. We also extend the model to multi-run analysis and employ subject-specific cortical surfaces rather than surfaces inflated to a sphere for more accurate distance-based modeling. Results show that the surface-based spatial Bayesian GLM produces highly reliable activations in individual subjects and is powerful enough to detect trait-like functional topologies. Additionally, spatial Bayesian modeling enhances reliability of group-level analysis even in moderately sized samples (n=45). The power of the spatial Bayesian GLM to detect activations above a scientifically meaningful effect size is nearly invariant to sample size, exhibiting high power even in small samples (n=10). The spatial Bayesian GLM is computationally efficient in individuals and groups and is convenient to implement with the open-source BayesfMRI R package.
Submitted 27 October, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
A spatial template independent component analysis model for subject-level brain network estimation and inference
Authors:
Amanda F. Mejia,
David Bolin,
Yu Ryan Yue,
Jiongran Wang,
Brian S. Caffo,
Mary Beth Nebel
Abstract:
Independent component analysis is commonly applied to functional magnetic resonance imaging (fMRI) data to extract independent components (ICs) representing functional brain networks. While ICA produces reliable group-level estimates, single-subject ICA often produces noisy results. Template ICA (tICA) is a hierarchical ICA model using empirical population priors to produce reliable subject-level IC estimates. However, this and other hierarchical ICA models assume unrealistically that subject effects are spatially independent. Here, we propose spatial template ICA (stICA), which incorporates spatial process priors into tICA. This results in greater estimation efficiency of ICs and subject effects. Additionally, the joint posterior distribution can be used to identify engaged areas using an excursions set approach. By leveraging spatial dependencies and avoiding massive multiple comparisons, stICA has high power to detect true effects. We derive an efficient expectation-maximization algorithm to obtain maximum likelihood estimates of the model parameters and posterior moments of the latent fields. Based on analysis of simulated data and fMRI data from the Human Connectome Project, we find that stICA produces estimates that are more accurate and reliable than benchmark approaches, and identifies larger and more reliable areas of engagement. The algorithm is quite tractable, achieving convergence within 7 hours in our fMRI analysis.
Submitted 4 June, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
-
Template Independent Component Analysis: Targeted and Reliable Estimation of Subject-level Brain Networks using Big Data Population Priors
Authors:
Amanda F. Mejia,
Mary Beth Nebel,
Yikai Wang,
Brian S. Caffo,
Ying Guo
Abstract:
Large brain imaging databases contain a wealth of information on brain organization in the populations they target, and on individual variability. While such databases have been used to study group-level features of populations directly, they are currently underutilized as a resource to inform single-subject analysis. Here, we propose leveraging the information contained in large functional magnetic resonance imaging (fMRI) databases by establishing population priors to employ in an empirical Bayesian framework. We focus on estimation of brain networks as source signals in independent component analysis (ICA). We formulate a hierarchical "template" ICA model where source signals---including known population brain networks and subject-specific signals---are represented as latent variables. For estimation, we derive an expectation maximization (EM) algorithm having an explicit solution. However, as this solution is computationally intractable, we also consider an approximate subspace algorithm and a faster two-stage approach. Through extensive simulation studies, we assess performance of both methods and compare with dual regression, a popular but ad-hoc method. The two proposed algorithms have similar performance, and both dramatically outperform dual regression. We also conduct a reliability study utilizing the Human Connectome Project and find that template ICA achieves substantially better performance than dual regression, achieving 75-250% higher intra-subject reliability.
Submitted 17 June, 2019;
originally announced June 2019.
-
Effects of Scan Length and Shrinkage on Reliability of Resting-State Functional Connectivity in the Human Connectome Project
Authors:
Amanda F. Mejia,
Mary Beth Nebel,
Anita D. Barber,
Ann S. Choe,
Martin A. Lindquist
Abstract:
In this paper, we use data from the Human Connectome Project (N=461) to investigate the effect of scan length on reliability of resting-state functional connectivity (rsFC) estimates produced from resting-state functional magnetic resonance imaging (rsfMRI). Additionally, we study the benefits of empirical Bayes shrinkage, in which subject-level estimates borrow strength from the population average by trading a small increase in bias for a greater reduction in variance. For each subject, we compute raw and shrinkage estimates of rsFC between 300 regions identified through independent components analysis (ICA) based on rsfMRI scans varying from 3 to 30 minutes in length. The time course for each region is determined using dual regression, and rsFC is estimated as the Pearson correlation between each pair of time courses. Shrinkage estimates for each subject are computed as a weighted average between the raw subject-level estimate and the population average estimate, where the weight is determined for each connection by the relationship of within-subject variance to between-subject variance. We find that shrinkage estimates exhibit greater reliability than raw estimates for most connections, with 30-40% improvement using scans less than 10 minutes in length and 10-20% improvement using scans of 20-30 minutes. We also observe significant spatial variability in reliability of both raw and shrinkage estimates, with connections within the default mode and motor networks exhibiting the greatest reliability and between-network connections exhibiting the poorest reliability. We conclude that the scan length required for reliable estimation of rsFC depends on the specific connections of interest, and shrinkage can be used to increase reliability of rsFC, even when produced from long, high-quality rsfMRI scans.
Submitted 19 June, 2016;
originally announced June 2016.
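The shrinkage estimate described in the abstract above, a weighted average of the raw subject-level estimate and the population average, can be sketched as below. The specific weight used here, within-subject variance divided by total variance, is the standard empirical Bayes form and is an assumption; the abstract only says the weight depends on the relationship between the two variances. The function name `shrink` and the input numbers are hypothetical.

```python
from statistics import mean


def shrink(raw, within_var, between_var):
    """Shrink each subject's raw estimate of one connection toward the
    population mean.

    The weight on the population mean is
        lam = within_var / (within_var + between_var)
    so noisier subject-level estimates are shrunk more strongly.
    """
    pop_mean = mean(raw)
    total = within_var + between_var
    lam = within_var / total if total > 0 else 0.0
    return [lam * pop_mean + (1.0 - lam) * x for x in raw]


# Hypothetical raw correlations for one connection in three subjects.
raw = [0.2, 0.4, 0.6]
shrunk = shrink(raw, within_var=0.01, between_var=0.03)
# lam = 0.25, so each estimate moves a quarter of the way to the mean 0.4.
```

Because the weight is computed per connection, connections with low within-subject variance are left nearly unchanged, while noisy connections are pulled closer to the population average.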
-
PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data
Authors:
Amanda F. Mejia,
Mary Beth Nebel,
Ani Eloyan,
Brian Caffo,
Martin A. Lindquist
Abstract:
Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consists of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing---primarily through large, publicly available grassroots datasets---automated quality control and outlier detection methods are greatly needed. We propose PCA leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using ICA.
Submitted 21 October, 2016; v1 submitted 2 September, 2015;
originally announced September 2015.
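To make the notion of PCA leverage in the abstract above concrete, here is a small sketch under stated assumptions. It treats leverage as the diagonal of U Uᵀ, where U holds the left singular vectors of the centered time-by-measurement data matrix, and it retains only a single principal component so that plain power iteration suffices. The single-component restriction, the function name `pca_leverage_one_component`, and the toy data are all simplifications for illustration, not the authors' procedure.

```python
import random


def pca_leverage_one_component(data, n_iter=200, seed=0):
    """Leverage of each time point on the first principal component.

    data: list of T rows (time points), each a list of V measurements.
    Assumes the data are not constant over time.
    """
    t_count, v_count = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / t_count for j in range(v_count)]
    centered = [[row[j] - col_means[j] for j in range(v_count)] for row in data]

    # T x T Gram matrix; its top eigenvector is the first left singular vector.
    gram = [
        [sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
        for r1 in centered
    ]

    # Power iteration from a seeded random start.
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(t_count)]
    for _ in range(n_iter):
        u = [sum(g * x for g, x in zip(row, u)) for row in gram]
        norm = sum(x * x for x in u) ** 0.5
        u = [x / norm for x in u]

    # With one retained component, leverage is the squared entry of u.
    return [x * x for x in u]


# Toy run: 5 time points, 2 measurements, with a spike at time index 2.
run = [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [-0.1, 0.1], [0.0, 0.0]]
leverage = pca_leverage_one_component(run)
most_influential = leverage.index(max(leverage))
```

The leverages sum to the number of retained components (here, one), so a time point carrying most of that total stands out as a candidate outlier.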
-
Improving Reliability of Subject-Level Resting-State fMRI Parcellation with Shrinkage Estimators
Authors:
Amanda F. Mejia,
Mary Beth Nebel,
Haochang Shou,
Ciprian M. Crainiceanu,
James J. Pekar,
Stewart Mostofsky,
Brian Caffo,
Martin A. Lindquist
Abstract:
A recent interest in resting state functional magnetic resonance imaging (rsfMRI) lies in subdividing the human brain into anatomically and functionally distinct regions of interest. For example, brain parcellation is often used for defining the network nodes in connectivity studies. While inference has traditionally been performed on group-level data, there is a growing interest in parcellating single subject data. However, this is difficult due to the low signal-to-noise ratio of rsfMRI data, combined with typically short scan lengths. A large number of brain parcellation approaches employ clustering, which begins with a measure of similarity or distance between voxels. The goal of this work is to improve the reproducibility of single-subject parcellation using shrinkage estimators of such measures, allowing the noisy subject-specific estimator to "borrow strength" in a principled manner from a larger population of subjects. We present several empirical Bayes shrinkage estimators and outline methods for shrinkage when multiple scans are not available for each subject. We perform shrinkage on raw intervoxel correlation estimates and use both raw and shrinkage estimates to produce parcellations by performing clustering on the voxels. Our proposed method is agnostic to the choice of clustering method and can be used as a pre-processing step for any clustering algorithm. Using two datasets---a simulated dataset where the true parcellation is known and is subject-specific and a test-retest dataset consisting of two 7-minute rsfMRI scans from 20 subjects---we show that parcellations produced from shrinkage correlation estimates have higher reliability and validity than those produced from raw estimates. Application to test-retest data shows that using shrinkage estimators increases the reproducibility of subject-specific parcellations of the motor cortex by up to 30%.
Submitted 28 October, 2015; v1 submitted 18 September, 2014;
originally announced September 2014.