-
Hierarchical Bayesian Regression for Multi-Site Normative Modeling of Neuroimaging Data
Authors:
Seyed Mostafa Kia,
Hester Huijsdens,
Richard Dinga,
Thomas Wolfers,
Maarten Mennes,
Ole A. Andreassen,
Lars T. Westlye,
Christian F. Beckmann,
Andre F. Marquand
Abstract:
Clinical neuroimaging has recently witnessed explosive growth in data availability, which brings the study of heterogeneity in clinical cohorts into the spotlight. Normative modeling is an emerging statistical tool for achieving this objective. However, its application remains technically challenging due to difficulties in properly dealing with nuisance variation, for example due to variability in image acquisition devices. Here, in a fully probabilistic framework, we propose an application of hierarchical Bayesian regression (HBR) for multi-site normative modeling. Our experimental results confirm the superiority of HBR in deriving more accurate normative ranges on large multi-site neuroimaging data compared to widely used methods. This provides the possibility i) to learn the normative range of structural and functional brain measures on large multi-site data; and ii) to recalibrate and reuse the learned model on local small data. HBR therefore closes the technical loop for applying normative modeling as a medical tool for the diagnosis and prognosis of mental disorders.
Submitted 25 May, 2020;
originally announced May 2020.
-
Is the familywise error rate in genomics controlled by methods based on the effective number of independent tests?
Authors:
Kari Krizak Halle,
Srdjan Djurovic,
Ole Andreas Andreassen,
Mette Langaas
Abstract:
In genome-wide association (GWA) studies, the goal is to detect association between one or more genetic markers and a given phenotype. The number of genetic markers in a GWA study can be on the order of hundreds of thousands, and therefore multiple testing methods are needed. This paper presents a set of popular methods used to correct for multiple testing in GWA studies. All are based on the concept of estimating an effective number of independent tests. We compare these methods using simulated data and data from the TOP study, and show that the effective number of independent tests is not additive over blocks of independent genetic markers unless we assume a common value for the local significance level. We also show that the reviewed methods based on estimating the effective number of independent tests do not, in general, control the familywise error rate.
Submitted 21 December, 2016; v1 submitted 14 December, 2016;
originally announced December 2016.
-
Efficient and powerful familywise error control in genome-wide association studies using generalized linear models
Authors:
K. K. Halle,
Ø. Bakke,
S. Djurovic,
A. Bye,
E. Ryeng,
U. Wisløff,
O. A. Andreassen,
M. Langaas
Abstract:
In genetic association studies, detecting phenotype-genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modelled by a generalized linear model (GLM). The inclusion of environmental covariates makes it possible to account for important confounding factors, such as sex and population substructure. A multivariate score statistic is used to test a large number of genetic markers for association with the phenotype; under the complete null hypothesis of no phenotype-genotype association, it asymptotically has a multivariate normal distribution with a covariance matrix that can be estimated from the data. We stress the importance of controlling the familywise error rate (FWER), and use the asymptotic distribution of the multivariate score test statistic to find a local significance level for the individual tests. Using real data (from one study on schizophrenia and bipolar disorder and one on maximal oxygen uptake) and constructed correlated structures, we show that our method is a powerful alternative to the popular Bonferroni and Šidák methods. For GLMs without environmental covariates, we show that our method is an efficient alternative to permutation methods for multiple testing. Further, we show that if environmental covariates and genetic markers are uncorrelated, the estimated covariance matrix of the score test statistic can be approximated by the estimated correlation matrix for just the genetic markers. As byproducts of our method, an effective number of independent tests can be defined, and FWER-adjusted $p$-values can be calculated as an alternative to using a local significance level.
Submitted 22 December, 2016; v1 submitted 18 March, 2016;
originally announced March 2016.
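For readers unfamiliar with the two baseline corrections named in the abstract above, the following is a minimal sketch of the textbook Bonferroni and Šidák local significance levels and their adjusted p-values. It is not the authors' score-test method: the function names are hypothetical, and the Šidák formulas assume independent tests, an assumption that correlated genetic markers generally violate.

```python
def bonferroni_level(alpha: float, m: int) -> float:
    """Local significance level that bounds the FWER by alpha for m tests."""
    return alpha / m


def sidak_level(alpha: float, m: int) -> float:
    """Local level giving FWER exactly alpha when the m tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def fwer_if_independent(local_level: float, m: int) -> float:
    """FWER under the complete null, ASSUMING the m tests are independent."""
    return 1.0 - (1.0 - local_level) ** m


def bonferroni_adjust(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, m * p)


def sidak_adjust(p: float, m: int) -> float:
    """Sidak-adjusted p-value (valid under independence)."""
    return 1.0 - (1.0 - p) ** m


if __name__ == "__main__":
    alpha, m = 0.05, 500_000  # illustrative values, not taken from the papers
    print("Bonferroni local level:", bonferroni_level(alpha, m))
    print("Sidak local level:     ", sidak_level(alpha, m))
    print("FWER at Sidak level:   ", fwer_if_independent(sidak_level(alpha, m), m))
```

The Šidák level is always slightly larger than the Bonferroni level for the same `alpha` and `m`, so it is marginally less conservative. Both ignore correlation between markers, which is the gap that methods using an estimated covariance structure aim to close.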