Can artificial neural networks supplant the polygene risk score for risk prediction of complex disorders given very large sample sizes?
Authors:
Carlos Pinto,
Michael Gill,
Schizophrenia Working Group of the Psychiatric Genomics Consortium,
Elizabeth A. Heron
Abstract:
Genome-wide association studies (GWAS) provide a means of examining the common genetic variation underlying a range of traits and disorders. In addition, it is hoped that GWAS may provide a means of differentiating affected from unaffected individuals. This has potential applications in the area of risk prediction. Current attempts to address this problem focus on using the polygene risk score (PRS) to predict case-control status on the basis of GWAS data. However, this approach has so far had limited success for complex traits such as schizophrenia (SZ). This is essentially a classification problem. Artificial neural networks (ANNs) have been shown in recent years to be highly effective in such applications. Here we apply an ANN to the problem of distinguishing SZ patients from unaffected controls. We compare the effectiveness of the ANN with the PRS in classifying individuals by case-control status based only on genetic data from a GWAS. We use the schizophrenia dataset from the Psychiatric Genomics Consortium (PGC) for this study. Our analysis indicates that the ANN is more sensitive to sample size than the PRS. As larger and larger sample sizes become available, we suggest that ANNs are a promising alternative to the PRS for classification and risk prediction for complex genetic disorders.
Submitted 20 November, 2019;
originally announced November 2019.
Genetic Classification of Populations using Supervised Learning
Authors:
M. Bridges,
E. A. Heron,
C. O'Dushlaine,
R. Segurado,
The International Schizophrenia Consortium,
D. Morris,
A. Corvin,
M. Gill,
C. Pinto
Abstract:
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available.
In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
Submitted 16 December, 2010;
originally announced December 2010.