Pathways of Distinction Analysis: a new technique for multi-SNP analysis of GWAS data
Authors:
Rosemary Braun,
Kenneth Buetow
Abstract:
Genome-wide association studies (GWAS) have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, complex diseases are unlikely to have a single causative gene. There is thus a pressing need for multi-SNP analysis methods that can reveal system-level differences between cases and controls. Here, we present a novel multi-SNP GWAS analysis method called Pathways of Distinction Analysis (PoDA). The method uses GWAS data and known pathway-gene and gene-SNP associations to identify pathways that, ideally, permit cases to be distinguished from controls. The technique is based upon the hypothesis that if a pathway is related to disease risk, cases will appear more similar to other cases than to controls for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds, i.e., pathways containing SNPs for which the samples exhibit greater within-class similarity than across-class similarity. Importantly, PoDA improves on existing single-SNP and SNP-set enrichment analyses in that it does not require the SNPs in a pathway to exhibit independent main effects. This permits PoDA to reveal pathways in which epistatic interactions drive risk. In this paper, we detail the PoDA method and apply it to two GWA studies: one of breast cancer and the other of liver cancer. The results strongly suggest that there exist pathway-wide genomic differences that contribute to disease susceptibility. PoDA thus provides an analytical tool that is complementary to existing techniques and has the power to enrich our understanding of disease genomics at the systems level.
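The within-class similarity hypothesis can be made concrete with a small sketch. This is not the PoDA statistic from the paper; it is a minimal illustration under our own assumptions. Genotypes are coded as minor-allele counts (0/1/2) for the SNPs mapped to one pathway. Sample-to-sample distance is the mean absolute allele-count difference. Each sample's score is its mean distance to controls minus its mean distance to cases, with the sample itself excluded. The pathway is then assessed by comparing case and control scores with a label-permutation test. All function names are hypothetical.

```python
import random
from statistics import mean


def genotype_distance(a, b):
    """Assumed distance: mean absolute difference in minor-allele counts (0/1/2)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distinction_scores(genotypes, is_case):
    """Hypothetical per-sample score S_j = mean dist to controls - mean dist to cases.

    Sample j is excluded from its own class, so each class needs >= 2 samples.
    A positive S_j means sample j looks more like the cases than the controls.
    """
    scores = []
    for j, g in enumerate(genotypes):
        to_cases = [genotype_distance(g, h) for k, h in enumerate(genotypes)
                    if k != j and is_case[k]]
        to_ctrls = [genotype_distance(g, h) for k, h in enumerate(genotypes)
                    if k != j and not is_case[k]]
        scores.append(mean(to_ctrls) - mean(to_cases))
    return scores


def pathway_statistic(genotypes, is_case):
    """Mean score among cases minus mean score among controls (larger = more distinct)."""
    s = distinction_scores(genotypes, is_case)
    case_s = [x for x, c in zip(s, is_case) if c]
    ctrl_s = [x for x, c in zip(s, is_case) if not c]
    return mean(case_s) - mean(ctrl_s)


def permutation_pvalue(genotypes, is_case, n_perm=200, seed=0):
    """One-sided p-value from shuffling case/control labels (class sizes preserved)."""
    observed = pathway_statistic(genotypes, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pathway_statistic(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy pathway with 3 SNPs: cases share one genotype pattern, controls another.
    genotypes = [[2, 2, 0], [2, 1, 0], [2, 2, 1], [1, 2, 0],
                 [0, 0, 2], [0, 1, 2], [1, 0, 2], [0, 0, 1]]
    is_case = [True] * 4 + [False] * 4
    print(pathway_statistic(genotypes, is_case))
    print(permutation_pvalue(genotypes, is_case))
```

Because the score is computed from all pathway SNPs jointly rather than SNP by SNP, it illustrates the multi-SNP framing. Whether a given distance can detect purely epistatic patterns depends on that choice, and this simple one is not guaranteed to. A real analysis would also need to account for testing many pathways.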
Submitted 17 March, 2011; v1 submitted 21 December, 2010;
originally announced December 2010.
Needles in the Haystack: Identifying Individuals Present in Pooled Genomic Data
Authors:
Rosemary Braun,
William Rowe,
Carl Schaefer,
Jinghui Zhang,
Kenneth Buetow
Abstract:
Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric have remained incompletely characterized. Here we explore the strengths and limitations of that method. In addition to analytically investigating the underlying assumptions, we use both real and simulated genotypes to empirically test the method's accuracy. The results reveal that, when the metric is used to identify individuals as members of a population sample, its specificity is low in several circumstances. We find that the misclassifications stem from violations of assumptions that are crucial to the technique yet hard to control in practice, and we explore the feasibility of several methods to improve the sensitivity. Additionally, we find that the specificity may be lower than expected even in ideal circumstances. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used appropriately. We also discuss the implications of our findings for forensic applications and for the protection of GWAS participant privacy.
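The abstract does not spell out the metric. A commonly cited form, assumed here rather than taken from this paper, compares an individual's allele frequency Y_j at each SNP (0, 0.5 or 1) with the reference-sample frequency Pop_j and the pooled-mixture frequency M_j via D_j = |Y_j - Pop_j| - |Y_j - M_j|. It then summarizes the D_j across SNPs with a one-sample t statistic, where positive values suggest the individual is closer to the mixture than to the reference. The sketch below implements that assumed form. Its function names are hypothetical.

```python
import math
from statistics import mean, stdev


def snp_distance(y, pop_freq, mix_freq):
    """Assumed per-SNP metric D = |Y - Pop| - |Y - M|.

    y is the individual's allele frequency at the SNP (0, 0.5 or 1).
    Positive D means the individual is closer to the mixture than to the reference.
    """
    return abs(y - pop_freq) - abs(y - mix_freq)


def membership_t(individual, pop_freqs, mix_freqs):
    """One-sample t statistic for mean(D) across SNPs (needs >= 2 SNPs)."""
    d = [snp_distance(y, p, m)
         for y, p, m in zip(individual, pop_freqs, mix_freqs)]
    sd = stdev(d)
    if sd == 0:
        raise ValueError("all per-SNP distances are identical; t is undefined")
    return mean(d) / (sd / math.sqrt(len(d)))


if __name__ == "__main__":
    individual = [1.0, 0.0, 0.5, 1.0, 0.0, 1.0]
    reference = [0.5] * 6
    # Toy mixture whose frequencies are pulled 10% toward the individual.
    mixture = [0.9 * p + 0.1 * y for y, p in zip(individual, reference)]
    print(membership_t(individual, reference, mixture))
```

The t statistic treats the per-SNP D_j as independent draws and presumes that the reference sample is well matched to the mixture. Neither assumption is checked or corrected in this sketch.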
Submitted 20 May, 2009; v1 submitted 9 February, 2009;
originally announced February 2009.