Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Keys, Kevin L.; Chen, Gary K.; Lange, Kenneth

doi:10.1002/gepi.22068

arXiv:1608.01398 (stat)

[Submitted on 4 Aug 2016 (v1), last revised 25 Jul 2017 (this version, v3)]

Abstract: A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over the past decade, researchers have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with LASSO or MCP penalties is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. This paper introduces the iterative hard thresholding (IHT) algorithm to the GWAS analysis of continuous traits. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. We evaluate IHT performance on both simulated and real GWAS data and conclude that it reduces false positive and false negative rates while remaining competitive in computational time with penalized regression. Source code is freely available at this https URL.
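
As a rough illustration of the idea behind iterative hard thresholding for a continuous trait, the sketch below alternates a gradient step on the least-squares loss with a projection that keeps only the k largest coefficients. It is a toy, single-threaded version under stated assumptions, not the authors' implementation. The names `iht` and `hard_threshold`, the dense list-of-lists design matrix, the steepest-descent step size, and the stopping rule are all illustrative choices. It does not attempt genotype compression, multicore parallelism, or GPU acceleration.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht(X, y, k, max_iter=200, tol=1e-10):
    """Sparse least squares by iterative hard thresholding (toy sketch).

    X: list of n rows, each a list of p numbers (hypothetical dense design matrix)
    y: list of n trait values
    k: number of nonzero coefficients allowed in the model
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        # Residual r = y - X * beta.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction g = X^T r, the negative gradient of 0.5 * ||y - X beta||^2.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Assumed step size: exact line search along g, ||g||^2 / ||X g||^2.
        Xg = [sum(X[i][j] * grad[j] for j in range(p)) for i in range(n)]
        denom = sum(v * v for v in Xg)
        if denom == 0.0:
            break  # gradient vanished, nothing left to improve
        step = sum(g * g for g in grad) / denom
        # Gradient step followed by projection onto k-sparse vectors.
        new_beta = hard_threshold([b + step * g for b, g in zip(beta, grad)], k)
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    return beta
```

Unlike LASSO or MCP, which shrink coefficients through a penalty, this scheme fixes the model size k directly, so the selected SNPs are simply the indices of the nonzero entries in the returned vector.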

Comments:	13 pages, 1 figure, 4 tables
Subjects:	Machine Learning (stat.ML)
Cite as:	arXiv:1608.01398 [stat.ML]
	(or arXiv:1608.01398v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1608.01398
Journal reference:	Genetic Epidemiology 2017:41(8), 756--768
Related DOI:	https://doi.org/10.1002/gepi.22068

Submission history

From: Kevin Keys
[v1] Thu, 4 Aug 2016 00:05:24 UTC (24 KB)
[v2] Wed, 16 Nov 2016 01:07:56 UTC (24 KB)
[v3] Tue, 25 Jul 2017 00:49:55 UTC (40 KB)
