Algorithms for Large-scale Whole Genome Association Analysis

Authors: Elmar Peise, Diego Fabregat, Yurii Aulchenko, Paolo Bientinesi

Abstract: In order to associate complex traits with genetic polymorphisms, genome-wide association studies process huge datasets involving tens of thousands of individuals genotyped for millions of polymorphisms. When handling these datasets, which exceed the main memory of contemporary computers, one faces two distinct challenges: 1) millions of polymorphisms come at the cost of hundreds of gigabytes of genotype data, which can only be kept in secondary storage; 2) the relatedness of the test population is represented by a covariance matrix, which, for large populations, can only fit in the combined main memory of a distributed architecture. In this paper, we present solutions for both challenges: the genotype data is streamed from and to secondary storage using a double buffering technique, while the covariance matrix is kept across the main memory of a distributed memory system. We show that these methods sustain high performance and allow the analysis of enormous datasets.

Submitted 8 April, 2013; originally announced April 2013.

Report number: AICES-2013/04-2

arXiv:1210.7325 [pdf, other]

Solving Sequences of Generalized Least-Squares Problems on Multi-threaded Architectures

Authors: Diego Fabregat-Traver, Yurii Aulchenko, Paolo Bientinesi

Abstract: Generalized linear mixed-effects models in the context of genome-wide association studies (GWAS) represent a formidable computational challenge: the solution of millions of correlated generalized least-squares problems, and the processing of terabytes of data. We present high performance in-core and out-of-core shared-memory algorithms for GWAS: By taking advantage of domain-specific knowledge, exploiting multi-core parallelism, and handling data efficiently, our algorithms attain unequalled performance. When compared to GenABEL, one of the most widely used libraries for GWAS, on a 12-core processor we obtain 50-fold speedups. As a consequence, our routines enable genome studies of unprecedented size.

Submitted 27 October, 2012; originally announced October 2012.

arXiv:1207.2169 [pdf, other]

High-throughput Genome-wide Association Analysis for Single and Multiple Phenotypes

Authors: Diego Fabregat-Traver, Yurii S. Aulchenko, Paolo Bientinesi

Abstract: The variance component tests used in genome-wide association studies of thousands of individuals become computationally exhaustive when multiple traits are analysed in the context of omics studies. We introduce two high-throughput algorithms -- CLAK-CHOL and CLAK-EIG -- for single and multiple phenotype genome-wide association studies (GWAS). The algorithms, generated with the help of an expert system, reduce the computational complexity to the point that thousands of traits can be analysed for association with millions of polymorphisms in the course of days on a standard workstation. By taking advantage of problem-specific knowledge, CLAK-CHOL and CLAK-EIG significantly outperform the current state-of-the-art tools in both single and multiple trait analysis.

Submitted 10 November, 2012; v1 submitted 9 July, 2012; originally announced July 2012.

Showing 1–3 of 3 results for author: Aulchenko, Y