-
Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes data
Authors:
Nicolas Duforet-Frebourg,
Keurcien Luu,
Guillaume Laval,
Eric Bazin,
Michael G. B. Blum
Abstract:
To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis. We show that the common Fst index of genetic differentiation between populations can be viewed as a proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) after removal of recently admixed individuals, resulting in 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3X). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.
Submitted 18 November, 2015; v1 submitted 8 April, 2015;
originally announced April 2015.
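The idea of correlating each genetic variant with each principal component can be illustrated with a minimal sketch. It is not the PCAdapt implementation: the function name `squared_pc_correlations`, the simulated two-population data, and the planted variant at index 0 are all assumptions made for illustration. The sketch standardizes a genotype matrix (individuals in rows, variants in columns), takes the leading principal components from an SVD, and reports the squared correlation of every variant with every component.

```python
import numpy as np

def squared_pc_correlations(genotypes, n_components=2):
    """Squared correlation between each variant and each leading PC.

    genotypes: (n_individuals, n_variants) array of 0/1/2 allele counts.
    Returns an (n_variants, n_components) array with values in [0, 1].
    """
    G = np.asarray(genotypes, dtype=float)
    n = G.shape[0]
    centered = G - G.mean(axis=0)
    sd = centered.std(axis=0)
    polymorphic = sd > 0  # monomorphic variants keep a correlation of 0
    Z = np.zeros_like(centered)
    Z[:, polymorphic] = centered[:, polymorphic] / sd[polymorphic]
    # Left singular vectors are the PC scores scaled to unit norm.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components]
    # Columns of Z have norm sqrt(n) and the score columns have norm 1,
    # so this product is the Pearson correlation.
    corr = Z.T @ scores / np.sqrt(n)
    return corr ** 2

# Hypothetical toy data: two populations that drifted apart, plus one
# strongly differentiated variant planted at index 0.
rng = np.random.default_rng(0)
n_per_pop, n_variants = 100, 1000
base = rng.uniform(0.2, 0.8, size=n_variants)
freqs = np.clip(base + rng.normal(0.0, 0.1, size=(2, n_variants)), 0.01, 0.99)
freqs[:, 0] = [0.05, 0.95]
genotypes = np.vstack([
    rng.binomial(2, freqs[pop], size=(n_per_pop, n_variants))
    for pop in range(2)
])

r2 = squared_pc_correlations(genotypes)
top_variant = int(np.argmax(r2[:, 0]))  # variant most correlated with PC1
print(top_variant, r2[top_variant, 0])
```

No population labels are passed to the function, which mirrors the abstract's point that the scan needs no prior definition of populations. Turning these squared correlations into calibrated test statistics is beyond this sketch.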
-
Genome scans for detecting footprints of local adaptation using a Bayesian factor model
Authors:
N. Duforet-Frebourg,
E. Bazin,
M. G. B. Blum
Abstract:
A central part of population genomics consists of finding genomic regions implicated in local adaptation. Population genomic analyses are based on genotyping numerous molecular markers and looking for outlier loci in terms of patterns of genetic differentiation. One of the most common approaches for selection scans is based on statistics that measure population differentiation, such as $F_{ST}$. However, there are important caveats with approaches related to $F_{ST}$ because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. As outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that the factor model can achieve a 2-fold or more reduction of false discovery rate compared to the software BayeScan or compared to an $F_{ST}$ approach. We analyze the data of the Human Genome Diversity Panel to provide an example of how factor models can be used to detect local adaptation with a large number of SNPs. The Bayesian factor model is implemented in the open-source PCAdapt software.
Submitted 29 July, 2014; v1 submitted 21 February, 2014;
originally announced February 2014.
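For readers unfamiliar with factor models, a generic form is sketched here. The notation ($G_{ij}$, $\mu_j$, $U_{ik}$, $V_{kj}$, $\varepsilon_{ij}$, $K$) is an assumption made for illustration; the abstract does not give the paper's likelihood, priors, or outlier model.

$$G_{ij} = \mu_j + \sum_{k=1}^{K} U_{ik} V_{kj} + \varepsilon_{ij}$$

In this generic form, $G_{ij}$ is the genotype of individual $i$ at locus $j$, $\mu_j$ is a locus-specific mean, $U_{ik}$ is the score of individual $i$ on latent factor $k$, $V_{kj}$ is the loading of locus $j$ on that factor, and $\varepsilon_{ij}$ is residual noise. Under such a model, a locus that is atypically related to population structure would presumably show up as an unusually large loading $V_{kj}$ on one of the $K$ factors.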
-
Non-stationary patterns of isolation-by-distance: inferring measures of local genetic differentiation with Bayesian kriging
Authors:
Nicolas Duforet-Frebourg,
Michael G. B. Blum
Abstract:
Patterns of isolation-by-distance arise when population differentiation increases with increasing geographic distances. Patterns of isolation-by-distance are usually caused by local spatial dispersal, which explains why differences of allele frequencies between populations accumulate with distance. However, spatial variations of demographic parameters such as migration rate or population density can generate non-stationary patterns of isolation-by-distance where the rate at which genetic differentiation accumulates varies across space. To characterize non-stationary patterns of isolation-by-distance, we infer local genetic differentiation based on Bayesian kriging. Local genetic differentiation for a sampled population is defined as the average genetic differentiation between the sampled population and fictive neighboring populations. To avoid defining populations in advance, the method can also be applied at the scale of individuals, making it relevant for landscape genetics. Inference of local genetic differentiation relies on a matrix of pairwise similarity or dissimilarity between populations or individuals, such as matrices of $F_{ST}$ between pairs of populations. Simulation studies show that maps of local genetic differentiation can reveal barriers to gene flow but also other patterns such as continuous variations of gene flow across habitat. The potential of the method is illustrated with two data sets: genome-wide SNP data for human Swedish populations and AFLP markers for alpine plant species. The software LocalDiff implementing the method is available at http://membres-timc.imag.fr/Michael.Blum/LocalDiff.html
Submitted 7 January, 2014; v1 submitted 24 September, 2012;
originally announced September 2012.
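The notion of a local differentiation measure computed from a pairwise dissimilarity matrix can be illustrated with a crude stand-in. The sketch below does not perform Bayesian kriging and is not the LocalDiff method: it simply averages each population's dissimilarity to its `k` geographically nearest sampled neighbors, rather than to fictive neighbors predicted by a spatial model. The function name `nearest_neighbor_differentiation` and the toy transect with a barrier are assumptions made for illustration.

```python
import numpy as np

def nearest_neighbor_differentiation(coords, dissimilarity, k=2):
    """Mean dissimilarity of each population to its k nearest neighbors.

    coords: (n, d) array of geographic coordinates.
    dissimilarity: (n, n) symmetric matrix, e.g. pairwise F_ST.
    """
    coords = np.asarray(coords, dtype=float)
    D = np.asarray(dissimilarity, dtype=float)
    geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(geo, np.inf)  # a population is not its own neighbor
    neighbors = np.argsort(geo, axis=1)[:, :k]
    return np.take_along_axis(D, neighbors, axis=1).mean(axis=1)

# Hypothetical transect: ten populations on a line, differentiation that
# grows linearly with distance, plus a barrier between positions 4 and 5.
x = np.arange(10, dtype=float)
coords = x[:, None]
side = x >= 5
dissimilarity = (
    0.01 * np.abs(x[:, None] - x[None, :])
    + 0.1 * (side[:, None] != side[None, :])
)

local = nearest_neighbor_differentiation(coords, dissimilarity)
print(np.round(local, 3))
```

In this toy setting the two populations flanking the barrier receive the largest values, which is the kind of signal a map of local differentiation is meant to reveal. A kriging-based approach would additionally interpolate between sampled sites and quantify uncertainty.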
-
A mean-field analysis of community structure in social and kin networks
Authors:
E. Durand,
M. G. B. Blum,
O. Francois
Abstract:
We provide a mean-field analysis of community structure of social and biological networks, assuming that actors are able to evaluate some tree-derived distance to the other actors and tend to aggregate with the least distant. We show that such networks have small components, and give exact descriptions for the probability distribution of a typical community size and the number of communities. In particular, we show that the probability distribution of the community size is well-approximated by a power-law distribution with exponent two. We illustrate the robustness of the mean-field analysis by comparing its predictions on previously studied social networks and biological data.
Submitted 13 April, 2006;
originally announced April 2006.
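The power-law statement can be written out as follows. The symbol $S$ for the size of a typical community is an assumption made for illustration, and the abstract does not give the exact distribution, so only the approximate tail behavior is shown.

$$P(S = s) \propto s^{-2} \quad \text{for large } s, \qquad \text{hence} \qquad P(S \ge s) \sim \frac{c}{s} \text{ for some constant } c > 0.$$

For an idealized power law with exponent two on an unbounded range of sizes, the mean $\sum_s s \, P(S = s)$ diverges logarithmically. Any finite network truncates this tail, so the divergence is a property of the approximation rather than of the data.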