-
Genome wide signals of pervasive positive selection in human evolution
Authors:
David Enard,
Philipp W. Messer,
Dmitri Petrov
Abstract:
The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most apparent genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked recurrent deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1000 Genomes Project (Abecasis et al. 2012) and detect signatures of pervasive positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as the presence of unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to show that the observed signatures require a high rate of strongly adaptive substitutions in the vicinity of the amino acid changes. We further demonstrate that the observed signatures of positive selection correlate more strongly with the presence of regulatory sequences, as predicted by ENCODE (Gerstein et al. 2012), than with the positions of amino acid substitutions. Our results establish that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson (1975) that adaptive divergence is primarily driven by regulatory changes.
Submitted 22 August, 2013;
originally announced August 2013.
-
Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps
Authors:
Nandita R. Garud,
Philipp W. Messer,
Erkan O. Buzbas,
Dmitri A. Petrov
Abstract:
Rapid adaptation has been observed in numerous organisms in response to selective pressures, such as the application of pesticides and the presence of pathogens. When rapid adaptation is driven by rare alleles from the standing genetic variation or by a high population rate of de novo adaptive mutation, positive selection should commonly generate soft rather than hard selective sweeps. In a soft sweep, multiple adaptive haplotypes sweep through the population simultaneously, in contrast to hard sweeps in which only a single adaptive haplotype rises to high frequency. Current statistical methods were not designed to detect soft sweeps, and are therefore likely to miss these possibly numerous adaptive events. Here, we develop a statistical test (H12) based on haplotype homozygosity that is capable of detecting both hard and soft sweeps with similar power. We use H12 to identify multiple genomic regions that have undergone recent and strong adaptation in a population sample of fully sequenced Drosophila melanogaster strains from the Drosophila Genetic Reference Panel (DGRP). Visual inspection of the top 50 peaks revealed that multiple haplotypes are at high frequency, consistent with signatures of soft sweeps. We developed a second statistic (H2/H1) that is sensitive to signatures common to soft sweeps but not hard sweeps, in order to determine whether sweeps detected by H12 can be more easily generated by hard versus soft sweeps. Surprisingly, we find that the H12 and H2/H1 values for all top 50 peaks are more easily generated by soft sweeps than hard sweeps under several evolutionary scenarios.
Submitted 2 November, 2014; v1 submitted 4 March, 2013;
originally announced March 2013.
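The abstract above names the H12 and H2/H1 statistics without spelling them out. The following is a minimal sketch, assuming the commonly quoted definitions: with haplotype frequencies sorted as p1 >= p2 >= ..., H1 is the sum of squared frequencies, H12 pools the two most common haplotypes before squaring, and H2 is H1 without the contribution of the most common haplotype. The function names and the toy samples are hypothetical, and the windowing and peak-calling steps of a real genome scan are not shown.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return haplotype frequencies sorted from most to least common."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sorted((c / n for c in counts.values()), reverse=True)


def h_statistics(haplotypes):
    """Compute H1, H12 and H2/H1 for one window of sampled haplotypes.

    Assumed definitions (not stated in the abstract itself):
      H1  = sum_i p_i^2
      H12 = (p1 + p2)^2 + sum_{i>2} p_i^2 = H1 + 2 * p1 * p2
      H2  = H1 - p1^2
    """
    freqs = haplotype_frequencies(haplotypes)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)
    h12 = h1 + 2.0 * p1 * p2
    h2 = h1 - p1 * p1
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


# Toy windows: one dominant haplotype versus two common haplotypes.
hard_like = ["A"] * 9 + ["B"]
soft_like = ["A"] * 5 + ["B"] * 4 + ["C"]

print(h_statistics(hard_like))
print(h_statistics(soft_like))
```

In this toy comparison both windows have high H12, but the window with two common haplotypes has a much larger H2/H1, which is the kind of contrast the second statistic is meant to capture.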
-
Strong Purifying Selection at Synonymous Sites in D. melanogaster
Authors:
David S. Lawrie,
Philipp W. Messer,
Ruth Hershberg,
Dmitri A. Petrov
Abstract:
Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in D. melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.
Submitted 15 January, 2013;
originally announced January 2013.
-
SLiM: Simulating Evolution with Selection and Linkage
Authors:
Philipp W. Messer
Abstract:
SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene and chromosomal structure, and user-defined recombination maps.
Submitted 14 January, 2013;
originally announced January 2013.
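To make the idea of a forward simulation concrete, here is a minimal sketch of a single-locus Wright-Fisher model with selection and dominance, written in plain Python. It is not SLiM and does not use SLiM's input format or any of its features (no linkage, recombination map, or population structure); the function name, parameters, and fitness scheme (1+s, 1+hs, 1) are assumptions chosen for illustration.

```python
import random


def simulate_allele(pop_size, s, h, start_freq, generations, seed=0):
    """Track one allele's frequency forward in time in a diploid population.

    Genotype fitnesses are assumed to be 1+s (AA), 1+h*s (Aa) and 1 (aa),
    with s > -1. Each generation applies deterministic selection, then
    binomial sampling of 2*pop_size gene copies (genetic drift).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        p, q = freq, 1.0 - freq
        mean_fitness = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
        # Expected frequency of the allele after selection.
        p_selected = (p * p * (1 + s) + p * q * (1 + h * s)) / mean_fitness
        # Drift: draw each gene copy of the next generation independently.
        count = sum(1 for _ in range(n_copies) if rng.random() < p_selected)
        freq = count / n_copies
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele lost or fixed
            break
    return trajectory


trajectory = simulate_allele(pop_size=500, s=0.05, h=0.5,
                             start_freq=0.05, generations=400, seed=1)
print(len(trajectory) - 1, "generations simulated; final frequency",
      trajectory[-1])
```

A chromosome-wide simulator of the kind described in the abstract additionally tracks many linked sites per genome and recombination between them, which is where most of the computational cost lies.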
-
The McDonald-Kreitman Test and its Extensions under Frequent Adaptation: Problems and Solutions
Authors:
Philipp W. Messer,
Dmitri A. Petrov
Abstract:
Population genomic studies have shown that genetic draft and background selection can profoundly affect the genome-wide patterns of molecular variation. We performed forward simulations under realistic gene-structure and selection scenarios to investigate whether such linkage effects impinge on the ability of the McDonald-Kreitman (MK) test to infer the rate of positive selection (α) from polymorphism and divergence data. We find that in the presence of slightly deleterious mutations, MK estimates of α severely underestimate the true rate of adaptation even if all polymorphisms with population frequencies under 50% are excluded. Furthermore, already under intermediate rates of adaptation, genetic draft substantially distorts the site frequency spectra at neutral and functional sites from the expectations under mutation-selection-drift balance. MK-type approaches that first infer demography from synonymous sites and then use the inferred demography to correct the estimation of α obtain almost the correct α in our simulations. However, these approaches typically infer a severe past population expansion although there was no such expansion in the simulations, casting doubt on the accuracy of methods that infer demography from synonymous polymorphism data. We suggest a simple asymptotic extension of the MK test that should yield accurate estimates of α even in the presence of linkage effects.
Submitted 31 October, 2012;
originally announced November 2012.
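As a small worked illustration of the quantity discussed in the abstract above, the sketch below computes the standard MK estimate α = 1 - (Ds·Pn)/(Dn·Ps) and then the same estimate restricted to polymorphisms in successive derived-allele frequency classes. The per-class version is one reading of what an asymptotic extension could build on, since slightly deleterious polymorphisms are expected to be depleted at higher frequencies; the abstract does not give the procedure, so the function names, the binning, and the counts are all hypothetical, and no curve fitting or extrapolation is attempted here.

```python
def mk_alpha(dn, ds, pn, ps):
    """Standard MK estimate of the fraction of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous divergence counts.
    pn, ps: nonsynonymous and synonymous polymorphism counts.
    """
    return 1.0 - (ds * pn) / (dn * ps)


def alpha_by_frequency(dn, ds, pn_bins, ps_bins):
    """Apply the MK estimate separately within each frequency class.

    pn_bins[i] and ps_bins[i] are polymorphism counts in the i-th
    derived-allele frequency class, ordered from low to high frequency.
    """
    return [mk_alpha(dn, ds, pn, ps) for pn, ps in zip(pn_bins, ps_bins)]


# Made-up counts for illustration only.
dn, ds = 60, 100
pn_bins = [50, 30, 20]   # low, intermediate, high frequency
ps_bins = [50, 50, 50]

print(mk_alpha(dn, ds, sum(pn_bins), sum(ps_bins)))
print(alpha_by_frequency(dn, ds, pn_bins, ps_bins))
```

With these made-up counts the pooled estimate is negative while the per-class estimates rise with frequency, which is the qualitative pattern expected when slightly deleterious polymorphisms inflate Pn at low frequencies.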
-
Estimating the Strength of Selective Sweeps from Deep Population Diversity Data
Authors:
Philipp W. Messer,
Richard A. Neher
Abstract:
Selective sweeps are typically associated with a local reduction of genetic diversity around the adaptive site. However, selective sweeps can also quickly carry neutral mutations to observable population frequencies if they arise early in a sweep and hitchhike with the adaptive allele. We show that the interplay between mutation and exponential amplification through hitchhiking results in a characteristic frequency spectrum of the resulting novel haplotype variation that depends only on the ratio of the mutation rate and the selection coefficient of the sweep. Based on this result, we develop an estimator for the selection coefficient driving a sweep. Since this estimator utilizes the novel variation arising from mutations during a sweep, it does not rely on preexisting variation and can also be applied to loci that lack recombination. Compared with standard approaches that infer selection coefficients from the size of dips in genetic diversity around the adaptive site, our estimator requires much shorter sequences but sampled at high population depth in order to capture low-frequency variants; given such data, it consistently outperforms standard approaches. We investigate analytically and numerically how the accuracy of our estimator is affected by the decay of the sweep pattern over time as a consequence of random genetic drift and discuss potential effects of recombination, soft sweeps, and demography. As an example of its use, we apply our estimator to deep sequencing data from HIV populations.
Submitted 28 June, 2012;
originally announced June 2012.
-
Universality of Long-Range Correlations in Expansion-Randomization Systems
Authors:
Philipp W. Messer,
Michael Lässig,
Peter F. Arndt
Abstract:
We study the stochastic dynamics of sequences evolving by single site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent $\chi$ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of $\chi$, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example of the emergence of universality in molecular biology.
Submitted 22 September, 2005;
originally announced September 2005.
-
A Solvable Sequence Evolution Model and Genomic Correlations
Authors:
Philipp W. Messer,
Peter F. Arndt,
Michael Lässig
Abstract:
We study a minimal model for genome evolution whose elementary processes are single site mutation, duplication and deletion of sequence regions and insertion of random segments. These processes are found to generate long-range correlations in the composition of letters as long as the sequence length is growing, i.e., the combined rates of duplications and insertions are higher than the deletion rate. For constant sequence length, on the other hand, all initial correlations decay exponentially. These results are obtained analytically and by simulations. They are compared with the long-range correlations observed in genomic DNA, and the implications for genome evolution are discussed.
Submitted 9 January, 2005;
originally announced January 2005.
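The elementary processes listed in the abstract above can be sketched as a small stochastic simulation. This is a simplified illustration, not the authors' model: every event here acts on a single letter rather than on a region or segment, the event weights are relative rates in a hypothetical `rates` dictionary, and no correlation function is computed.

```python
import random

ALPHABET = "ACGT"
EVENTS = ("mutation", "duplication", "deletion", "insertion")


def evolve(sequence, steps, rates, seed=0):
    """Apply `steps` random events to a sequence and return the result.

    `rates` maps each name in EVENTS to a non-negative relative rate
    (at least one must be positive). Single-letter events are an
    assumption made here to keep the sketch short.
    """
    rng = random.Random(seed)
    seq = list(sequence)
    weights = [rates[event] for event in EVENTS]
    for _ in range(steps):
        event = rng.choices(EVENTS, weights=weights)[0]
        pos = rng.randrange(len(seq))
        if event == "mutation":
            seq[pos] = rng.choice(ALPHABET)      # replace one letter
        elif event == "duplication":
            seq.insert(pos, seq[pos])            # copy a letter next to itself
        elif event == "deletion":
            if len(seq) > 1:                     # never delete the last letter
                del seq[pos]
        else:
            seq.insert(pos, rng.choice(ALPHABET))  # insert a random letter
    return "".join(seq)


growing = {"mutation": 1.0, "duplication": 2.0, "deletion": 1.0, "insertion": 0.5}
result = evolve("ACGT" * 5, steps=1000, rates=growing, seed=42)
print(len(result))
```

In the example rates, duplications plus insertions outpace deletions, which corresponds to the growing-length regime in which the abstract reports long-range correlations.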