-
Getting Genetic Ancestry Right for Science and Society
Authors:
Anna C. F. Lewis,
Santiago J. Molina,
Paul S. Appelbaum,
Bege Dauda,
Anna Di Rienzo,
Agustin Fuentes,
Stephanie M. Fullerton,
Nanibaa' A. Garrison,
Nayanika Ghosh,
Evelynn M. Hammonds,
David S. Jones,
Eimear E. Kenny,
Peter Kraft,
Sandra S.-J. Lee,
Madelyn Mauro,
John Novembre,
Aaron Panofsky,
Mashaal Sohail,
Benjamin M. Neale,
Danielle S. Allen
Abstract:
There is a scientific and ethical imperative to embrace a multidimensional, continuous view of ancestry and move away from continental ancestry categories
Submitted 14 October, 2021; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Conflation of short identity-by-descent segments bias their inferred length distribution
Authors:
Charleston W. K. Chiang,
Peter Ralph,
John Novembre
Abstract:
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment that is inherited from a recent shared common ancestor without intervening recombination. Long IBD segments (> 1cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2cM long are results of conflations of two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.
Submitted 17 August, 2015; v1 submitted 20 October, 2014;
originally announced October 2014.
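To make the conflation effect described in this abstract concrete, here is a toy sketch in Python. It is not the authors' coalescent-simulation pipeline or any real IBD detection program; it only assumes, for illustration, that a detector working on contiguous identity-by-state reports two nearby true segments as one whenever the gap between them is at most a hypothetical `max_gap` (in cM).

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) positions in cM


def conflate(segments: List[Segment], max_gap: float) -> List[Segment]:
    """Merge segments separated by a gap of at most max_gap (hypothetical detector behavior)."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap too small to resolve: extend the previous reported segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def lengths(segments: List[Segment]) -> List[float]:
    """Segment lengths in cM."""
    return [end - start for start, end in segments]


# Illustrative values only: two short true segments and one long one.
true_segments = [(0.0, 0.6), (0.65, 1.3), (5.0, 7.5)]
reported = conflate(true_segments, max_gap=0.1)
print(lengths(true_segments))
print(lengths(reported))
```

In this made-up example the two sub-1 cM segments are reported as a single segment in the 1-2 cM range, which is the kind of shift in the length distribution the abstract describes, while the long segment is unaffected.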
-
forqs: Forward-in-time Simulation of Recombination, Quantitative Traits, and Selection
Authors:
Darren Kessner,
John Novembre
Abstract:
forqs is a forward-in-time simulation of recombination, quantitative traits, and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX, and Windows are freely available under a permissive BSD license.
Submitted 11 October, 2013;
originally announced October 2013.
-
Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs
Authors:
Adam H. Freedman,
Rena M. Schweizer,
Ilan Gronau,
Eunjung Han,
Diego Ortega-Del Vecchyo,
Pedro M. Silva,
Marco Galaverni,
Zhenxin Fan,
Peter Marx,
Belen Lorente-Galdos,
Holly Beale,
Oscar Ramirez,
Farhad Hormozdiari,
Can Alkan,
Carles Vilà,
Kevin Squire,
Eli Geffen,
Josip Kusak,
Adam R. Boyko,
Heidi G. Parker,
Clarence Lee,
Vasisht Tadigotla,
Adam Siepel,
Carlos D. Bustamante,
Timothy T. Harkins, et al. (5 additional authors not shown)
Abstract:
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we analyzed novel high-quality genome sequences of three gray wolves, one from each of three putative centers of dog domestication, two ancient dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. We find dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow, which confounds previous inferences of dog origins. In dogs, the domestication bottleneck was severe, involving a 17 to 49-fold reduction in population size, a much stronger bottleneck than estimated previously from less intensive sequencing efforts. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was far larger than represented by modern wolf populations. Conditional on mutation rate, we narrow the plausible range for the date of initial dog domestication to an interval from 11 to 16 thousand years ago. This period predates the rise of agriculture, implying that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers are more closely related to dogs, and the sampled wolves instead form a sister monophyletic clade. This result, in combination with our finding of dog-wolf admixture during the process of domestication, suggests a re-evaluation of past hypotheses of dog origin is necessary. Finally, we also detect signatures of selection, including evidence for selection on genes implicated in morphology, metabolism, and neural development. Uniquely, we find support for selective sweeps at regulatory sites, suggesting gene regulatory changes played a critical role in dog domestication.
Submitted 4 June, 2013; v1 submitted 31 May, 2013;
originally announced May 2013.
-
Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data
Authors:
Darren Kessner,
Tom Turner,
John Novembre
Abstract:
DNA samples are often pooled, either by experimental design, or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g. bacterial species comprising a microbiome, or pathogen strains in a blood sample). We present an expectation-maximization (EM) algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different strains within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.
Submitted 18 September, 2012;
originally announced September 2012.
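As a rough illustration of the idea in this abstract (EM over reads with a known set of candidate haplotypes), the sketch below shows one generic way such an estimator could look in Python. It is not the authors' software or their exact model. The read representation (a dict from site index to observed allele), the single per-base error rate `err`, and all function names are assumptions made for this example, and it expects at least one read.

```python
from typing import Dict, List, Sequence


def read_likelihood(read: Dict[int, int], hap: Sequence[int], err: float) -> float:
    """P(read | haplotype), assuming independent per-site errors with rate err."""
    p = 1.0
    for site, allele in read.items():
        p *= (1.0 - err) if hap[site] == allele else err
    return p


def em_haplotype_frequencies(
    reads: List[Dict[int, int]],
    haps: List[Sequence[int]],
    err: float = 0.01,
    n_iter: int = 200,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate frequencies of known haplotypes from reads by EM (toy model)."""
    k = len(haps)
    freqs = [1.0 / k] * k  # start from uniform frequencies
    # Read-by-haplotype likelihoods do not change between iterations.
    lik = [[read_likelihood(r, h, err) for h in haps] for r in reads]
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in lik:
            # E-step: posterior probability that each haplotype produced this read.
            weights = [f * l for f, l in zip(freqs, row)]
            total = sum(weights)
            for j in range(k):
                counts[j] += weights[j] / total
        # M-step: new frequencies are the average posterior assignments.
        new_freqs = [c / len(reads) for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Illustrative input only: two known haplotypes over three sites,
# three reads matching the first and one matching the second.
haps = [[0, 0, 0], [1, 1, 1]]
reads = [{0: 0, 1: 0, 2: 0}] * 3 + [{0: 1, 1: 1, 2: 1}]
print(em_haplotype_frequencies(reads, haps))
```

Because each read here spans several sites, it carries linkage information that single-site allele frequencies would discard, which is the motivation the abstract gives for working directly from reads.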