
doi 10.1126/science.abm7530

Getting Genetic Ancestry Right for Science and Society

Authors: Anna C. F. Lewis, Santiago J. Molina, Paul S Appelbaum, Bege Dauda, Anna Di Rienzo, Agustin Fuentes, Stephanie M. Fullerton, Nanibaa' A. Garrison, Nayanika Ghosh, Evelynn M. Hammonds, David S. Jones, Eimear E. Kenny, Peter Kraft, Sandra S. -J. Lee, Madelyn Mauro, John Novembre, Aaron Panofsky, Mashaal Sohail, Benjamin M. Neale, Danielle S. Allen

Abstract: There is a scientific and ethical imperative to embrace a multidimensional, continuous view of ancestry and move away from continental ancestry categories.

Submitted 14 October, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

arXiv:1410.5313 [pdf]

doi 10.1534/g3.116.027581

Conflation of short identity-by-descent segments bias their inferred length distribution

Authors: Charleston W. K. Chiang, Peter Ralph, John Novembre

Abstract: Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment that is inherited from a recent shared common ancestor without intervening recombination. Long IBD segments (> 1cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2cM long are results of conflations of two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.

Submitted 17 August, 2015; v1 submitted 20 October, 2014; originally announced October 2014.

Journal ref: G3 May 1, 2016 vol. 6 no. 5 1287-1296

arXiv:1310.3234 [pdf, other]

forqs: Forward-in-time Simulation of Recombination, Quantitative Traits, and Selection

Authors: Darren Kessner, John Novembre

Abstract: forqs is a forward-in-time simulation of recombination, quantitative traits, and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX, and Windows are freely available under a permissive BSD license.

Submitted 11 October, 2013; originally announced October 2013.

Comments: Preprint includes Supplementary Information. https://bitbucket.org/dkessner/forqs

arXiv:1305.7390 [pdf]

Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs

Authors: Adam H. Freedman, Rena M. Schweizer, Ilan Gronau, Eunjung Han, Diego Ortega-Del Vecchyo, Pedro M. Silva, Marco Galaverni, Zhenxin Fan, Peter Marx, Belen Lorente-Galdos, Holly Beale, Oscar Ramirez, Farhad Hormozdiari, Can Alkan, Carles Vilà, Kevin Squire, Eli Geffen, Josip Kusak, Adam R. Boyko, Heidi G. Parker, Clarence Lee, Vasisht Tadigotla, Adam Siepel, Carlos D. Bustamante, Timothy T. Harkins, et al. (5 additional authors not shown)

Abstract: To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we analyzed novel high-quality genome sequences of three gray wolves, one from each of three putative centers of dog domestication, two ancient dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. We find dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow, which confounds previous inferences of dog origins. In dogs, the domestication bottleneck was severe, involving a 17 to 49-fold reduction in population size, a much stronger bottleneck than estimated previously from less intensive sequencing efforts. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was far larger than represented by modern wolf populations. Conditional on mutation rate, we narrow the plausible range for the date of initial dog domestication to an interval from 11 to 16 thousand years ago. This period predates the rise of agriculture, implying that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers are more closely related to dogs, and the sampled wolves instead form a sister monophyletic clade. This result, in combination with our finding of dog-wolf admixture during the process of domestication, suggests a re-evaluation of past hypotheses of dog origin is necessary. Finally, we also detect signatures of selection, including evidence for selection on genes implicated in morphology, metabolism, and neural development. Uniquely, we find support for selective sweeps at regulatory sites, suggesting gene regulatory changes played a critical role in dog domestication.

Submitted 4 June, 2013; v1 submitted 31 May, 2013; originally announced May 2013.

Comments: 24 pages, 5 figures. To download the Supporting Information file, use the following link: https://www.dropbox.com/s/2yoytspv1iods7s/Freedman_etal_SupportingInfo_arxiv.pdf

arXiv:1209.4128 [pdf, other]

doi 10.1093/molbev/mst016

Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data

Authors: Darren Kessner, Tom Turner, John Novembre

Abstract: DNA samples are often pooled, either by experimental design, or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g. bacterial species comprising a microbiome, or pathogen strains in a blood sample). We present an expectation-maximization (EM) algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different strains within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.

Submitted 18 September, 2012; originally announced September 2012.

Comments: 23 pages, 8 figures

Showing 1–5 of 5 results for author: Novembre, J