-
Computing the joint distribution of the total tree length across loci in populations with variable size
Authors:
Alexey Miroshnikov,
Matthias Steinrücken
Abstract:
In recent years, a number of methods have been developed to infer complex demographic histories, especially historical population size changes, from genomic sequence data. Coalescent Hidden Markov Models have proven to be particularly useful for this type of inference. Due to the Markovian structure of these models, an essential building block is the joint distribution of local genealogical trees, or statistics of these genealogies, at two neighboring loci in populations of variable size. Here, we present a novel method to compute the marginal and the joint distribution of the total length of the genealogical trees at two loci separated by at most one recombination event for samples of arbitrary size. To our knowledge, no method to compute these distributions has been presented in the literature to date. We show that they can be obtained from the solution of certain hyperbolic systems of partial differential equations. We present a numerical algorithm, based on the method of characteristics, that can be used to efficiently and accurately solve these systems and compute the marginal and the joint distributions. We demonstrate its utility to study the properties of the joint distribution. Our flexible method can be straightforwardly extended to handle an arbitrary fixed number of recombination events, to include the distributions of other statistics of the genealogies as well, and can also be applied in structured populations.
Submitted 7 October, 2017; v1 submitted 28 September, 2016;
originally announced September 2016.
-
A novel spectral method for inferring general diploid selection from time series genetic data
Authors:
Matthias Steinrücken,
Anand Bhaskar,
Yun S. Song
Abstract:
The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, it is a challenging problem to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data. Here, we develop a novel spectral algorithm to analytically and efficiently integrate over all possible frequency trajectories between consecutive time points. This advance circumvents the limitations of existing methods which require fine-tuning the discretization of the population allele frequency space when numerically approximating requisite integrals. Furthermore, our method is flexible enough to handle general diploid models of selection where the heterozygote and homozygote fitness parameters can take any values, while previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and also apply it to analyze ancient DNA data from genetic loci associated with coat coloration in horses. In contrast to previous studies, our exploration of the full fitness parameter space reveals that a heterozygote advantage form of balancing selection may have been acting on these loci.
Submitted 26 January, 2015; v1 submitted 3 October, 2013;
originally announced October 2013.
-
Analysis of DNA sequence variation within marine species using Beta-coalescents
Authors:
Matthias Steinrücken,
Matthias Birkner,
Jochen Blath
Abstract:
We apply recently developed inference methods based on general coalescent processes to DNA sequence data obtained from various marine species. Several of these species are believed to exhibit so-called shallow gene genealogies, potentially due to extreme reproductive behaviour, e.g. via Hedgecock's "reproduction sweepstakes". Besides the data analysis, in particular the inference of mutation rates and the estimation of the (real) time to the most recent common ancestor, we briefly address the question of whether the genealogies might be adequately described by so-called Beta-coalescents (as opposed to Kingman's coalescent), which allow multiple mergers of genealogies.
The choice of the underlying coalescent model for the genealogy has drastic implications for the estimation of the above quantities, in particular the real-time embedding of the genealogy.
Submitted 4 November, 2012; v1 submitted 4 September, 2012;
originally announced September 2012.
-
An explicit transition density expansion for a multi-allelic Wright-Fisher diffusion with general diploid selection
Authors:
Matthias Steinrücken,
Y. X. Rachel Wang,
Yun S. Song
Abstract:
Characterizing the time evolution of allele frequencies in a population is a fundamental problem in population genetics. In the Wright-Fisher diffusion, these dynamics are captured by the transition density function, which satisfies well-known partial differential equations. For a multi-allelic model with general diploid selection, various theoretical results exist on representations of the transition density, but finding an explicit formula has remained a difficult problem. In this paper, a technique recently developed for a diallelic model is extended to find an explicit transition density for an arbitrary number of alleles, under a general diploid selection model with recurrent parent-independent mutation. Specifically, the method finds the eigenvalues and eigenfunctions of the generator associated with the multi-allelic diffusion, thus yielding an accurate spectral representation of the transition density. Furthermore, this approach allows for efficient, accurate computation of various other quantities of interest, including the normalizing constant of the stationary distribution and the rate of convergence to this distribution.
Submitted 1 November, 2012; v1 submitted 24 August, 2012;
originally announced August 2012.
-
A sequentially Markov conditional sampling distribution for structured populations with migration and recombination
Authors:
Matthias Steinrücken,
Joshua S. Paul,
Yun S. Song
Abstract:
Conditional sampling distributions (CSDs), sometimes referred to as copying models, underlie numerous practical tools in population genomic analyses. Though an important application that has received much attention is the inference of population structure, the explicit exchange of migrants at specified rates has not hitherto been incorporated into the CSD in a principled framework. Recently, in the case of a single panmictic population, a sequentially Markov CSD has been developed as an accurate, efficient approximation to a principled CSD derived from the diffusion process dual to the coalescent with recombination. In this paper, the sequentially Markov CSD framework is extended to incorporate subdivided population structure, thus providing an efficiently computable CSD that admits a genealogical interpretation related to the structured coalescent with migration and recombination. As a concrete application, it is demonstrated empirically that the CSD developed here can be employed to yield accurate estimation of a wide range of migration rates.
Submitted 1 November, 2012; v1 submitted 24 August, 2012;
originally announced August 2012.
-
Importance sampling for Lambda-coalescents in the infinitely many sites model
Authors:
Matthias Birkner,
Jochen Blath,
Matthias Steinruecken
Abstract:
We present and discuss new importance sampling schemes for the approximate computation of the sample probability of observed genetic types in the infinitely many sites model from population genetics. More specifically, we extend the 'classical framework', where genealogies are assumed to be governed by Kingman's coalescent, to the more general class of Lambda-coalescents and further develop Hobolth et al.'s (2008) idea of deriving importance sampling schemes based on 'compressed genetrees'. The resulting schemes extend earlier work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner and Blath (2008) and Hobolth et al. (2008). We conclude with a performance comparison of classical and new schemes for Beta- and Kingman coalescents.
Submitted 9 May, 2011;
originally announced May 2011.
-
A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks
Authors:
Matthias Birkner,
Jochen Blath,
Martin Moehle,
Matthias Steinruecken,
Johanna Tams
Abstract:
Let $Λ$ be a finite measure on the unit interval. A $Λ$-Fleming-Viot process is a probability-measure-valued Markov process which is dual to a coalescent with multiple collisions ($Λ$-coalescent), in analogy to the duality known for the classical Fleming-Viot process and Kingman's coalescent, where $Λ$ is the Dirac measure at 0.
We explicitly construct a dual process of the coalescent with simultaneous multiple collisions ($Ξ$-coalescent) with mutation, the $Ξ$-Fleming-Viot process with mutation, and provide a representation based on the empirical measure of an exchangeable particle system along the lines of Donnelly and Kurtz (1999). We establish pathwise convergence of the approximating systems to the limiting $Ξ$-Fleming-Viot process with mutation. An alternative construction of the semigroup based on the Hille-Yosida theorem is provided and various types of duality of the processes are discussed.
In the last part of the paper a population is considered which undergoes recurrent bottlenecks. In this scenario, non-trivial $Ξ$-Fleming-Viot processes naturally arise as limiting models.
Submitted 27 October, 2008; v1 submitted 4 August, 2008;
originally announced August 2008.