-
kdetrees: Nonparametric Estimation of Phylogenetic Tree Distributions
Authors:
Grady Weyenberg,
Peter Huggins,
Christopher Schardl,
Daniel K Howe,
Ruriko Yoshida
Abstract:
Motivation: While the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfer, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from those of the majority of genes. Such "outlying" gene trees are considered biologically interesting, and identifying these genes has become an important problem in phylogenetics.
Results: We propose and implement KDETREES, a nonparametric method of estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar, recently published method, improving computational complexity by one polynomial order (to quadratic in the number of trees analyzed), while simulation studies suggest only a small penalty in classification accuracy. Application of KDETREES to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of genes from Epichloe, a genus of fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy.
Availability: Our method for estimating tree distributions and identifying outlying trees is implemented as the R package KDETREES, and is available for download from CRAN.
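The outlier-detection idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the KDETREES package's code: pairwise tree distances are assumed to be precomputed with some tree metric, the Gaussian kernel and its fixed bandwidth are our own choices, and flagging trees whose leave-one-out density falls below Q1 - k*IQR is an assumed rule. The double loop over tree pairs is where a cost quadratic in the number of trees arises.

```python
import math
import statistics


def density_scores(dist, bandwidth):
    """Leave-one-out kernel density score for each tree.

    dist: symmetric n x n matrix of pairwise tree distances (assumed
    precomputed with some tree metric). The loop visits all n*(n-1)
    ordered pairs, so the cost is quadratic in the number of trees.
    bandwidth: fixed Gaussian kernel width (an illustrative assumption).
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                # Gaussian kernel applied to the tree-to-tree distance
                total += math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
        scores.append(total / (n - 1))
    return scores


def flag_outliers(dist, bandwidth, k=1.5):
    """Indices of trees whose density falls below Q1 - k * IQR (assumed rule)."""
    scores = density_scores(dist, bandwidth)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q1 - k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Toy data: seven trees close together and one far away, with distances
# taken from made-up 1-D positions purely for illustration.
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(flag_outliers(dist, bandwidth=1.0))  # [7]
```

The density is estimated from distances alone, so any tree metric can be plugged in; the choice of bandwidth and cutoff constant k would matter far more on real gene-tree samples than in this toy case.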
Submitted 22 April, 2014; v1 submitted 26 February, 2013;
originally announced February 2013.
-
Statistical Phylogenetic Tree Analysis Using Differences of Means
Authors:
Elissaveta Arnaoudova,
David Haws,
Peter Huggins,
Jerzy W. Jaromczyk,
Neil Moore,
Chris Schardl,
Ruriko Yoshida
Abstract:
We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by the input alignments, instead of comparing point estimates of trees. This statistical approach can be applied to gene tree analysis, for example to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method uses the difference of means to compare two distributions of trees, after embedding trees in a vector space. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel trick" that speeds up distance calculations when trees are embedded in a high-dimensional feature space, such as the splits or quartets feature space. In this pilot study, we first test our statistical method's ability to distinguish between sets of gene trees generated under coalescent models with species trees of varying dissimilarity. We follow our simulation results with applications to various data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
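The difference-of-means statistic with the kernel trick can be illustrated on the splits feature space. There each tree is a 0/1 indicator vector over splits, so the inner product of two trees is simply the number of splits they share, and expanding the squared norm of the difference of group means needs only kernel evaluations, never the explicit vectors. This is a hedged sketch: the split encoding (each split written as the side not containing taxon A), the toy trees, and all function names are ours, not Phylotree's interface, and the bootstrap over alignment columns (which would require re-estimating trees from each resample) is not shown.

```python
from itertools import product


def split_kernel(t1, t2):
    """Inner product of two split-indicator vectors: the number of shared splits."""
    return len(t1 & t2)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all (x, y) pairs with x in xs and y in ys."""
    return sum(kernel(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def squared_mean_distance(group_a, group_b, kernel=split_kernel):
    """||mean(A) - mean(B)||^2 computed from kernel values only (kernel trick).

    Expands to mean K(A, A) + mean K(B, B) - 2 * mean K(A, B).
    """
    return (mean_kernel(group_a, group_a, kernel)
            + mean_kernel(group_b, group_b, kernel)
            - 2 * mean_kernel(group_a, group_b, kernel))


def tree(*sides):
    """Hypothetical encoding: a tree as the set of its nontrivial splits,
    each split written as the side that does not contain taxon A."""
    return frozenset(frozenset(side) for side in sides)


# Unrooted five-taxon trees on A..E (two nontrivial splits each).
t1 = tree("CDE", "DE")   # ((A,B),C,(D,E))
t2 = tree("CDE", "CE")   # ((A,B),D,(C,E))
t3 = tree("BDE", "DE")   # ((A,C),B,(D,E))
t4 = tree("BDE", "BE")   # ((A,C),D,(B,E))

print(squared_mean_distance([t1, t1, t2], [t3, t4, t4]))  # 24/9, about 2.667
print(squared_mean_distance([t1, t2], [t1, t2]))          # 0.0
```

Because only kernel values enter the computation, the same function would work with any other kernel that is an inner product in some tree feature space (for instance, one counting shared quartets) without ever building the high-dimensional vectors.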
Submitted 12 April, 2010;
originally announced April 2010.
-
A Novel Test for Host-Symbiont Codivergence Indicates Ancient Origin of Fungal Endophytes in Grasses
Authors:
Chris L. Schardl,
Kelly D. Craven,
Adam Lindstrom,
Skyler Speakman,
Arnold Stromberg,
Ruriko Yoshida
Abstract:
Significant phylogenetic codivergence between plant or animal hosts ($H$) and their symbionts or parasites ($P$) indicates the importance of their interactions on evolutionary time scales. However, valid and realistic methods to test for codivergence are not fully developed. One of the systems where possible codivergence has been of interest involves the large subfamily of temperate grasses (Pooideae) and their endophytic fungi (epichloae). These widespread symbioses often help protect host plants from herbivory and stresses, and affect species diversity and food web structures. Here we introduce the MRCALink (most-recent-common-ancestor link) method and use it to investigate the possibility of grass-epichloë codivergence. Applied to ultrametric $H$ and $P$ trees, MRCALink identifies all corresponding nodes for pairwise comparisons of MRCA ages. The result is compared to the space of random $H$ and $P$ tree pairs estimated by a Monte Carlo method. Compared to tree reconciliation, the method is less dependent on tree topologies (which can often be misleading), and it crucially improves on phylogeny-independent methods such as ParaFit or the Mantel test by eliminating an extreme (but previously unrecognized) distortion of node-pair sampling. Analysis of 26 grass species-epichloë species symbioses did not reject random association of $H$ and $P$ MRCA ages. However, when five obvious host jumps were removed, the analysis significantly rejected random association and supported grass-endophyte codivergence. Interestingly, early cladogenesis events in the Pooideae corresponded to early cladogenesis events in epichloae, suggesting concomitant origins of this grass subfamily and its remarkable group of symbionts. We also applied our method to the well-known gopher-louse data set.
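The node-pair idea behind MRCALink can be sketched as follows. This is one reading of "identifies all corresponding nodes", not the authors' implementation: trees are encoded as hypothetical child-to-parent dictionaries with node ages, and every pair of host-parasite links contributes its (host MRCA, parasite MRCA) node pair, with duplicates collapsed so each corresponding node pair is compared once. The counter makes the sampling distortion visible: when every pair of linked leaves is counted separately, deep node pairs are counted many times. The Monte Carlo comparison against random $H$ and $P$ tree pairs is not shown.

```python
from collections import Counter
from itertools import combinations


def ancestors(parent, node):
    """Path from node up to the root of a tree given as a child -> parent dict."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, a, b):
    """Most recent common ancestor of nodes a and b."""
    above_a = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in above_a:
            return node
    raise ValueError("nodes are not in the same tree")


def leaf_pair_counts(h_parent, p_parent, links):
    """How often each (host MRCA, parasite MRCA) node pair arises over all link pairs."""
    return Counter(
        (mrca(h_parent, h1, h2), mrca(p_parent, p1, p2))
        for (h1, p1), (h2, p2) in combinations(links, 2)
    )


def mrca_age_pairs(h_parent, h_age, p_parent, p_age, links):
    """Ages of each distinct linked node pair, counted once per node pair."""
    node_pairs = leaf_pair_counts(h_parent, p_parent, links)
    return sorted((h_age[h], p_age[p]) for h, p in node_pairs)


# Toy ultrametric trees: host ((h1,h2)x,(h3,h4)y)R, parasite ((p1,p2)u,(p3,p4)v)S.
h_parent = {"h1": "x", "h2": "x", "h3": "y", "h4": "y", "x": "R", "y": "R"}
p_parent = {"p1": "u", "p2": "u", "p3": "v", "p4": "v", "u": "S", "v": "S"}
h_age = {"h1": 0.0, "h2": 0.0, "h3": 0.0, "h4": 0.0, "x": 1.0, "y": 2.0, "R": 5.0}
p_age = {"p1": 0.0, "p2": 0.0, "p3": 0.0, "p4": 0.0, "u": 0.8, "v": 1.5, "S": 4.0}
links = [("h1", "p1"), ("h2", "p2"), ("h3", "p3"), ("h4", "p4")]

print(leaf_pair_counts(h_parent, p_parent, links)[("R", "S")])  # 4 (of 6 link pairs)
print(mrca_age_pairs(h_parent, h_age, p_parent, p_age, links))
# [(1.0, 0.8), (2.0, 1.5), (5.0, 4.0)]
```

In this toy case a statistic over all linked leaf pairs would see the root pair (R, S) four times out of six, while the node-pair list weights the three corresponding node pairs equally.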
Submitted 28 August, 2008; v1 submitted 25 November, 2006;
originally announced November 2006.