-
DiscoVista: interpretable visualizations of gene tree discordance
Authors:
Erfan Sayyari,
James B. Whitfield,
Siavash Mirarab
Abstract:
Phylogenomics has ushered in an age of discordance. Analyses often reveal abundant discordances among phylogenies of different parts of genomes, as well as incongruences between species trees obtained using different methods or data partitions. Researchers are often left trying to make sense of such incongruences. Interpretive ways of measuring and visualizing discordance are needed, both among alternative species trees and gene trees, especially for specific focal branches of a tree. Here, we introduce DiscoVista, a publicly available tool that creates a suite of simple but interpretable visualizations. DiscoVista helps quantify the amount of discordance and some of its potential causes.
Submitted 30 January, 2018; v1 submitted 26 September, 2017;
originally announced September 2017.
-
Testing for polytomies in phylogenetic species trees using quartet frequencies
Authors:
Erfan Sayyari,
Siavash Mirarab
Abstract:
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the ASTRAL package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
Submitted 6 February, 2018; v1 submitted 28 August, 2017;
originally announced August 2017.
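Illustrative sketch (not taken from the abstract above): under the multi-species coalescent, a true polytomy implies that the three possible quartet topologies around a branch are equally likely among gene trees. One simple way to test that null hypothesis is a chi-square goodness-of-fit test with two degrees of freedom. The function name `polytomy_p_value` and the assumption that the three counts come from independent gene trees are ours; this is not claimed to be the exact procedure implemented in ASTRAL.

```python
import math


def polytomy_p_value(z1, z2, z3):
    """P-value for the null hypothesis that a branch is a polytomy.

    z1, z2, z3 are the numbers of gene trees supporting each of the three
    quartet topologies around the branch (assumed independent; hypothetical
    inputs for illustration).
    """
    n = z1 + z2 + z3
    if n <= 0:
        raise ValueError("at least one gene tree is required")
    expected = n / 3.0  # under a polytomy each topology has probability 1/3
    chi2 = sum((z - expected) ** 2 / expected for z in (z1, z2, z3))
    # For a chi-square distribution with 2 degrees of freedom,
    # the survival function is exactly exp(-x / 2).
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    print(polytomy_p_value(100, 100, 100))  # perfectly balanced counts
    print(polytomy_p_value(200, 50, 50))    # strongly skewed counts
```

A small p-value rejects the polytomy, meaning the gene trees carry enough signal to resolve the branch; a large p-value means the polytomy cannot be ruled out with the data at hand.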
-
Fast coalescent-based computation of local branch support from quartet frequencies
Authors:
Erfan Sayyari,
Siavash Mirarab
Abstract:
Species tree reconstruction is complicated by effects of Incomplete Lineage Sorting (ILS), commonly modeled by the multi-species coalescent model. While there has been substantial progress in developing methods that estimate a species tree given a collection of gene trees, less attention has been paid to fast and accurate methods of quantifying support. In this paper, we propose a fast algorithm to compute quartet-based support for each branch of a given species tree with regard to a given set of gene trees. We then show how the quartet support can be used in the context of the multi-species coalescent model to compute i) the local posterior probability that the branch is in the species tree and ii) the length of the branch in coalescent units. We evaluate the precision and recall of the local posterior probability on a wide set of simulated and biological data, and show that it has very high precision and improved recall compared to multi-locus bootstrapping. The estimated branch lengths are highly accurate when gene trees have little error, but are underestimated when gene tree estimation error increases. Computation of both branch length and local posterior probability is implemented as a new feature in ASTRAL.
Submitted 8 May, 2016; v1 submitted 24 January, 2016;
originally announced January 2016.
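Illustrative sketch (not taken from the abstract above): for a single internal branch of length d in coalescent units, the multi-species coalescent gives the probability that a gene tree quartet matches the species tree as 1 - (2/3)·exp(-d). Inverting that relationship yields a plug-in estimate of the branch length from observed quartet frequencies. The function name `coalescent_branch_length` is hypothetical, and this plain estimate is not claimed to match the exact estimator implemented in ASTRAL.

```python
import math


def coalescent_branch_length(z1, z2, z3):
    """Plug-in branch length (coalescent units) from quartet counts.

    z1 is the number of gene trees agreeing with the species tree topology
    around the branch; z2 and z3 count the two alternative topologies.
    """
    n = z1 + z2 + z3
    if n <= 0:
        raise ValueError("at least one gene tree is required")
    theta = z1 / n  # observed frequency of the species tree topology
    if theta <= 1.0 / 3.0:
        return 0.0  # no signal beyond a polytomy; length is clamped at zero
    if theta >= 1.0:
        return math.inf  # no discordance observed; the estimate is unbounded
    # Invert theta = 1 - (2/3) * exp(-d).
    return -math.log(1.5 * (1.0 - theta))


if __name__ == "__main__":
    print(coalescent_branch_length(200, 50, 50))
```

Because gene tree estimation error inflates the two alternative counts, an estimate of this kind shrinks as that error grows, consistent with the underestimation the abstract reports.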