Detectability of Varied Hybridization Scenarios using Genome-Scale Hybrid Detection Methods
Authors:
Marianne Bjorner,
Erin K. Molloy,
Colin N. Dewey,
Claudia Solis-Lemus
Abstract:
Hybridization events complicate the accurate reconstruction of phylogenies, as they lead to patterns of genetic heritability that are unexpected under traditional, bifurcating models of species trees. This has led to the development of methods to infer these varied hybridization events, including methods that reconstruct networks directly and summary methods that predict individual hybridization events. However, a lack of empirical comparisons between methods, especially for large networks with varied hybridization scenarios, hinders their practical use. Here, we provide a comprehensive review of popular summary methods: TICR, MSCquartets, HyDe, Patterson's D-Statistic (ABBA-BABA), D3, and Dp. TICR and MSCquartets are based on quartet concordance factors gathered from gene tree topologies, whereas Patterson's D-Statistic, D3, and Dp use site pattern frequencies to identify hybridization events. We then use simulated data to address questions of method accuracy and ideal use scenarios by testing the methods against complex networks that depict gene flow events differing in depth (timing), quantity (single vs. multiple, overlapping hybridizations), and rate of gene flow. We find that deeper or multiple hybridization events may introduce noise and weaken the signal of hybridization, leading to higher false negative rates across methods. Although some forms of hybridization elude quartet-based detection methods, MSCquartets displays high precision in most scenarios. While HyDe yields high false negative rates when tested on hybridizations involving ghost lineages, it is the only method able to distinguish hybrid from parent signals. Lastly, we test the methods on ultraconserved elements from the bee subfamily Nomiinae and find possible hybridization events between clades that correspond to regions of poor support in the species tree estimated in the original study.
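As an illustration of what "site pattern frequencies" means for Patterson's D-Statistic, the sketch below counts ABBA and BABA sites for four aligned sequences with assumed topology (((P1, P2), P3), O) and computes D = (ABBA - BABA) / (ABBA + BABA). It is a minimal example rather than the implementation evaluated in the paper: the function name `patterson_d`, the toy sequences, and the choice to skip non-biallelic or ambiguous sites are assumptions made here, and no significance test is included.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Count ABBA/BABA site patterns and return (abba, baba, D).

    Assumes four equal-length aligned sequences with topology
    (((P1, P2), P3), O); the outgroup allele is treated as ancestral ("A").
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        column = (a, b, c, o)
        # Keep only unambiguous biallelic sites (illustrative choice).
        if any(x not in "ACGT" for x in column) or len(set(column)) != 2:
            continue
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return abba, baba, d


# Toy alignment (hypothetical data): two ABBA sites, one BABA site.
abba, baba, d = patterson_d("AAGAA", "GGAAA", "GGGAG", "AAAAA")
print(abba, baba, round(d, 3))
```

Under incomplete lineage sorting alone, the two patterns are expected to occur equally often, so D is expected to be near zero; a marked excess of one pattern is what the statistic treats as evidence of gene flow.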
Submitted 7 November, 2023; v1 submitted 1 November, 2022;
originally announced November 2022.
Parametric inference of recombination in HIV genomes
Authors:
Niko Beerenwinkel,
Colin N. Dewey,
Kevin M. Woods
Abstract:
Recombination is an important event in the evolution of HIV. It affects the global spread of the pandemic as well as evolutionary escape from host immune response and from drug therapy within single patients. Comprehensive computational methods are needed for detecting recombinant sequences in large databases, and for inferring the parental sequences.
We present a hidden Markov model to annotate a query sequence as a recombinant of a given set of aligned sequences. Parametric inference is used to determine all optimal annotations for all parameters of the model. We show that the inferred annotations recover most features of established hand-curated annotations. Thus, parametric analysis of the hidden Markov model is feasible for HIV full-length genomes, and it improves the detection and annotation of recombinant forms.
All computational results, reference alignments, and C++ source code are available at http://bio.math.berkeley.edu/recombination/.
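To make the annotation idea concrete, the sketch below decodes a query sequence against a set of aligned reference sequences with a toy hidden Markov model. This is an assumption-laden simplification, not the authors' model or code: the hidden state at each column is taken to be the reference the query is copied from, and `mismatch`, `switch`, and `viterbi_annotate` are hypothetical names and parameters introduced here. It runs ordinary Viterbi decoding at one fixed parameter choice, whereas the abstract describes parametric inference over all parameters.

```python
import math


def viterbi_annotate(query, refs, mismatch=0.05, switch=0.01):
    """Return, for each column, the index of the most likely parental reference.

    Assumes a non-empty query and references of the same aligned length.
    """
    k, n = len(refs), len(query)
    log_match, log_mis = math.log(1 - mismatch), math.log(mismatch)
    log_stay = math.log(1 - switch)
    # Switching probability is spread evenly over the other references.
    log_switch = math.log(switch / (k - 1)) if k > 1 else float("-inf")

    def emit(state, i):
        return log_match if refs[state][i] == query[i] else log_mis

    # Uniform prior over the starting reference.
    score = [math.log(1.0 / k) + emit(s, 0) for s in range(k)]
    back = []
    for i in range(1, n):
        new_score, pointers = [], []
        for s in range(k):
            best_prev = max(
                range(k),
                key=lambda p: score[p] + (log_stay if p == s else log_switch),
            )
            trans = log_stay if best_prev == s else log_switch
            new_score.append(score[best_prev] + trans + emit(s, i))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (hypothetical data): the query copies ref 0, then ref 1.
refs = ["AAAAAAAAAA", "CCCCCCCCCC"]
print(viterbi_annotate("AAAAACCCCC", refs))
```

The position where the decoded state changes is the inferred recombination breakpoint. Different values of `mismatch` and `switch` can yield different optimal paths, which is the motivation for examining annotations across the whole parameter space rather than at a single setting.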
Submitted 8 December, 2005;
originally announced December 2005.