-
Orienting Ordered Scaffolds: Complexity and Algorithms
Authors:
Sergey Aganezov,
Pavel Avdeyev,
Nikita Alexeev,
Yongwu Rong,
Max A. Alekseyev
Abstract:
Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds.
We address the problem of orienting ordered scaffolds (OOS) as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem using the notion of a scaffold graph (i.e., a graph where vertices correspond to the assembled contigs or scaffolds and edges represent connections between them). We prove that this problem is NP-hard, and present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. We further develop an FPT algorithm for the general case of the OOS problem.
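To make the optimization view concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes the objective is to maximize the total weight of orientation hints that a chosen orientation satisfies. The function name `best_orientation` and the two dictionary formats are hypothetical and chosen only for illustration. The search visits all 2^n orientations, so it is practical only for very small inputs, which is consistent with the problem being NP-hard in general.

```python
from itertools import product


def best_orientation(n, single_weights, pair_weights):
    """Exhaustively search all 2**n orientations of n ordered scaffolds.

    single_weights: dict (i, o) -> weight, a hint that scaffold i has
        orientation o, where o is +1 or -1 (hypothetical format).
    pair_weights: dict (i, oi, j, oj) -> weight, a hint that scaffolds
        i and j have orientations oi and oj together (hypothetical format).
    Returns the best orientation tuple and its total satisfied weight.
    """
    best_score, best = float("-inf"), None
    for orient in product((+1, -1), repeat=n):
        # Sum the weights of all single-scaffold hints that are satisfied.
        score = sum(w for (i, o), w in single_weights.items()
                    if orient[i] == o)
        # Add the weights of all pairwise hints that are satisfied.
        score += sum(w for (i, oi, j, oj), w in pair_weights.items()
                     if orient[i] == oi and orient[j] == oj)
        if score > best_score:
            best_score, best = score, orient
    return best, best_score


# Toy instance with three scaffolds and made-up weights.
singles = {(0, +1): 1.0}
pairs = {(0, +1, 1, -1): 2.0, (1, +1, 2, +1): 1.5, (1, -1, 2, +1): 0.5}
print(best_orientation(3, singles, pairs))
```

In this toy instance the hints on scaffold 1 conflict, and the search resolves the conflict in favor of the heavier combination.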
Submitted 25 November, 2019;
originally announced November 2019.
-
Combinatorial Scoring of Phylogenetic Networks
Authors:
Nikita Alexeev,
Max A. Alekseyev
Abstract:
Construction of phylogenetic trees and networks for extant species from their characters represents one of the key problems in phylogenomics. Since the solution to this problem is not always uniquely defined and there exist multiple methods for tree/network construction, it is important to measure how well the constructed networks capture the given character relationships across the species.
In the current study, we propose a novel method for measuring the specificity of a given phylogenetic network in terms of the total number of distributions of character states at the leaves that the network may impose. While for binary phylogenetic trees this number has an exact formula and depends only on the number of leaves and character states but not on the tree topology, the situation is much more complicated for non-binary trees or networks. Nevertheless, we develop an algorithm for combinatorial enumeration of such distributions, which is applicable to arbitrary trees and networks under some reasonable assumptions.
Submitted 8 August, 2016; v1 submitted 8 February, 2016;
originally announced February 2016.
-
Estimation of the True Evolutionary Distance under the Fragile Breakage Model
Authors:
Nikita Alexeev,
Max A. Alekseyev
Abstract:
The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Often such estimation is based on the parsimony assumption, implying that the distance between two genomes can be estimated as the rearrangement distance, equal to the minimal number of genome rearrangements required to transform one genome into the other. However, in reality the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. While there exists a method for estimating the true evolutionary distance, it assumes that genomes can be broken by rearrangements equally likely at any position in the course of evolution. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model postulating that only certain "fragile" genomic regions are prone to rearrangements.
We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. We evaluate the proposed method on simulated genomes, where it shows high accuracy. We further apply the proposed method to estimate evolutionary distances within a set of five yeast genomes and a set of two fish genomes.
Submitted 25 May, 2017; v1 submitted 27 October, 2015;
originally announced October 2015.
-
Generalized Hultman Numbers and Cycle Structures of Breakpoint Graphs
Authors:
Nikita Alexeev,
Anna Pologova,
Max A. Alekseyev
Abstract:
Genome rearrangements can be modeled as $k$-breaks, which break a genome at $k$ positions and glue the resulting fragments in a new order. In particular, reversals, translocations, fusions, and fissions are modeled as $2$-breaks, and transpositions are modeled as $3$-breaks. While $k$-break rearrangements for $k>3$ have not been observed in evolution, they are used in cancer genomics to model chromothripsis, a catastrophic event of multiple breakages happening simultaneously in a genome. It is known that the $k$-break distance between two genomes (i.e., the minimum number of $k$-breaks required to transform one genome into the other) can be computed in terms of cycle lengths in the breakpoint graph of these genomes.
In the current work, we address the combinatorial problem of enumerating genomes at a given $k$-break distance from a fixed unichromosomal genome. More generally, we enumerate genome pairs, whose breakpoint graph has a given distribution of cycle lengths. We further show how our enumeration can be used for uniform sampling of random genomes at a given $k$-break distance, and describe its connection to various combinatorial objects such as Bell polynomials.
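As an illustration of how a distance can be read off the breakpoint graph, here is a minimal sketch for the simplest case, $k=2$. It relies on the known result (not stated in this abstract) that for two circular genomes on the same $n$ genes the $2$-break distance equals $n - c$, where $c$ is the number of cycles in their breakpoint graph. The sketch assumes unichromosomal circular genomes given as signed permutations; the function names `adjacencies` and `two_break_distance` are hypothetical.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbor in a circular signed genome.

    Each gene g has a tail (g, 't') and a head (g, 'h'). A gene with a
    positive sign is traversed tail-to-head, a negative one head-to-tail.
    """
    adj = {}
    n = len(genome)
    for idx in range(n):
        x, y = genome[idx], genome[(idx + 1) % n]
        out_x = (abs(x), 'h') if x > 0 else (abs(x), 't')  # where we leave x
        in_y = (abs(y), 't') if y > 0 else (abs(y), 'h')   # where we enter y
        adj[out_x] = in_y
        adj[in_y] = out_x
    return adj


def two_break_distance(p, q):
    """Return n - c for circular genomes p and q on the same gene set."""
    adj_p, adj_q = adjacencies(p), adjacencies(q)
    seen, cycles = set(), 0
    for start in adj_p:
        if start in seen:
            continue
        cycles += 1
        v = start
        # Walk the cycle, alternating an adjacency of p with one of q.
        while v not in seen:
            seen.add(v)
            w = adj_p[v]
            seen.add(w)
            v = adj_q[w]
    return len(p) - cycles


# A single reversal of gene 2 separates these two genomes.
print(two_break_distance([1, 2, 3], [1, -2, 3]))
```

For general $k$ the distance depends on the cycle lengths rather than only on the cycle count, which this sketch does not cover.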
Submitted 11 February, 2017; v1 submitted 18 March, 2015;
originally announced March 2015.
-
A Computational Method for the Rate Estimation of Evolutionary Transpositions
Authors:
Nikita Alexeev,
Rustem Aidagulov,
Max A. Alekseyev
Abstract:
Genome rearrangements are evolutionary events that shuffle genomic architectures. The most frequent genome rearrangements are reversals, translocations, fusions, and fissions. While there are some more complex genome rearrangements such as transpositions, they are rarely observed and are believed to constitute only a small fraction of genome rearrangements happening in the course of evolution. The analysis of transpositions is further obfuscated by the intractability of the underlying computational problems.
We propose a computational method for estimating the rate of transpositions in evolutionary scenarios between genomes. We applied our method to a set of mammalian genomes and estimated the transposition rate in mammalian evolution to be around 0.26.
Submitted 29 January, 2015;
originally announced January 2015.