Search | arXiv e-print repository

arXiv:1911.11190 [pdf, other]

doi 10.1007/s42979-022-01198-7

Orienting Ordered Scaffolds: Complexity and Algorithms

Authors: Sergey Aganezov, Pavel Avdeyev, Nikita Alexeev, Yongwu Rong, Max A. Alekseyev

Abstract: Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds. We address the problem of orienting ordered scaffolds (OOS) as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem using the notion of a scaffold graph (i.e., a graph, where vertices correspond to the assembled contigs or scaffolds and edges represent connections between them). We prove that this problem is NP-hard, and present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. We further develop an FPT algorithm for the general case of the OOS problem.

Submitted 25 November, 2019; originally announced November 2019.

Journal ref: SN Computer Science 3 (2022), 308

arXiv:1602.02841 [pdf, other]

doi 10.1007/978-3-319-42634-1_45

Combinatorial Scoring of Phylogenetic Networks

Authors: Nikita Alexeev, Max A. Alekseyev

Abstract: Construction of phylogenetic trees and networks for extant species from their characters represents one of the key problems in phylogenomics. While the solution to this problem is not always uniquely defined and there exist multiple methods for tree/network construction, it becomes important to measure how well the constructed networks capture the given character relationship across the species. In the current study, we propose a novel method for measuring the specificity of a given phylogenetic network in terms of the total number of distributions of character states at the leaves that the network may impose. While for binary phylogenetic trees, this number has an exact formula and depends only on the number of leaves and character states but not on the tree topology, the situation is much more complicated for non-binary trees or networks. Nevertheless, we develop an algorithm for combinatorial enumeration of such distributions, which is applicable for arbitrary trees and networks under some reasonable assumptions.

Submitted 8 August, 2016; v1 submitted 8 February, 2016; originally announced February 2016.

Comments: 12 pages; 3 figures

Journal ref: Lecture Notes in Computer Science 9797 (2016), 560-572

arXiv:1510.08002 [pdf, other]

doi 10.1186/s12864-017-3733-3

Estimation of the True Evolutionary Distance under the Fragile Breakage Model

Authors: Nikita Alexeev, Max A. Alekseyev

Abstract: The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Often such estimation is based on the parsimony assumption, implying that the distance between two genomes can be estimated as the rearrangement distance, equal to the minimal number of genome rearrangements required to transform one genome into the other. However, in reality the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. While there exists a method for the true evolutionary distance estimation, it assumes that genomes can be broken by rearrangements equally likely at any position in the course of evolution. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model postulating that only certain "fragile" genomic regions are prone to rearrangements. We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. We evaluate the proposed method on simulated genomes, which show its high accuracy. We further apply the proposed method for estimation of evolutionary distances within a set of five yeast genomes and a set of two fish genomes.

Submitted 25 May, 2017; v1 submitted 27 October, 2015; originally announced October 2015.

Journal ref: BMC Genomics 18:Suppl 4 (2017), 356

arXiv:1503.05285 [pdf, ps, other]

doi 10.1089/cmb.2016.0190

Generalized Hultman Numbers and Cycle Structures of Breakpoint Graphs

Authors: Nikita Alexeev, Anna Pologova, Max A. Alekseyev

Abstract: Genome rearrangements can be modeled as $k$-breaks, which break a genome at $k$ positions and glue the resulting fragments in a new order. In particular, reversals, translocations, fusions, and fissions are modeled as $2$-breaks, and transpositions are modeled as $3$-breaks. While $k$-break rearrangements for $k>3$ have not been observed in evolution, they are used in cancer genomics to model chromothripsis, a catastrophic event of multiple breakages happening simultaneously in a genome. It is known that the $k$-break distance between two genomes (i.e., the minimum number of $k$-breaks required to transform one genome into the other) can be computed in terms of cycle lengths in the breakpoint graph of these genomes. In the current work, we address the combinatorial problem of enumerating genomes at a given $k$-break distance from a fixed unichromosomal genome. More generally, we enumerate genome pairs, whose breakpoint graph has a given distribution of cycle lengths. We further show how our enumeration can be used for uniform sampling of random genomes at a given $k$-break distance, and describe its connection to various combinatorial objects such as Bell polynomials.

Submitted 11 February, 2017; v1 submitted 18 March, 2015; originally announced March 2015.

Journal ref: Journal of Computational Biology 24:2 (2017), 93-105

arXiv:1501.07546 [pdf, other]

doi 10.1007/978-3-319-16483-0_46

A Computational Method for the Rate Estimation of Evolutionary Transpositions

Authors: Nikita Alexeev, Rustem Aidagulov, Max A. Alekseyev

Abstract: Genome rearrangements are evolutionary events that shuffle genomic architectures. The most frequent genome rearrangements are reversals, translocations, fusions, and fissions. While there are some more complex genome rearrangements such as transpositions, they are rarely observed and believed to constitute only a small fraction of genome rearrangements happening in the course of evolution. The analysis of transpositions is further obfuscated by intractability of the underlying computational problems. We propose a computational method for estimating the rate of transpositions in evolutionary scenarios between genomes. We applied our method to a set of mammalian genomes and estimated the transposition rate in mammalian evolution to be around 0.26.

Submitted 29 January, 2015; originally announced January 2015.

Comments: Proceedings of the 3rd International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), 2015. (to appear)

Journal ref: Lecture Notes in Computer Science 9043 (2015), 471-480

Showing 1–5 of 5 results for author: Alexeev, N