Sorting Genomes by Prefix Double-Cut-and-Joins

Authors: Guillaume Fertin, Géraldine Jean, Anthony Labarre

Abstract: In this paper, we study the problem of sorting unichromosomal linear genomes by prefix double-cut-and-joins (or DCJs) in both the signed and the unsigned settings. Prefix DCJs cut the leftmost segment of a genome and any other segment, and recombine the severed endpoints in one of two possible ways: one of these options corresponds to a prefix reversal, which reverses the order of elements between the two cuts (as well as their signs in the signed case). Depending on whether we consider both options or reversals only, our main results are: (1) new structural lower bounds based on the breakpoint graph for sorting by unsigned prefix reversals, unsigned prefix DCJs, or signed prefix DCJs; (2) a polynomial-time algorithm for sorting by signed prefix DCJs, thus answering an open question in [8]; (3) a 3/2-approximation for sorting by unsigned prefix DCJs, which is, to the best of our knowledge, the first sorting by prefix rearrangements problem that admits an approximation ratio strictly smaller than 2 (with the obvious exception of the polynomial-time solvable problems); and finally, (4) an FPT algorithm for sorting by unsigned prefix DCJs parameterised by the number of breakpoints in the genome.

Submitted 30 August, 2022; originally announced August 2022.

arXiv:1908.03870

Graph Motif Problems Parameterized by Dual

Authors: Guillaume Fertin, Christian Komusiewicz

Abstract: Let $G=(V,E)$ be a vertex-colored graph, where $C$ is the set of colors used to color $V$. The Graph Motif (or GM) problem takes as input $G$, a multiset $M$ of colors built from $C$, and asks whether there is a subset $S\subseteq V$ such that (i) $G[S]$ is connected and (ii) the multiset of colors obtained from $S$ equals $M$. The Colorful Graph Motif (or CGM) problem is the special case of GM in which $M$ is a set, and the List-Colored Graph Motif (or LGM) problem is the extension of GM in which each vertex $v$ of $V$ may choose its color from a list $\mathcal{L}(v)\subseteq C$ of colors. We study the three problems GM, CGM, and LGM, parameterized by the dual parameter $\ell:=|V|-|M|$. For general graphs, we show that, assuming the strong exponential time hypothesis, CGM has no $(2-\epsilon)^\ell\cdot |V|^{\mathcal{O}(1)}$-time algorithm, which implies that a previous algorithm, running in $\mathcal{O}(2^\ell\cdot |E|)$ time, is optimal [Betzler et al., IEEE/ACM TCBB 2011]. We also prove that LGM is W[1]-hard with respect to $\ell$ even if we restrict ourselves to lists of at most two colors. If we constrain the input graph to be a tree, then we show that GM can be solved in $\mathcal{O}(3^\ell\cdot |V|)$ time but admits no polynomial-size problem kernel, while CGM can be solved in $\mathcal{O}(\sqrt{2}^{\ell} + |V|)$ time and admits a polynomial-size problem kernel.

Submitted 11 August, 2019; originally announced August 2019.

Comments: A preliminary version of this work appeared in Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM '16), volume 54 of LIPIcs, pages 7:1--7:12. This version contains all missing proofs and several further improvements
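
The two conditions in the Graph Motif definition above can be checked directly for a candidate vertex set. The following is a minimal sketch of such a verifier, not an algorithm from the paper; the function name `is_motif_occurrence` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import Counter, deque


def is_motif_occurrence(adj, colour, subset, motif):
    """Return True if `subset` induces a connected subgraph whose colour multiset equals `motif`.

    adj    -- dict mapping each vertex to an iterable of its neighbours (hypothetical representation)
    colour -- dict mapping each vertex to its colour
    subset -- candidate vertex set S
    motif  -- iterable of colours, interpreted as a multiset M
    """
    subset = set(subset)
    if not subset:
        return False
    # Condition (ii): the multiset of colours of S must equal M.
    if Counter(colour[v] for v in subset) != Counter(motif):
        return False
    # Condition (i): G[S] must be connected; breadth-first search restricted to S.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == subset


# Example: the path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
colour = {"a": "red", "b": "blue", "c": "red", "d": "green"}
print(is_motif_occurrence(adj, colour, {"a", "b"}, ["red", "blue"]))
print(is_motif_occurrence(adj, colour, {"a", "d"}, ["red", "green"]))
```

In this example the motif has two colours and the graph has four vertices, so the dual parameter is $\ell = |V| - |M| = 2$. Verifying a given set is easy; the hardness results in the abstract concern finding such a set.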

arXiv:1808.03561

Finding a Small Number of Colourful Components

Authors: Laurent Bulteau, Konrad K. Dabrowski, Guillaume Fertin, Matthew Johnson, Daniel Paulusma, Stephane Vialette

Abstract: A partition $(V_1,\ldots,V_k)$ of the vertex set of a graph $G$ with a (not necessarily proper) colouring $c$ is colourful if no two vertices in any $V_i$ have the same colour and every set $V_i$ induces a connected graph. The COLOURFUL PARTITION problem is to decide whether a coloured graph $(G,c)$ has a colourful partition of size at most $k$. This problem is closely related to the COLOURFUL COMPONENTS problem, which is to decide whether a graph can be modified into a graph whose connected components form a colourful partition by deleting at most $p$ edges. Nevertheless we show that COLOURFUL PARTITION and COLOURFUL COMPONENTS may have different complexities for restricted instances. We tighten known NP-hardness results for both problems and in addition we prove new hardness and tractability results for COLOURFUL PARTITION. Using these results we complete our paper with a thorough parameterized study of COLOURFUL PARTITION.

Submitted 10 August, 2018; originally announced August 2018.
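
The definition of a colourful partition in the abstract above translates into a short check. This is a minimal sketch under assumed names (`is_colourful_partition`, adjacency dictionaries); it only verifies a given partition and is not one of the paper's algorithms.

```python
def is_colourful_partition(adj, colour, parts):
    """Return True if `parts` is a colourful partition of the graph.

    adj    -- dict mapping each vertex to an iterable of its neighbours (hypothetical representation)
    colour -- dict mapping each vertex to its colour
    parts  -- list of vertex collections (V_1, ..., V_k)
    """
    vertices = set(adj)
    covered = [v for part in parts for v in part]
    # The parts must cover every vertex exactly once.
    if len(covered) != len(vertices) or set(covered) != vertices:
        return False
    for part in parts:
        part = set(part)
        if not part:
            return False
        # No two vertices in a part may share a colour.
        if len({colour[v] for v in part}) != len(part):
            return False
        # Each part must induce a connected graph; depth-first search within the part.
        start = next(iter(part))
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in part and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != part:
            return False
    return True


# Example: the path 1 - 2 - 3 - 4 with alternating colours.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
colour = {1: "r", 2: "b", 3: "r", 4: "b"}
print(is_colourful_partition(adj, colour, [{1, 2}, {3, 4}]))
print(is_colourful_partition(adj, colour, [{1, 2, 3}, {4}]))
```

The first partition is colourful and has size 2; the second fails because colour `r` appears twice in the first part.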

arXiv:1710.07584

The Maximum Colorful Arborescence problem parameterized by the structure of its color hierarchy graph

Authors: Guillaume Fertin, Julien Fradin, Christian Komusiewicz

Abstract: Let G=(V,A) be a vertex-colored arc-weighted directed acyclic graph (DAG) rooted in some vertex r, and let H be its color hierarchy graph, defined as follows: V(H) is the color set C of G, and an arc from color c to color c' exists in H if there is an arc in G from a vertex of color c to a vertex of color c'. In this paper, we study the MAXIMUM COLORFUL ARBORESCENCE problem (or MCA), which takes as input a DAG G with the additional constraint that H is also a DAG, and aims at finding in G an arborescence rooted in r, of maximum weight, and in which no color appears more than once. The MCA problem is motivated by the inference of unknown metabolites from mass spectrometry experiments. However, whereas the problem has been studied for roughly ten years, the crucial property that H is necessarily a DAG has only been pointed out and exploited very recently. In this paper, we further investigate MCA under this new light, by providing algorithmic results for the problem, with a specific focus on fixed-parameter tractability (FPT) issues, and relative to different structural parameters of H. In particular, we provide an O*(3^{nhs}) time algorithm for solving MCA, where nhs is the number of vertices of indegree at least two in H, thereby improving the O*(3^{|C|}) algorithm from [Böcker et al. 2008]. We also prove that MCA is W[2]-hard relative to the treewidth Ht of H, and further show that it is FPT relative to Ht+lc, where lc = |V| - |C|.

Submitted 28 February, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

Comments: Submitted a 12 pages version (+ rest in Appendix for referees) to CPM 2018

arXiv:1604.08603

doi 10.1007/978-3-319-42634-1_32

Decomposing Cubic Graphs into Connected Subgraphs of Size Three

Authors: Laurent Bulteau, Guillaume Fertin, Anthony Labarre, Romeo Rizzi, Irena Rusu

Abstract: Let $S=\{K_{1,3},K_3,P_4\}$ be the set of connected graphs of size 3. We study the problem of partitioning the edge set of a graph $G$ into graphs taken from any non-empty $S'\subseteq S$. The problem is known to be NP-complete for any possible choice of $S'$ in general graphs. In this paper, we assume that the input graph is cubic, and study the computational complexity of the problem of partitioning its edge set for any choice of $S'$. We identify all polynomial and NP-complete problems in that setting, and give graph-theoretic characterisations of $S'$-decomposable cubic graphs in some cases.

Submitted 28 April, 2016; originally announced April 2016.

Comments: to appear in the proceedings of COCOON 2016

arXiv:1307.7842

A Fixed-Parameter Algorithm for Minimum Common String Partition with Few Duplications

Authors: Laurent Bulteau, Guillaume Fertin, Christian Komusiewicz, Irena Rusu

Abstract: Motivated by the study of genome rearrangements, the NP-hard Minimum Common String Partition problem asks, given two strings, to split both strings into an identical set of blocks. We consider an extension of this problem to unbalanced strings, so that some elements may not be covered by any block. We present an efficient fixed-parameter algorithm for the parameters number k of blocks and maximum occurrence d of a letter in either string. We then evaluate this algorithm on bacterial genomes and synthetic data.

Submitted 30 July, 2013; originally announced July 2013.

Comments: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

arXiv:1111.0434

doi 10.1007/978-3-642-32589-2_24

Pancake Flipping is Hard

Authors: Laurent Bulteau, Guillaume Fertin, Irena Rusu

Abstract: Pancake Flipping is the problem of sorting a stack of pancakes of different sizes (that is, a permutation), when the only allowed operation is to insert a spatula anywhere in the stack and to flip the pancakes above it (that is, to perform a prefix reversal). In the burnt variant, one side of each pancake is marked as burnt, and it is required to finish with all pancakes having the burnt side down. Computing the optimal scenario for any stack of pancakes and determining the worst-case stack for any stack size have been challenges over more than three decades. Beyond being an intriguing combinatorial problem in itself, it also yields applications, e.g. in parallel computing and computational biology. In this paper, we show that the Pancake Flipping problem, in its original (unburnt) variant, is NP-hard, thus answering the long-standing question of its computational complexity.

Submitted 10 November, 2011; v1 submitted 2 November, 2011; originally announced November 2011.

Comments: Corrected references
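
The prefix reversal described in the Pancake Flipping abstract above is simple to state in code. The sketch below pairs it with the folklore "bring the largest to the top, then flip it into place" strategy. That strategy always sorts the stack but does not in general use the fewest flips, which is the quantity the paper shows to be NP-hard to compute. The function names are assumptions for illustration.

```python
def flip(stack, k):
    """Prefix reversal: reverse the order of the top k pancakes (the first k elements)."""
    return stack[:k][::-1] + stack[k:]


def simple_pancake_sort(stack):
    """Sort by prefix reversals with a simple, non-optimal strategy.

    Returns the sorted stack and the list of flip sizes that were applied.
    """
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        # Position of the largest pancake among the top `size` pancakes.
        largest = stack.index(max(stack[:size]))
        if largest == size - 1:
            continue  # already in its final position
        if largest != 0:
            # Bring the largest unsorted pancake to the top.
            stack = flip(stack, largest + 1)
            flips.append(largest + 1)
        # Flip it down into its final position.
        stack = flip(stack, size)
        flips.append(size)
    return stack, flips


result, flips = simple_pancake_sort([3, 1, 4, 2])
print(result, flips)
```

Each pancake costs at most two flips here, so the number of flips is linear in the stack size even though it may exceed the optimum.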

arXiv:1011.1157

doi 10.1137/110851390

Sorting by Transpositions is Difficult

Authors: Laurent Bulteau, Guillaume Fertin, Irena Rusu

Abstract: In comparative genomics, a transposition is an operation that exchanges two consecutive sequences of genes in a genome. The transposition distance, that is, the minimum number of transpositions needed to transform a genome into another, is, according to numerous studies, a relevant evolutionary distance. The problem of computing this distance when genomes are represented by permutations, called the Sorting by Transpositions problem, was introduced by Bafna and Pevzner in 1995. It has naturally been the focus of a number of studies, but the computational complexity of this problem has remained undetermined for 15 years. In this paper, we answer this long-standing open question by proving that the Sorting by Transpositions problem is NP-hard. As a corollary of our result, we also prove that the following problem is NP-hard: given a permutation pi, is it possible to sort pi using db(pi)/3 transpositions, where db(pi) is the number of breakpoints of pi?

Submitted 4 November, 2010; originally announced November 2010.
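
The transposition operation and the breakpoint count mentioned in the Sorting by Transpositions abstract above can be illustrated as follows. This is a minimal sketch: the function names are hypothetical, and the breakpoint definition used (an adjacent pair in the permutation extended with 0 and n+1 whose second element is not the first plus one) is the usual one for transpositions, assumed here since the abstract does not spell it out.

```python
def transpose(perm, i, j, k):
    """Exchange the consecutive segments perm[i:j] and perm[j:k] (0-based, half-open)."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count adjacent pairs (a, b) in 0, perm, n+1 with b != a + 1 (assumed definition)."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b != a + 1)


perm = [3, 4, 1, 2]
print(breakpoints(perm))
print(transpose(perm, 0, 2, 4))
```

A transposition changes only three adjacencies, so it removes at most three breakpoints. Under that definition, db(pi)/3 is a lower bound on the transposition distance, and the example permutation, with three breakpoints, meets it with a single transposition.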

arXiv:0806.1103

On the Approximability of Comparing Genomes with Duplicates

Authors: Sébastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thevenin, Stéphane Vialette

Abstract: A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogeny. All the existing measures are defined on genomes without duplicates. However, we know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We also study the complexity of the following problem: is there an exemplarization (resp. an intermediate/maximum matching) that induces no breakpoint? We prove the problem to be NP-complete in the exemplar model for a new class of instances, and we show that the problem is in P in the maximum matching model. We also focus on a fourth measure: the number of adjacencies, for which we give several approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.

Submitted 6 June, 2008; originally announced June 2008.

Showing 1–9 of 9 results for author: Fertin, G