-
The Longest Run Subsequence Problem: Further Complexity Results
Authors:
Riccardo Dondi,
Florian Sikora
Abstract:
Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum-length subsequence of a given string that contains at most one run for each symbol (a run is a maximal substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet over which the input string is defined. In this paper we further investigate the complexity of the problem and show that it is fixed-parameter tractable when parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence in which each symbol has at most two occurrences in the input string and show that it is APX-hard.
Submitted 22 June, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
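To make the problem statement concrete, here is a minimal sketch of an exact solver for small inputs. It is not the algorithm from the paper: it is a plain memoized search whose state space grows exponentially with the alphabet size, and the function name `longest_run_subsequence` is chosen here for illustration only.

```python
from functools import lru_cache


def longest_run_subsequence(s: str) -> int:
    """Length of a longest subsequence of s with at most one run per symbol.

    Illustrative exponential-time sketch (states are indexed by subsets of
    the alphabet); intended for short strings over small alphabets.
    """
    symbols = sorted(set(s))
    index = {c: i for i, c in enumerate(symbols)}
    n = len(s)

    @lru_cache(maxsize=None)
    def best(i: int, closed: int, cur: int) -> int:
        # i: next position to consider
        # closed: bitmask of symbols whose run has already ended
        # cur: index of the symbol of the currently open run, or -1 if none
        if i == n:
            return 0
        c = index[s[i]]
        result = best(i + 1, closed, cur)  # option 1: skip s[i]
        if c == cur:
            # option 2a: extend the currently open run
            result = max(result, 1 + best(i + 1, closed, cur))
        elif not (closed >> c) & 1:
            # option 2b: close the current run (if any) and open a run of c
            new_closed = closed | (1 << cur) if cur >= 0 else closed
            result = max(result, 1 + best(i + 1, new_closed, c))
        return result

    return best(0, 0, -1)
```

For instance, on the input `abab` no solution can keep all four symbols, since that would give two runs of `a`; a subsequence such as `aab` or `abb` has length 3.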
-
Complexity Issues of String to Graph Approximate Matching
Authors:
Riccardo Dondi,
Giancarlo Mauri,
Italo Zoppis
Abstract:
The problem of matching a query string to a directed graph, whose vertices are labeled by strings, has applications in different fields, from data mining to computational biology. Several variants of the problem have been considered, depending on whether the match is exact or approximate and, in the latter case, on which edit operations are considered and where they are allowed. In this paper we present results on the complexity of the approximate matching problem, where edit operations are symbol substitutions and are allowed only on the graph labels or both on the graph labels and the query string. We introduce a variant of the problem that asks whether there exists a path in a graph that represents a query string with any number of edit operations and we show that it is NP-complete, even when labels have length one and the alphabet is binary. Moreover, when it is parameterized by the length of the input string and graph labels have length one, we show that the problem is fixed-parameter tractable and that it is unlikely to admit a polynomial kernel. The NP-completeness of this problem leads to the inapproximability (within any factor) of approximate matching when edit operations are allowed only on the graph labels. Moreover, we show that the variants of approximate string matching to graph we consider are not fixed-parameter tractable when the parameter is the number of edit operations, even for graphs that have distance one from a DAG. The reduction for this latter result allows us to prove the inapproximability of the variant where edit operations can be applied both on the query string and on graph labels.
Submitted 7 January, 2020;
originally announced January 2020.
-
Reconciling Multiple Genes Trees via Segmental Duplications and Losses
Authors:
Riccardo Dondi,
Manuel Lafond,
Celine Scornavacca
Abstract:
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost $δ$ and $λ$, respectively. We show that the problem is polynomial-time solvable when $δ \leq λ$ (via LCA-mapping), while if $δ > λ$ the problem is NP-hard, even when $λ = 0$ and a single gene tree is given, solving a long-standing open problem on the complexity of the reconciliation problem. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are $δ/λ$ and the number $d$ of segmental duplications, of time complexity $O(\lceil \frac{δ}{λ} \rceil^{d} \cdot n \cdot \frac{δ}{λ})$. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or refute hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
Submitted 11 June, 2018;
originally announced June 2018.
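The LCA-mapping mentioned in the abstract can be illustrated with the classical construction, in which each internal gene-tree node is mapped to the lowest common ancestor, in the species tree, of the species mapped to its children. The sketch below covers only this mapping, not the segmental cost model of the paper. The input encoding (nested tuples for a binary gene tree whose leaves are species names, a child-to-parent dictionary for the species tree) and the name `lca_mapping` are assumptions made for this example.

```python
def lca_mapping(gene_tree, species_parent):
    """Classical LCA-mapping of a binary gene tree into a species tree.

    gene_tree: nested 2-tuples; each leaf is a species name (str).
    species_parent: dict mapping each species-tree node to its parent,
                    with the root mapped to None.
    Returns a list of (gene_subtree, species_node) pairs for the internal
    gene-tree nodes, in post-order.
    """

    def ancestors(v):
        # Path from v up to the root of the species tree, v included.
        path = []
        while v is not None:
            path.append(v)
            v = species_parent[v]
        return path

    def lca(a, b):
        seen = set(ancestors(a))
        for v in ancestors(b):
            if v in seen:
                return v
        raise ValueError("nodes are not in the same species tree")

    out = []

    def visit(node):
        if isinstance(node, str):
            return node  # a leaf maps to its own species
        left, right = node
        mapped = lca(visit(left), visit(right))
        out.append((node, mapped))
        return mapped

    visit(gene_tree)
    return out
```

With a species tree in which `A` and `B` are children of `AB`, and `AB` and `C` are children of `root`, the gene tree `(("A", "B"), ("A", "C"))` has its subtree `("A", "B")` mapped to `AB`, while `("A", "C")` and the gene-tree root are both mapped to `root`.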
-
On the Complexity of Minimum Labeling Alignment of Two Genomes
Authors:
Riccardo Dondi,
Nadia El-Mabrouk
Abstract:
In this note we investigate the complexity of the Minimum Label Alignment problem and we show that such a problem is APX-hard.
Submitted 8 June, 2012;
originally announced June 2012.